

Design and Implementation of Adaptive Assessment System Using Item Response Theory Model

Soumya Sen Gupta, Vikram Vijh, Kartikye Vats

Abstract- Identifying strengths and weaknesses in students' knowledge across different domains is an important requirement when evaluating students. Computerized tests are among the most viable ways to judge a student's understanding of a particular subject. These can be either non-adaptive, where a set of predefined questions is delivered to the examinee, or adaptive, where the questions delivered correspond to the ability level of that examinee. The ability level of the examinee is judged by the computer from the responses received. This paper presents the design and implementation of the Adaptive Assessment System in the National Online Examination System.

1. Introduction

The National Online Examination System (NOES) is being developed for the assessment of graduate engineers in IT as part of a DIT-funded project. It has user-centric provisions and administrative functions. The former include registration of candidates, selection of center, selection of examination time and date, and payment of fee (either through Demand Draft or through Credit Card). The latter include admit card generation, verification of fee receipt, question pool management, question paper matrix preparation, creation of nodes for different subjects for adding questions, verification of questions for their technical correctness, tagging of questions under different subjects, etc. At present, the system supports only Multiple Choice Questions (MCQ), as these are easier to evaluate automatically. MCQs also offer flexibility in the type of outcome assessed: knowledge goals, application goals, analysis goals, etc.

The first version of NOES had a static question paper generation system. It was based on a question paper matrix given by a question paper setter. A question paper was generated for each candidate before the start of the examination, and questions were administered sequentially irrespective of the candidate's responses.
In this scheme, the examinee could skip a question and/or review the answer later. An examinee typically starts by answering the easy questions, skipping the difficult ones, and later returns in a review phase to attempt them. Thus, this type of assessment returns an average score for the examinee; it does not indicate his ability in a sub-domain of the total domain considered. It is quite adequate for mass recruitment, placement, specific specialization needs, or simple select/reject decisions. However, in situations where we wish to assess examinees for various skill sets on the basis of a single examination, we would like to determine their ability in each sub-domain. This is needed to ensure that the combined set of selected candidates covers a good spectrum of skill sets. Keeping this in view, it was decided to introduce the Adaptive Assessment System (AAS) in NOES. In AAS, questions are given to the examinee one at a time, starting at a predefined difficulty level and thereafter increasing or decreasing the difficulty in accordance with the examinee's response pattern. Unlike the static case, which has fixed item content, AAS selects only those items from a predefined item bank that correspond

Proceedings of ASCNT - 2010, CDAC, Noida, India, pp. 115 - 124



to "the most informative" for the examinee. Thus the difficulty of the examination is tailored to the ability of the examinee, and every examinee gets a unique test. This paper presents the design and implementation of the AAS developed. Section 2 gives details of the Adaptive Assessment Model. The Item Response Theory (IRT) used in the design is described in section 3. Section 4 describes the Item Response models. Section 5 describes the process used for question bank characterization, and section 6 the difficulty level characterization of the question pool. Section 7 describes the implementation aspects, and section 8 gives the future modifications currently envisaged.

2. Adaptive Assessment Model

In AAS, the examinee is first given a question of intermediate difficulty to make an initial assessment of his/her ability. If the examinee answers correctly, the estimated ability is increased and a more difficult item corresponding to the current ability is given; otherwise the estimated ability is decreased and the examinee gets an easier question. After each response, the ability of the examinee is re-estimated, along with the precision of the estimate. This results in quick convergence to the actual ability level of the examinee. The test stops when the ability of the examinee has been calculated with the desired precision. (It may also stop due to other exit criteria, such as the maximum number of questions having been delivered or the time limit having been reached.) The final ability level determines the score of the examinee. The theoretical framework used here for implementing AAS is Item Response Theory (IRT). The IRT algorithm provides information about the functional relation between the estimate of the learner's proficiency and the likelihood that the learner will give the correct answer to a specific question.
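The adaptive loop described above can be sketched as follows. This is a minimal illustration and not the NOES implementation: it assumes a simple up/down ability update with a shrinking step, and hypothetical item dictionaries with keys a, b, c for the three item parameters.

```python
import math
import random

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def adaptive_test(item_bank, answer, max_items=20):
    """Minimal adaptive loop: start at intermediate difficulty, raise the
    ability estimate after a correct answer and lower it after a wrong one,
    with a shrinking step so the estimate settles down. `answer(item)`
    returns 1 for a correct response and 0 for an incorrect one."""
    theta, step = 0.0, 1.0
    pool = list(item_bank)
    for _ in range(min(max_items, len(pool))):
        # give the unused item whose difficulty is closest to the estimate
        item = min(pool, key=lambda it: abs(it["b"] - theta))
        pool.remove(item)
        theta += step if answer(item) else -step
        step *= 0.8
    return theta

# Simulate an examinee of true ability 1.5 answering probabilistically.
random.seed(0)
bank = [{"a": 1.0, "b": b, "c": 0.2}
        for b in (-3, -2, -1, 0, 1, 2, 3) for _ in range(3)]
estimate = adaptive_test(
    bank,
    lambda it: int(random.random() < p_correct(1.5, it["a"], it["b"], it["c"])))
```

An examinee who answers everything correctly is driven to the top of the scale, and one who answers everything wrongly to the bottom, mirroring the convergence behaviour described above.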
In the case of a typical question, this likelihood will be small for examinees of low ability and large for examinees of high ability. AAS can result in shorter tests, provided the examinee performs consistently. It also enables instant generation of results, better item exposure control, fewer administrative procedures, etc. No time is wasted answering questions that are far too easy or too difficult for the examinee, as the test automatically adapts to his ability. Disadvantages include support for only dichotomous and polytomous items, the constraints of no review of previously answered questions and no skipping of questions, and examinee discomfort due to unfamiliarity with the method.

3. Item Response Theory

Item Response Theory provides the theoretical framework for implementing Computerized Adaptive Testing. The IRT algorithm provides information about the functional relation between the estimate of the learner's proficiency in a concept and the likelihood that the learner will give the correct answer to a given question. The amount of knowledge, learning ability, or proficiency in a subject cannot be measured directly, like height or weight. These quantities are referred to as latent traits, or "ability", in Item Response Theory, and the aim of IRT is to estimate this ability. IRT rests on the postulate that an examinee has a definite probability of giving a correct answer to a given question, and that this probability will be high for high-ability examinees and low for low-ability ones. Let the ability be θ and the probability of a correct response by an examinee with ability θ be P(θ). If we plot θ against P(θ) in an X-Y plane, we get a smooth S-shaped



curve called the Item Characteristic Curve (ICC). Figure 1 provides a sample item characteristic curve.



Fig.1: Item Characteristic Curve (Source [2])

The item characteristic curve is the basic building block of item response theory. Two technical properties of an item characteristic curve are used to describe it. The first is the difficulty of the item. Under item response theory, the difficulty of an item describes where the item functions along the ability scale. Figure 2 shows ICC curves for various difficulty levels.

Fig.2: Item Characteristic Curves for Questions with various difficulty levels (Source [2])

The curve shifts to the right as the difficulty of the item increases. Though the value of difficulty may theoretically range from -∞ to +∞, in practice it mainly lies between -3 and +3 (in line with the standard Gaussian distribution). The second technical property is discrimination, which describes how well an item can differentiate between examinees having abilities below the item location and those having abilities above it. This property essentially reflects the steepness of the item characteristic curve in its middle section: the steeper the curve, the better the discrimination. The theoretical range of this parameter is -∞ ≤ a ≤ +∞, but the range usually seen in practice is -2.80 to +2.80 [1]. Figures 3 and 4 show two ICC curves, one with high and one with low discrimination.
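These two properties can be seen numerically with a small sketch (an illustration under the logistic model, not code from NOES):

```python
import math

def icc(theta, a=1.0, b=0.0, c=0.0):
    """Item characteristic curve: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Difficulty b shifts the curve: at the same ability, a harder item
# (larger b) gives a lower probability of a correct answer.
p_easy = icc(0.0, b=-1.0)   # about 0.73
p_hard = icc(0.0, b=1.0)    # about 0.27

# Discrimination a controls steepness: a steeper curve separates two
# nearby abilities more sharply.
sep_low  = icc(0.5, a=0.5) - icc(-0.5, a=0.5)   # about 0.12
sep_high = icc(0.5, a=2.0) - icc(-0.5, a=2.0)   # about 0.46
```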





Fig.3: Questions with High Discrimination (Source [2])


Fig.4: Questions with Low Discrimination (Source [2])

4. The Item Response Models

The item characteristic curve forms the basis of the different item response theory models. These models relate the probability of a correct response to the ability: given the ability, the model gives the probability of a correct response. The simplest of the models, the Rasch model, takes only the difficulty parameter into account in calculating this probability; the discrimination parameter is kept fixed at 1, so the item-difficulty parameter alone characterizes the item. In the two-parameter model, both the item-difficulty and item-discrimination parameters are allowed to vary, so the probability of correctly answering a question becomes a function of item difficulty, item discrimination and ability. Birnbaum (1968) modified the two-parameter logistic model to include a third parameter that represents the contribution of guessing to the probability of a correct response. The guessing parameter takes into account the probability of a guess being correct. Equation (1) gives the three-parameter model:

    P(θ) = c + (1 - c) / (1 + e^(-a(θ - b)))        (1)

where
    b is the difficulty parameter,
    a is the discrimination parameter,
    c is the guessing parameter, and
    θ is the ability.
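A quick numerical illustration of equation (1) (a sketch, not NOES code) shows the role of the guessing parameter: for a four-option MCQ one might take c ≈ 0.25, and even an examinee of very low ability retains roughly that probability of answering correctly.

```python
import math

def p3pl(theta, a, b, c):
    """Equation (1): P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

p_low  = p3pl(-6.0, a=1.2, b=0.0, c=0.25)   # close to the guessing floor c
p_mid  = p3pl( 0.0, a=1.2, b=0.0, c=0.25)   # (1 + c) / 2 = 0.625 at theta = b
p_high = p3pl( 6.0, a=1.2, b=0.0, c=0.25)   # close to 1
```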


The primary purpose of administering a test to an examinee is to locate that person on the ability scale [1]. Without loss of generality, we may assume that before calculating the examinee's ability all the technical parameters of the administered questions are known, and that the scoring is 0 for a wrong answer and 1 for a correct answer. One drawback of IRT is its inability to calculate the ability of an examinee if all his answers are correct or all are wrong: in the latter case it gives negative infinity as the ability, and in the former, positive infinity. A finite ability can therefore only be calculated when there is at least one correct and one incorrect response in the response list. Equation (2) calculates the ability of an examinee:

    θ_(s+1) = θ_s + [ Σ a_i (u_i - P_i(θ_s)) ] / [ Σ a_i² P_i(θ_s) Q_i(θ_s) ]        (2)

where
    θ_s is the estimated ability after iteration s,
    a_i is the discrimination parameter for item i,
    u_i is the response made by the examinee to item i (u_i = 1 for a correct response, u_i = 0 for an incorrect response),
    P_i(θ_s) is the probability of a correct response to item i, under the given item characteristic curve model, at the ability level within iteration s, and
    Q_i(θ_s) = 1 - P_i(θ_s) is the probability of an incorrect response to item i at that ability level.

At the start, θ_s on the right-hand side is set to an initial value. The second term on the right-hand side is an adjustment: it is calculated from the probabilities of correct responses to each of the answered items at the current ability, and is added to the current ability to get the new ability. This iterative process continues until the adjustment becomes insignificantly small. The final ability is only an estimate, which may be close to the actual ability of the examinee. To find out how close the estimate is to the actual ability, the standard error is calculated. The standard error is a measure of the variability of the estimated abilities around the examinee's unknown parameter value θ. It is calculated as follows:

    SE(θ) = 1 / sqrt( Σ a_i² P_i(θ) Q_i(θ) )        (3)
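The iterative adjustment of equation (2) and the standard error of equation (3) can be sketched directly (an illustrative implementation, not the NOES code; item parameters are passed as hypothetical (a, b, c) tuples):

```python
import math

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(items, responses, theta=0.0, tol=1e-4, max_iter=100):
    """Apply the adjustment of equation (2) repeatedly until it becomes
    insignificantly small; return the ability estimate and its standard
    error per equation (3). Needs at least one correct and one incorrect
    response, otherwise the estimate diverges to +/- infinity."""
    den = 0.0
    for _ in range(max_iter):
        num = den = 0.0
        for (a, b, c), u in zip(items, responses):
            p = p3pl(theta, a, b, c)
            num += a * (u - p)               # numerator of the adjustment
            den += a * a * p * (1.0 - p)     # denominator: sum of a^2 P Q
        delta = num / den
        theta += delta
        if abs(delta) < tol:
            break
    return theta, 1.0 / math.sqrt(den)

# Three items with a = 1, b = 0, c = 0 and responses 1, 0, 1: the estimate
# solves P(theta) = 2/3, i.e. theta = ln 2.
theta_hat, se = estimate_ability([(1.0, 0.0, 0.0)] * 3, [1, 0, 1])
```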

Another basic principle of item response theory is that the examinee's ability is invariant with respect to the items used to determine it. This principle rests upon two conditions: first, all the items must measure the same underlying latent trait; second, the values of all the item parameters must be in a common metric [1]. This means that if the same examinee is given two examinations, one easy and one hard, both tests will produce the same ability estimate.



5. Estimation of Item Parameters and Item Selection

Each question in the question bank must have all the technical properties pertaining to the Item Response Theory model used for finding the ability of an examinee. In the present case the three-parameter model is used, so each question has its own difficulty, discrimination and guessing parameters. Parameter estimation is the procedure by which these three parameters are calculated for each question, based on the responses to that question collected during a mock examination phase. The mock examinations were conducted for more than 500 students belonging to the MCA, M.Tech and PG programs offered at CDAC, Noida. To calculate the parameters, a minimum number of responses (from the mock test) is needed for each question to be used in the main examination. After each examination, the parameter estimates of the questions are re-calculated, thus increasing the robustness of the question pool.

Once the ability of an examinee is estimated, we have to determine the optimal question for that ability. For this, the question that maximizes the information function at the current ability is selected. The information function is the reciprocal of the variance of the ability estimate; it indicates the precision with which a given ability level can be estimated. The graph of information vs ability is shown in figure 5. For each ability, the amount of information given by a particular item can be calculated.


Fig.5: Information vs Ability

Equation (4) below calculates the information I provided by item i at ability θ under the three-parameter model:

    I_i(θ) = a_i² · (Q_i(θ) / P_i(θ)) · ( (P_i(θ) - c_i) / (1 - c_i) )²        (4)

Once the ability of an examinee has been estimated, the task is to find the question whose technical parameters provide the maximum information about the examinee at that ability. This question is then given to the examinee.
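Item selection by maximum information can then be sketched as follows (illustrative only; items are hypothetical (a, b, c) tuples):

```python
import math

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b, c):
    """Equation (4): the information an item yields at ability theta."""
    p = p3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def next_item(theta, bank):
    """Select the item from the bank that is most informative at theta."""
    return max(bank, key=lambda item: information(theta, *item))

bank = [(1.0, -2.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
chosen = next_item(0.0, bank)   # the item whose difficulty matches theta
```

For an item with no guessing (c = 0), information peaks where the difficulty equals the examinee's ability, which is why the middle item is chosen here.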



6. Difficulty Level Characterization of Questions

Latent variable modeling provides a framework for the analysis of dichotomous and polytomous items. For dichotomous data, the Rasch, Two-Parameter Logistic and Birnbaum Three-Parameter models have been implemented in the ltm package, whereas for polytomous data Samejima's Graded Response model is available [7]. Parameter estimates are obtained under marginal maximum likelihood using the Gauss-Hermite quadrature rule, using ltm. The first step towards parameter estimation is to gather responses to questions from different examinations. The data is then used to prepare a response matrix. Part of a sample response matrix is shown below (rows are students; 1 = correct, 0 = incorrect):

    Student   Question1   Question2   Question3
    0         0           0           1
    1         0           1           1
    2         1           1           1
    3         0           0           1

We call this matrix the Test Set. Keeping the discrimination parameter fixed at 1, we use this matrix to find the difficulty of the questions with the function call below:

    rasch(TestSet, constraint = cbind(length(TestSet) + 1, 1))

The result is the following matrix of estimates:

                 Value    Std. Err
    Dffclt.It1   2.183    0.0285
    Dffclt.It2  -1.76     0.174
    Dffclt.It3  -0.77     0.0973
    Dscrmn       1        NA
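As a rough cross-check of the fitted difficulties (not the marginal-maximum-likelihood procedure that ltm's rasch() performs), note that under the Rasch model with discrimination 1 and all abilities taken as 0, an item's difficulty reduces to the log-odds of an incorrect response. The helper below is our own sketch under that simplifying assumption:

```python
import math

def rough_difficulty(column):
    """Back-of-envelope Rasch difficulty for one item, assuming every
    examinee's ability is 0: b = log((1 - p) / p), where p is the observed
    proportion of correct responses. Only a sanity check on sign and
    magnitude, not a substitute for the ltm fit."""
    p = sum(column) / len(column)
    p = min(max(p, 1e-6), 1.0 - 1e-6)   # guard all-correct / all-wrong items
    return math.log((1.0 - p) / p)

hard_item = rough_difficulty([0, 0, 1, 0])   # p = 0.25, b = log 3 > 0
easy_item = rough_difficulty([1, 1, 1, 0])   # p = 0.75, b = -log 3 < 0
```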

From this, the difficulty levels of the questions can easily be retrieved. For NOES, the difficulty parameter of the questions in the question bank has been characterized into five distinct levels based on the raw difficulties retrieved using the above process: very easy (-3), easy (-1.5), medium (0), hard (1.5) and very hard (3). The characterization is done as follows:

    Raw Difficulty Value     Discrete Difficulty Value
    <= -2.25                 -3
    -2.25 to -0.75           -1.5
    -0.75 to 0.75             0
    0.75 to 2.25              1.5
    > 2.25                    3
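This banding maps directly to a small helper (a sketch; the function name and the handling of values exactly on a band edge are our own choices):

```python
def discretize_difficulty(b):
    """Map a raw difficulty estimate onto NOES's five discrete levels."""
    if b <= -2.25:
        return -3.0    # very easy
    if b <= -0.75:
        return -1.5    # easy
    if b <= 0.75:
        return 0.0     # medium
    if b <= 2.25:
        return 1.5     # hard
    return 3.0         # very hard

levels = [discretize_difficulty(x) for x in (-3.0, -1.0, 0.5, 2.0, 2.8)]
```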

The IRT model used here is the three-parameter model, which characterizes each question in the item bank with three parameters, namely the difficulty, discrimination and guessing parameters. The performance of an examinee in a test thus becomes a function of the examinee's ability and these parameters.



7. Implementing AAS in NOES

The National Online Examination System (NOES) tests an examinee on multiple subjects such as Aptitude, Programming Languages, Data Structures, etc. The question paper for an examinee therefore needs to have questions from all these sections, and each section must have enough questions at the various difficulty levels to determine the ability of the examinee precisely. For the recruitment exam there were 12 sections for each examinee. Being adaptive in nature, each section had to contain 30 questions at each of the 5 difficulty levels, where 30 is the upper limit on the number of questions that an examinee can attempt in a section. A single question paper thus contained 30 * 5 * 12 = 1800 questions. The major task here was to send such a huge set of questions to hundreds of examinees simultaneously at the start of the exam. To reduce the load on the network, the questions were sent to the examinees section by section.

The next issue relates to calculating the ability estimate of each examinee and providing him/her with the next best question for that ability. Ability estimation for a section takes into account all the responses made by the examinee in that section. This involves a large amount of computation, as ability estimation and question selection must be done after every response of every examinee. To relieve the server of this workload, it was decided that the calculations for an examinee must be done on the client machine the examinee is using. This is possible because the client is implemented as a Rich Internet Application capable of working outside the browser environment [10]. At the start of each section the client receives all the question ids for that section. The client calculates the examinee's ability and decides the difficulty level of the next question, then sends the server the id of a question of the calculated difficulty level.
On receiving the id, the server sends the question corresponding to that id along with its correct option. The examinee sees the question text with the answer options and makes a response. The client decides the correctness of the response and re-calculates the ability and the difficulty of the next question. To make sure that data is not lost if the client machine crashes, the client sends the following details after each response made by the examinee:
a) The current question id.
b) The current section id.
c) The response to that question (1 for a correct response, 0 for an incorrect response).
d) The current ability.
e) The question id of the next question to be sent.
f) The time taken by the examinee to make the response.
The server in turn sends the following:
a) The question text.
b) The answer options.
c) The correct option.
Now even if the client machine crashes, the server has enough information to re-start the examination for the examinee. In such a case the server needs to send


Design & Implementation of Adaptive Assessment System Using Item Response Theory Model


a) The response array of the examinee, in the form of 0s and 1s (for incorrect and correct responses).
b) The latest ability of the examinee.
c) The ids of the questions he has answered.
d) The id of the section he was in.
e) All the completed section ids.
Thus the workload on the server is significantly reduced, as all the calculations are handled by the client itself and the server needs only to store some vital information pertaining to the examinees.

An AAS test can be stopped when any of the following criteria is met:
a) The ability of the examinee has been calculated with the desired precision, i.e. the standard error of the ability estimate has fallen below a predefined value.
b) The first n questions have all been answered correctly or all been answered wrongly, where n is predefined. If the first n questions are answered correctly, the examinee is given the highest ability for that subject or examination; if all are incorrect, the examinee is given the lowest ability.
c) The maximum number of questions for a particular section, or for the whole examination, has been administered. This case may arise when the computer is unable to predict the ability of the examinee precisely.
d) The maximum time limit for a particular section, or the total time limit of the examination, has been reached.
e) The item bank has been exhausted, which may happen when the item bank is quite small.

8. Conclusions

In the current version of AAS, examinees are not allowed to review questions already answered, since an examinee could trick the system into administering easier questions by deliberately giving wrong answers to certain questions and correcting them later. If we still wish to offer the option to review answered questions, we need to find out its effect on the estimated ability of the examinee and determine whether it is feasible to provide this in future exams.
It may happen that the same set of questions is used in consecutive tests, resulting in the appearance of the same questions across examinations. The same question may also be administered to examinees of close or similar abilities. To avoid such circumstances, proper item exposure control mechanisms must be adopted.

Acknowledgment

The authors would like to thank Sh. GV Ragunathan, Sr. Director, DIT, and Sh. Anil Pipal, Addl. Director, DIT, for their constant motivation and support throughout the project life cycle. The authors also express their sincere thanks to Sri D K Jain, Sh R K Singh, Sh V K Sharma, Dr P R Gupta and Dr George Varkey for their constant support in executing the project. The authors express their acknowledgement to their team members P Govind Raj, Pankaj Nirwan, Kanti Singh, Neha Sharma, Pradeep Kumar and Parag for their efforts to make this software a reality.



References

[1] Frank B. Baker, "The Basics of Item Response Theory", 2nd edition, ERIC Clearinghouse on Assessment and Evaluation, 2001.
[2] Ivailo Partchev, "A Visual Guide to Item Response Theory", Feb 2004.
[3] Randall E. Schumacker, "Item Response Theory"; Michel Laurier, "What Can We Do with Computerized Adaptive Testing ... and What We Cannot Do".
[4] Wim J. van der Linden, "Computerized Adaptive Testing: Theory and Practice".
[5] Mansoor Al-A'ali, "Implementation of an Improved Adaptive Testing Theory", 2007.
[6] John Michael Linacre, "Computerized Adaptive Testing: A Methodology Whose Time Has Come".
[7] Dimitris Rizopoulos, Journal of Statistical Software, Vol. 17, No. 5, November 2006.
[8] Dimitris Rizopoulos, "Item Response Theory Using the ltm Package".

About Authors

Mr. Soumya Sen Gupta did his B.Tech in Computer Science from West Bengal University of Technology and is currently working as Contract Engineer (I) on the National Online Examination project at CDAC, Noida from Jan. His areas of interest include application security and public key crypto systems.

Mr. Vikram Vijh completed his B.Tech (Computer Engg.) from the Faculty of Engineering, Jamia Millia University. He is currently working as a Flex developer on the National Online Examination System. His areas of interest include Rich Internet Applications and Adaptive Systems.

Mr. Kartikye Vats completed his B.Tech (I.T.) from Jaypee Institute of Information Technology. He worked as a configuration controller at Infosys till Oct 2008. He is currently working as a Flex developer on the National Online Examination System.


