HomeWHICHWhich Of The Following Is True Of Computerized Adaptive Testing

Which Of The Following Is True Of Computerized Adaptive Testing

Computerized adaptive testing is an AI-based approach to assessment where the test is personalized based on your performance as you take the test, making the test shorter, more accurate, more secure, more engaging, and fairer. If you do well, the items get more difficult, and if you do poorly, the items get easier. If an accurate score is reached, the test stops early. The AI algorithms are almost always based on item response theory (IRT), an application of machine learning to assessment, but can be based on other models as well.

Prefer to learn by doing? Request a free account in FastTest, our powerful adaptive testing platform.

What is computerized adaptive testing (CAT)?

Computerized adaptive testing, sometimes called computer-adaptive testing, adaptive assessment, or adaptive testing, is an algorithm that personalizes how an assessment is delivered to each examinee. It is coded into a software platform, using the machine-learning approach of IRT to select items and score examinees. The algorithm proceeds in a loop until the test is complete. This makes the test smarter, shorter, fairer, and more precise.

computerized Adaptive testing options

The steps in the diagram above are adapted from Kingsbury and Weiss (1984). based on these components.

Components of a CAT

  1. Item bank calibrated with IRT
  2. Starting point (theta level before someone answers an item)
  3. Item selection algorithm (usually maximum Fisher information)
  4. Scoring method (e.g., maximum likelihood)
  5. Termination criterion (stop the test at 50 items, or when standard error is less than 0.30? Both?)

How the components work

For starters, you need an item bank that has been calibrated with a relevant psychometric or machine learning model. That is, you can’t just write a few items and subjectively rank them as Easy, Medium, or Hard difficulty. That’s an easy way to get sued. Instead, you need to write a large number of items (rule of thumb is 3x your intended test length) and then pilot them on a representative sample of examinees. The sample must be large enough to support the psychometric model you choose, and can range from 100 to 1000. You then need to perform simulation research – more on that later.

Refer to more articles:  Which Way To Turn Tie Rod For Toe In

computerized adaptive testing

Once you have an item bank ready, here is how the computerized adaptive testing algorithm works for a student that sits down to take the test, with options for how to do so.

  1. Starting point: there are three option to select the starting score, which psychometricians call theta
    1. Everyone gets the same value, like 0.0 (average, in the case of non-Rasch models)
    2. Randomized within a range, to help test security and item exposure
    3. Predicted value, perhaps from external data, or from a previous exam
  2. Select item
    1. Find the item in the bank that has the highest information value
    2. Often, you need to balance this with practical constraints such as Item Exposure or Content Balancing
  3. Score the examinee
    1. Usually IRT, maximum likelihood or Bayes modal
  4. Evaluate termination criterion: using a predefined rule supported by your simulation research
    1. Is a certain level of precision reached, such as a standard error of measurement <0.30?
    2. Are there no good items left in the bank?
    3. Has a time limit been reached?
    4. Has a Max Items limit been reached?

The algorithm works by looping through 2-3-4 until the termination criterion is satisfied.

How does the test adapt? By Difficulty or Quantity?

CATs operate by adapting both the difficulty and quantity of items seen by each examinee.

DifficultyMost characterizations of computerized adaptive testing focus on how item difficulty is matched to examinee ability. High-ability examinees receive more difficult items, while low ability examinees receive easier items, which has important benefits to the student and the organization. An adaptive test typically begins by delivering an item of medium difficulty; if you get it correct, you get a tougher item, and if you get it incorrect, you get an easier item. This pattern continues.

Quantity: Fixed-Length vs. Variable-LengthA less publicized facet of adaptation is the number of items. Adaptive tests can be designed to stop when certain psychometric criteria are reached, such as a specific level of score precision. Some examinees finish very quickly with few items, so that adaptive tests are typically about half as many questions as a regular test, with at least as much accuracy. Since some examinees have longer tests, these adaptive tests are referred to as variable-length. Obviously, this makes for a massive benefit: cutting testing time in half, on average, can substantially decrease testing costs.

Some adaptive tests use a fixed length, and only adapt item difficulty. This is merely for public relations issues, namely the inconvenience of dealing with examinees who feel they were unfairly treated by the CAT, even though it is arguably more fair and valid than conventional tests. In general, it is best practice to meld the two: allow test length to be shorter or longer, but put caps on either end that prevent inadvertently too-short tests or tests that could potentially go on to 400 items. For example, the NCLEX has a minimum length exam of 75 items and the maximum length exam of 145 items.

Refer to more articles:  How To Tell Which Trim My Car Is

Example of the computerized adaptive testing algorithm

Let’s walk through an oversimplified example. Here, we have an item bank with 5 questions. We will start with an item of average difficulty, and answer as would a student of below-average difficulty.

Below are the item information functions for five items in a bank. Let’s suppose the starting theta is 0.0.

item information functions

  1. We find the first item to deliver. Which item has the highest information at 0.0? It is Item 4.
  2. Suppose the student answers incorrectly.
  3. We run the IRT scoring algorithm, and suppose the score is -2.0.
  4. Check the termination criterion; we certainly aren’t done yet, after 1 item.
  5. Find the next item. Which has the highest information at -2.0? Item 2.
  6. Suppose the student answers correctly.
  7. We run the IRT scoring algorithm, and suppose the score is -0.8.
  8. Evaluate termination criterion; not done yet.
  9. Find the next item. Item 2 is the highest at -0.8 but we already used it. Item 4 is next best, but we already used it. So the next best is Item 1.
  10. Item 1 is very easy, so the student gets it correct.
  11. New score is -0.2.
  12. Best remaining item at -0.2 is Item 3.
  13. Suppose the student gets it incorrect.
  14. New score is perhaps -0.4.
  15. Evaluate termination criterion. Suppose that the test has a max of 3 items, an extremely simple criterion. We have met it. The test is now done and automatically submitted.

Advantages of computerized adaptive testing

How to develop an adaptive assessment that is valid and defensible

CATs are the future of assessment. They operate by adapting both the difficulty and number of items to each individual examinee. The development of an adaptive test is no small feat, and requires five steps integrating the expertise of test content developers, software engineers, and psychometricians.

The development of a quality adaptive test is complex and requires experienced psychometricians in both item response theory (IRT) calibration and CAT simulation research. FastTest can provide you the psychometrician and software; if you provide test items and pilot data, we can help you quickly publish an adaptive version of your test.

Refer to more articles:  Which Pfds Would Be Considered Readily Accessible

Step 1: Feasibility, applicability, and planning studies. First, extensive monte carlo simulation research must occur, and the results formulated as business cases, to evaluate whether adaptive testing is feasible, applicable, or even possible.

Step 2: Develop item bank. An item bank must be developed to meet the specifications recommended by Step 1.

Step 3: Pretest and calibrate item bank. Items must be pilot tested on 200-1000 examinees (depends on IRT model) and analyzed by a Ph.D. psychometrician.

Step 4: Determine specifications for final CAT. Data from Step 3 is analyzed to evaluate CAT specifications and determine most efficient algorithms using CAT simulation software such as CATSim.

Step 5: Publish live CAT. The adaptive test is published in a testing engine capable of fully adaptive tests based on IRT. There are not very many of them out in the market. Sign up for a free account in our platform FastTest and try for yourself!

Want to learn more about our one-of-a-kind model? Click here to read the seminal article by our two co-founders. More adaptive testing research is available here.

Minimum requirements for computerized adaptive testing

Here are some minimum requirements to evaluate if you are considering a move to the CAT approach.

  • A large item bank piloted so that each item has at least 100 valid responses (Rasch model) or 500 (3PL model)
  • 500 examinees per year
  • Specialized IRT calibration and CAT simulation software like Xcalibre and CATsim.
  • Staff with a Ph.D. in psychometrics or an equivalent level of experience. Or, leverage our internationally recognized expertise in the field.
  • Items (questions) that can be scored objectively correct/incorrect in real-time
  • An item banking system and CAT delivery platform
  • Financial resources: Because it is so complex, the development of a CAT will cost at least $10,000 (USD) — but if you are testing large volumes of examinees, it will be a significantly positive investment. If you pay $20/hour for proctoring seats and cut a test from 2 hours to 1 hour for just 1,000 examinees… that’s a $20,000 savings. If you are doing 200,000 exams? That is $4,000,000 in seat time that is saved.

Adaptive testing: Resources for further reading

Visit the links below to learn more about adaptive assessment.

  • We first recommend that you first read this landmark article by our co-founders.
  • Read this article on producing better measurements with CAT from Prof. David J. Weiss.
  • International Association for Computerized Adaptive Testing: www.iacat.org
  • Here is a video on the history of CAT, by the godfather of CAT, Prof. David J. Weiss

Quick FAQ

Let’s start with some quick FAQ. Afterwards, we will delve into details about the machine learning algorithm.

How to implement CAT

Our revolutionary platform, FastTest, makes it easy to publish a CAT. Once you upload your item texts and the IRT parameters, you can choose whatever options you please for steps 2-3-4 of the algorithm, simply by clicking on elements in our easy-to-use interface.

Contact us to sign up for a free account in our industry-leading CAT platform or to discuss with one of our PhD psychometricians.

RELATED ARTICLES

Most Popular

Recent Comments