Medical Education and Technology

Test Analysis - Section Descriptions

Test Results
Raw Data:
The "Raw Data" section is a listing of what each student bubbled in for each question. The printout gives the student responses as numbers (A=1 through E=5) rather than the letters on the form, and is sorted by the last 8 digits of the student's I.D. number.
Class Analysis Section

This section lists students by their I.D. numbers, giving each student's raw score, the number of questions answered, and the percent correct. A second list presents the students' scores in decreasing rank order.

The statistical information provided within this section consists of the Z-score, which is a transformation of the raw score to a variable with a mean of zero and a standard deviation of one. It is useful for indicating the proximity of each student's score to the mean score (is it above or below it, and by how much?). Keep in mind that approximately 68% of all scores fall within one (1) standard deviation (plus and minus inclusive) of the mean.
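As a sketch, the Z-transformation can be computed directly from the raw scores; the scores below are hypothetical, not taken from any printout:

```python
# Hypothetical raw scores for a small class (illustrative data only).
scores = [72, 85, 64, 90, 78, 81, 70, 88]

n = len(scores)
mean = sum(scores) / n
# Population standard deviation, as used for the z-transformation.
std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5

# z = (score - mean) / std: positive means above the class mean, negative below.
z_scores = [(s - mean) / std for s in scores]
```

By construction, the resulting set of z-scores has mean zero and standard deviation one, so each value reads directly as "how many standard deviations from the class mean."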

Test Statistics Section

This section provides data such as the number of students present, the number of questions included on the test, and the number of points possible. The statistics provided in this section include the following:

Mean Score:
This is the "average" for the test.

Median Score:
This is the middle score: 50% of the scores fall above it and 50% below. (In a perfect bell curve, the mean and the median are the same score.)

Standard Deviation:
This is an estimate of the dispersion of the test scores about the mean. The value of the standard deviation varies directly with the spread of the test scores: if the spread is large, the standard deviation is large. Again, approximately 68% of the students' scores fall within one standard deviation of the mean (both plus and minus), and about 95% fall within two standard deviations.
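The three statistics above can be computed with Python's standard library; the scores here are hypothetical:

```python
import statistics

scores = [55, 62, 70, 70, 75, 80, 84, 91, 95]  # hypothetical percent-correct scores

mean = statistics.mean(scores)       # the "average"
median = statistics.median(scores)   # middle score of the ranked list
stdev = statistics.stdev(scores)     # sample standard deviation

# Scores falling within one standard deviation of the mean.
within_one = [s for s in scores if abs(s - mean) <= stdev]
```

Note that the median (75) and mean (about 75.8) nearly coincide here, as they would for a roughly symmetric distribution.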

Reliability Coefficient:
This is a measure of the test's consistency: the extent to which score differences reflect real differences among students rather than measurement error. The maximum value of the reliability coefficient is +1.0, meaning perfectly reliable, with no variation due to error. A value of no less than +0.8 is desirable in order to be confident in the test's reliability.
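The printout does not say which formula is used; for tests scored item-by-item as right/wrong, the Kuder-Richardson 20 (KR-20) coefficient is a common choice. A minimal sketch on hypothetical data:

```python
# 0/1 item-response matrix: rows = students, columns = items (hypothetical data).
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
]

k = len(responses[0])                      # number of items
totals = [sum(row) for row in responses]   # each student's total score
n = len(totals)
mean_total = sum(totals) / n
var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance

# For each item, p = proportion correct, q = 1 - p; sum p*q over items.
pq_sum = 0.0
for j in range(k):
    p = sum(row[j] for row in responses) / n
    pq_sum += p * (1 - p)

# KR-20: high when total-score variance dwarfs item-level chance variance.
kr20 = (k / (k - 1)) * (1 - pq_sum / var_total)
```

A value this far below +0.8 would flag the (tiny, artificial) test as unreliable; real tests have far more items and students.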

Score Distribution Histogram:
This graphically displays (with X's) the distribution of the students' scores. The x-axis represents the percent-correct scores, and the y-axis represents the number of students receiving each score. The mean is indicated by the letter "M" on the x-axis.
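A text histogram of this kind can be sketched in a few lines; the scores and bucket width below are hypothetical:

```python
from collections import Counter

# Hypothetical percent-correct scores, bucketed to the nearest 10 below.
scores = [55, 62, 64, 70, 71, 73, 78, 80, 81, 85, 92]
buckets = Counter((s // 10) * 10 for s in scores)

# One row of X's per bucket, one X per student in that bucket.
lines = []
for bucket in sorted(buckets):
    lines.append(f"{bucket:3d}% | " + "X" * buckets[bucket])
print("\n".join(lines))
```

This prints rows such as ` 70% | XXXX`, with the row length showing how many students scored in each band.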

Item Analysis Section

This section is the most complex, and yet the most useful to the test writer. Each question is analyzed for its validity according to a sophisticated set of criteria. For each question, the question number and correct answer are listed. The percent correct data for the question is given for the overall group, as well as for the upper 27% of the group and the lower 27% of the group. The upper and lower values are used again later to compute the discriminability index, the efficiency index, and the correlation coefficient, which are measures of validity.

The theory behind dividing the class into upper and lower scoring groups is that a valid question will have more students from the upper group than from the lower group answering it correctly.

In an effort to simplify matters, each component of this section will be listed below with a brief explanation of what it is and how it should be used:

Discriminability Index:
This is an indication of how well an item (question) discriminates between high and low scoring students. To this end, you want a question that is answered correctly by the most knowledgeable students and answered incorrectly by the least knowledgeable students. The discriminability index ranges from -1.0 to +1.0, where +1.0 is the ideal. For practical purposes, the index should be no less than +0.2 for good items.
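A sketch of the computation, using hypothetical counts for the upper and lower 27% groups:

```python
# Hypothetical counts for one item: how many in each 27% group answered correctly.
upper_correct, upper_total = 24, 27   # top 27% of scorers
lower_correct, lower_total = 10, 27   # bottom 27% of scorers

# D = proportion correct in upper group minus proportion correct in lower group.
d_index = upper_correct / upper_total - lower_correct / lower_total
```

Here D is about +0.52, comfortably above the +0.2 floor; a negative D would mean the weaker students outperformed the stronger ones on this item.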

Efficiency Index:
This is an adjustment of the discriminability index, and is useful in cases where the item difficulty is extremely high or low. The efficiency index was developed to rescale the discriminability index regardless of the proportion responding correctly. To be classified as an efficient item, the efficiency index must be larger than +0.4. An ideally efficient item falls between +0.8 and +1.0.

Correlation Index:
This represents the extent to which high scorers answer correctly and low scorers answer incorrectly, using all students as opposed to upper-lower groups. That is, it relates the item to the overall test and population. The correlation coefficient ranges from -1.0 to +1.0. A good item should correlate with a value greater than +0.25.
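The printout does not name the exact formula; the point-biserial correlation is the standard item-total statistic and can be sketched as follows (hypothetical data):

```python
# Hypothetical data: each student's total test score and 0/1 correctness on one item.
totals  = [45, 50, 55, 60, 62, 70, 75, 80, 85, 90]
correct = [ 0,  0,  1,  0,  1,  1,  1,  1,  1,  1]

n = len(totals)
mean_t = sum(totals) / n
std_t = (sum((t - mean_t) ** 2 for t in totals) / n) ** 0.5  # population SD

p = sum(correct) / n  # proportion answering the item correctly
mean_correct = sum(t for t, c in zip(totals, correct) if c) / sum(correct)

# Point-biserial: r = (M_correct - M_all) / SD * sqrt(p / (1 - p))
r_pb = (mean_correct - mean_t) / std_t * (p / (1 - p)) ** 0.5
```

Because the students who got the item right also tend to have higher totals, r_pb comes out well above the +0.25 threshold for this item.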

Response Count:
This section of the print-out evaluates good and bad distractors, using the upper, lower, and overall groups to study distractor performance. Each answer choice (A, B, C, D, E) is listed along with the number of students selecting it. In the overall group, if no one chooses a distractor, it is implausible and should therefore be replaced. In the upper-lower distribution, a distractor is good if more people from the lower group choose it than from the upper group. The upper group should perform better than the lower group in all cases.

L-U Indicator:
This serves to point out how the upper and lower groups performed. A (+) in this column means that more lower students than upper students chose the distractor; that's good. A (-) means that more upper students than lower students chose the distractor; that's bad. A (0) means that both groups chose equally - that includes neither group choosing the distractor at all. A (***) means that this is the correct answer.
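The indicator logic can be sketched directly from the group counts; the counts here are hypothetical, and the correct answer is assumed to be C:

```python
# Hypothetical response counts per choice for one item (correct answer: C).
upper = {"A": 1, "B": 2, "C": 22, "D": 2, "E": 0}   # upper 27% group
lower = {"A": 6, "B": 7, "C": 9,  "D": 5, "E": 0}   # lower 27% group
correct_answer = "C"

def lu_indicator(choice):
    if choice == correct_answer:
        return "***"
    if lower[choice] > upper[choice]:
        return "+"    # good distractor: more low scorers chose it
    if upper[choice] > lower[choice]:
        return "-"    # bad distractor: more high scorers chose it
    return "0"        # equal counts, including zero on both sides

indicators = {c: lu_indicator(c) for c in upper}
```

Choice E gets "0" here because no one in either group picked it, which (per the Response Count rules above) also marks it as an implausible distractor.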

Mean Total Score:
This value appears in the print-out beneath the answers. It is the mean total test score of the group of students responding to each answer. At least two students must respond to an answer choice for this statistic to be valid. When using it for item analysis within each group separately (upper alone or lower alone), the mean total score for those choosing a wrong answer should be lower than the mean total score for those choosing the correct answer; a distractor whose mean total score is not lower should be rewritten. A distractor is meant to attract the poorer students within a given group, and this statistic is a means of differentiating within the upper and lower groups.
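A sketch of the computation on hypothetical data, including the two-response minimum:

```python
# Hypothetical data: each student's chosen answer for one item and total test score.
students = [
    ("A", 52), ("B", 48), ("B", 55), ("C", 80), ("C", 74),
    ("C", 85), ("C", 78), ("D", 60), ("D", 58), ("E", 45),
]
correct_answer = "C"

# Mean total score of the students choosing each answer; valid only with >= 2 responses.
mean_totals = {}
for choice in "ABCDE":
    totals = [t for c, t in students if c == choice]
    if len(totals) >= 2:
        mean_totals[choice] = sum(totals) / len(totals)

# A distractor is suspect if its mean total score is not below the correct answer's.
suspect = [c for c, m in mean_totals.items()
           if c != correct_answer and m >= mean_totals[correct_answer]]
```

Choices A and E drew only one response each, so no mean is reported for them; the remaining distractors B and D both sit below the correct answer's mean, so none is flagged.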

Test Analysis Histograms:
Included are graphic representations of the following distributions: question discrimination, question efficiency and question correlation.

Item Analysis Summary Table

This appears after the question correlation histogram, and judges each item (question) by these four criteria of failure:

  1. Proportion correct (no more than two respond correctly or incorrectly);
  2. Discriminability index, D < 0.20;
  3. Efficiency index, e < 0.40;
  4. Correlation coefficient, r < 0.25.
If an item fails a criterion, this is noted by an asterisk (*). The asterisks are then tallied for each question, and based on the count the questions are placed into two categories (under the heading "Summary of Item Statistics").
  1. "To be checked" (two asterisks);
  2. "Revise" (three or more asterisks).
In addition, in this section, each distractor that should be revised is denoted by an asterisk under the "Item Alternatives" heading.
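The four failure criteria and the two-category summary can be sketched as follows; the item statistics are hypothetical:

```python
# Hypothetical per-item statistics; thresholds follow the four failure criteria.
items = {
    1: {"d": 0.35, "e": 0.55, "r": 0.40, "extreme_p": False},
    2: {"d": 0.10, "e": 0.30, "r": 0.15, "extreme_p": False},
    3: {"d": 0.15, "e": 0.35, "r": 0.20, "extreme_p": True},
    4: {"d": 0.25, "e": 0.30, "r": 0.20, "extreme_p": False},
}

def count_flags(stats):
    """Count failed criteria (one asterisk each)."""
    flags = 0
    flags += stats["extreme_p"]      # nearly everyone right or wrong
    flags += stats["d"] < 0.20       # discriminability index D
    flags += stats["e"] < 0.40       # efficiency index e
    flags += stats["r"] < 0.25       # correlation coefficient r
    return flags

summary = {}
for q, stats in items.items():
    flags = count_flags(stats)
    if flags >= 3:
        summary[q] = "Revise"
    elif flags == 2:
        summary[q] = "To be checked"
```

Item 1 passes everything and is omitted from the summary; item 4 fails exactly two criteria and lands in "To be checked"; items 2 and 3 fail three or more and are marked "Revise".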
