The Race to the Top assessment consortia have indicated an interest in using “automated scoring” to more efficiently grade student answers to constructed-response literacy and mathematics tasks. Automated scoring refers to a large collection of grading approaches that differ dramatically depending upon the constructed-response task being posed and the expected answer. Even within a content domain, automated scoring approaches may differ significantly, such that a single characterization of the “state of the art” for that domain would be misleading. This white paper identifies potential uses and challenges around automated scoring to help the consortia make better-informed planning and implementation decisions. If the benefits sought from constructed-response tasks are to be realized, automated scoring must be implemented and used thoughtfully; otherwise, automated scoring may have negative effects strikingly similar to those commonly attributed to multiple-choice tests.
One important statement in the paper frames current capabilities: “A machine does not read, understand, or grade a student’s essay in the same way as we like to believe that a human rater would. It simply predicts the human rater’s score.” That is true of current scoring engines, but there is clearly potential for intelligent scoring engines and innovative items that directly measure knowledge and skill rather than simply seeking to predict human scoring. An equally important development, based on a dramatic increase in assessment output and advances in data mining, will be new strategies for comparing big data sets.
Dr. Bennett’s paper, Automated Scoring of Constructed-Response Literacy and Mathematics Items, provides seven recommendations:
1. Design a computer-based assessment as an integrated system in which automated scoring is one in a series of interrelated parts
2. Encourage vendors to base the development of automated scoring approaches on construct understanding
3. Strengthen operational human scoring
4. Where automated systems are modeled on human scoring, or where agreement with human scores is used as a primary validity criterion, fund studies to better understand the bases upon which humans assign scores
5. Stipulate as a contractual requirement the disclosure by vendors of those automated scoring approaches being considered for operational use
6. Require a broad base of validity evidence similar to that needed to evaluate score meaning for any assessment
7. Unless the validity evidence assembled in Number 6 justifies the sole use of automated scoring, keep well-supervised human raters in the loop
This is sound near-term advice on the use of automated scoring in state testing systems. But for a more forward-leaning view of the opportunity, see my interview with Cisco’s John Behrens, who believes there is no excuse for sucky items. Automated scoring will make it possible for states to administer better tests less expensively. The real advance will be when assessment moves into the background.
For more, see Pearson’s Next Generation Assessment site and resources associated with their five recommended steps for state testing directors and policy makers.