Existing assessments tend to emphasize multiple-choice, or “bubble-in,” questions because they are easier, faster, and cheaper to score. Research has shown that these types of questions do not accurately reflect students’ critical thinking skills. To assess these skills accurately, more in-depth questions that require a written response from the student are now being more widely adopted. This raises questions about the cost of grading constructed-response items.
The ASAP Pricing Study, published today, addresses these cost questions. The paper concludes that, “In a higher quality assessment, human scoring of student essays can comprise over 60 percent of the total cost of the assessment. Automating this scoring process holds tremendous promise in making higher quality assessments affordable.” At large volumes, machine scoring of essays may cost 20 to 50 percent as much as human scoring (see Exhibit 4 on p. 35 for details).
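To see how these two figures might combine, consider the rough sketch below. It is an illustration only, not a calculation from the study: it assumes human scoring is exactly 60 percent of the total cost, that all other costs stay fixed, and that the 20 to 50 percent ratio applies uniformly. The function name and the input values are hypothetical.

```python
def relative_total_cost(human_share: float, machine_ratio: float) -> float:
    """Total assessment cost after automation, as a fraction of the original total.

    human_share:   fraction of the original total spent on human essay scoring
    machine_ratio: machine scoring cost as a fraction of human scoring cost
    """
    if not 0.0 <= human_share <= 1.0:
        raise ValueError("human_share must be between 0 and 1")
    if machine_ratio < 0.0:
        raise ValueError("machine_ratio must be non-negative")
    # Assumption: everything other than essay scoring costs the same as before.
    other_costs = 1.0 - human_share
    return other_costs + human_share * machine_ratio


if __name__ == "__main__":
    HUMAN_SHARE = 0.60  # illustrative value; the study says "over 60 percent" is possible
    for ratio in (0.20, 0.50):  # the range quoted for large volumes
        total = relative_total_cost(HUMAN_SHARE, ratio)
        print(f"machine/human cost ratio {ratio:.0%}: new total is {total:.0%} of original")
```

Under these assumptions, the total cost of the assessment would fall to roughly 52 to 70 percent of its original level. Actual savings depend on volume and on the cost breakdown of a given assessment, so Exhibit 4 of the study remains the authoritative source.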
Concerns remain regarding the quality of automated scoring, but a previous study demonstrated that machine scoring of essays is as accurate as scoring done by trained human graders, and that machine scoring of short-answer responses is nearly as accurate.
The paper recommends that, rather than designing items first and worrying about scoring later, states and consortia “work jointly with the vendor community to develop the type of items that can both assess students’ Deeper Learning skills and be efficiently scored by current vendor machine scoring engines.”