Better Tests, More Writing, Deeper Learning
The Hewlett Foundation-funded Automated Student Assessment Prize (ASAP) released about 2,300 essays to the 140 contestants this week. They have until the end of the month to score the essays with the highest possible agreement with expert graders. The top three competitors will share a $100,000 prize purse to be awarded May 9 in Washington, DC.
In a recent private vendor demonstration, nine testing companies showed that “Machine scoring engines did as well or better than the human graders,” as reported by Dr. Mark Shermis, author of the study summarizing the demonstration, Contrasting State-of-the-Art Automated Scoring of Essays. Shermis is the Dean of Education at the University of Akron, a leading authority on online assessment, and author of Classroom Assessment in Action.
Shermis gave a KPCC radio interview yesterday in which he said the study “demonstrates to the two Race to the Top consortia, PARCC and Smarter Balanced, that machines have a role in future assessments.” I echoed Shermis’ comments in a KCBS-AM Radio interview today.
Dr. Shermis was on the radio because Michael Winerip, widely recognized as the worst education reporter in the country, wrote a ridiculous New York Times piece quoting a critic of automated scoring. Les Perelman of MIT has tried one essay grader and thinks he can game the system. He worries about “dumbing down American education.” The fact that so little writing is assigned in American schools suggests this is already happening.
Open Education Solutions is managing ASAP because we share Hewlett’s interest in seeing students write more in American classrooms. Automated essay scoring will allow teachers to assign 1500 words a week instead of 1500 words a semester. With the help of an essay grader, students get frequent standards-based feedback on key writing traits while teachers are able to focus on voice, narrative, character development, or the logic of an argument.
Perelman said scoring engines just measure sentence and essay length. Shermis said many engines did not use length as a predictor and that, in fact, all of the scoring engines did a much better job than simply using length as a predictor. The public leaderboard shows that more than 100 of the competitors beat a simple word count metric.
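For readers wondering what a "simple word count metric" might look like, here is a minimal sketch in Python. It is not the ASAP baseline or any vendor's engine: the function names, the straight-line fit, and the toy data are all assumptions made for illustration, and exact agreement is used only as a stand-in because this post does not say which agreement measure the competition uses.

```python
# Hypothetical length-only baseline: predict an essay score from word count alone.
# All names and data here are illustrative assumptions, not the ASAP setup.

def word_count(essay: str) -> int:
    """Count whitespace-separated words."""
    return len(essay.split())


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All essays have the same length: fall back to the mean score.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, essay, lowest, highest):
    """Predict a whole-number score, clamped to the rubric's range."""
    raw = slope * word_count(essay) + intercept
    return max(lowest, min(highest, round(raw)))


def exact_agreement(predicted, human):
    """Share of essays where the predicted score equals the human score."""
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


# Made-up training essays (just repeated words) with made-up human scores.
train_essays = ["word " * n for n in (50, 100, 150, 200)]
train_scores = [1, 2, 3, 4]

slope, intercept = fit_line([word_count(e) for e in train_essays], train_scores)
```

A scoring engine that "beats" this baseline is one whose scores agree with the expert graders more often than the length-only predictions do on the same essays.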
On gaming the system, Shermis added that “these engines are very sophisticated: 40 or 50 variables are operationalized. If a student had mastered 50 variables, they would be good writers.”
After Shermis destroyed each of the critic’s arguments, Katie, a trained essay grader for several states, called in to the show. While she was initially skeptical, she is now “totally convinced they are comparable.” She continued, “I was so impressed with [computer scoring]. It scored vocabulary and sentence structure. It is a tool that can be used efficiently and well.”
Shermis got to the point: “To be a better writer, you need a lot of practice, you simply have to write more.” We think if students have to write more on state tests, they will write more in classrooms. When every teacher has an automated scoring engine and every student has an access device, we’ll see a dramatic increase in writing across the curriculum in American schools.
Laura, a university instructor, called in and said, “I use a TA to pre-read essays. A computer system could be used in the same way, as a prescreening tool.” Dr. Shermis agreed that automated scoring is designed to be a “teacher’s helper,” as Laura suggested, and would allow a teacher to assign more writing.
Perelman continues to claim that Microsoft Word is better than essay scoring engines—a ridiculous claim. The nine algorithms tested are several generations beyond Word grammar check capabilities. However, it is likely that powerful scoring engines will soon be as common as spellcheckers.
Next up in the ASAP sequence is short answer questions—a more difficult challenge than longer essays. The result will be better tests, more writing and deeper learning for American students.
This blog first appeared on Huffington Post.
Wish this had been on NPR. I had only heard that length is the primary measurement and the system can be gamed. It seemed extremely biased and didn't have interviews with the other side.