Hewlett's Automated Student Assessment Prize

I want kids to write a lot every day. But many high school classes only require three writing assignments in a semester. With a big class load, it’s extremely time consuming for a teacher to grade hundreds of papers each week.

State tests don’t typically demand much writing because it is difficult, expensive, and time-consuming to score.  When state tests focus on recall rather than writing, they reinforce coverage over competence.

Here’s the problem: The standards movement was initially animated by a rich vision of competency-based learning, authentic assessment, performance demonstrations, and student portfolios.  That got expensive and unwieldy fast, so we ended up with bubble-sheet, multiple-choice, end-of-year exams.  And because we haven’t had much data, we’ve tried to use the same cheap tests to improve instruction, evaluate teachers, manage matriculation, and hold schools accountable.

Problem number two: Every state has its own standards and tests.  They are not comparable, and many of them don’t reflect real college- and work-ready expectations.  Common Core State Standards and the tests the two state consortia (PARCC and SBAC) are building will go a long way toward solving this second problem: they will set real college- and work-ready standards and make results more comparable.

The real question is, how good will these new tests be?  Will they reinforce what kids really need to know and do?  Administering them online will make them less expensive.  The results will be available quickly.  But will they better reflect what we want kids to know and be able to do?

It would be easy and cheap to administer online multiple-choice tests.  But if we take seriously the demands of the idea economy and the associated expectations of the Common Core, we must do better.

What if state tests required students to write essays, answer tough questions, and compare difficult passages of literature?  What if tests provided feedback quickly?  What if the marginal cost was close to zero?  What if the same capability to provide performance feedback on student writing was available to support everyday classroom learning?

The Hewlett Foundation is sponsoring a competition that will demonstrate that automated essay scoring is already pretty good—on most traits it is as good as expert human graders.  This prize competition will make it better.  The reason Hewlett is sponsoring the competition is that they want to promote deeper learning—mastery of core academic content, critical reasoning and problem solving, working collaboratively, communicating effectively and learning how to learn independently.
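One simple way to picture what "as good as expert human graders" could mean is to compare machine scores with human scores on the same essays. The sketch below is only an illustration: the function name `agreement_rates`, the sample scores, and the choice of exact and within-one-point agreement as measures are all assumptions made here, not the competition's actual evaluation method.

```python
def agreement_rates(human_scores, machine_scores):
    """Return (exact, adjacent) agreement rates for two lists of integer scores.

    exact    = share of essays where both scores are identical
    adjacent = share of essays where the scores differ by at most one point
    Both measures are illustrative assumptions, not the competition's metric.
    """
    if len(human_scores) != len(machine_scores):
        raise ValueError("score lists must be the same length")
    if not human_scores:
        raise ValueError("score lists must not be empty")

    pairs = list(zip(human_scores, machine_scores))
    exact = sum(1 for h, m in pairs if h == m) / len(pairs)
    adjacent = sum(1 for h, m in pairs if abs(h - m) <= 1) / len(pairs)
    return exact, adjacent


# Made-up scores for six essays on a hypothetical 1-5 scale.
human = [4, 3, 5, 2, 4, 1]
machine = [4, 3, 4, 2, 2, 1]

exact, adjacent = agreement_rates(human, machine)
print(f"exact agreement: {exact:.2f}, adjacent agreement: {adjacent:.2f}")
```

The same comparison can be run between two human graders, which gives a baseline for judging whether the machine agrees with a person about as often as people agree with each other.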

“Better tests support better learning,” said Barbara Chow, Education Program Director at the Hewlett Foundation.  “Rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments.  And the more we can use essays to assess what students have learned, the greater the likelihood they’ll master important academic content, critical thinking and effective communication.”

The competition kicks off today with a demonstration of capabilities of current testing vendors.  They will spend the next two weeks using almost 14,000 essays (which were gathered from state testing departments) to train their scoring algorithms.  On January 23 they will receive another batch of more than 5,000 essays and will have two days to score them.  The competition hosts will report back to the testing consortia in February with a description of current scoring capabilities.

Also launching today is an open competition with $100,000 in prize money.  Computer scientists worldwide are invited to join this competition with the top three sharing the prize purse.  To give the upstarts more time to attack the data, the open competition runs through April.

The competition has the opportunity to influence the extent to which state tests incorporate authentic assessment rather than rely solely on inexpensive multiple choice items.  To a great extent, state assessments influence the quality and focus of classroom instruction.  This competition has the potential to improve the quality of state assessments and, as a result, classroom instruction in this country for the next decade.

The academic advisor for the competition, Mark Shermis, notes that “In the area of high-stakes assessment, it will take a few years before the technology is ‘trusted’ enough to make assessment decisions without at least one human grader in place.  Most current high-stakes implementations use one human grader and one machine grader to evaluate an essay.”  Lower-stakes tests are likely to use automated scoring alone, like entrance exams for medical school, law school, and business school.
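The one-human, one-machine arrangement Dr. Shermis describes could be sketched roughly as follows. The details are assumptions for illustration only: the function name `resolve_score`, the one-point tolerance, averaging the two scores, and sending disagreements to a second human reader are not taken from any particular testing program.

```python
def resolve_score(human, machine, tolerance=1):
    """Combine one human score and one machine score for a single essay.

    Returns (score, needs_second_reader).
    Assumed rule: if the two scores are within `tolerance` points, report
    their average; otherwise report no score and flag the essay for a
    second human reader.
    """
    if abs(human - machine) <= tolerance:
        return (human + machine) / 2, False
    return None, True


# Hypothetical essays scored on a 1-6 scale.
print(resolve_score(4, 5))  # scores are close, so they are averaged
print(resolve_score(2, 5))  # scores are far apart, so the essay is flagged
```

Under a rule like this, the machine never decides a disputed score on its own, which fits the point that the technology is not yet trusted to work without a human grader in place.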

In his recent book on the subject, Dr. Shermis suggests that automated essay scoring will give states attractive options, including the use of a portfolio of scored classroom essays to supplement or replace an end-of-course or end-of-year exam.  His proposal is summarized here.

“Currently the technology can tell you if you are on topic, how much you are on topic, whether you have a writing structure, and whether you are doing a good job on the general mechanics of writing,” said Dr. Shermis.  “It cannot determine whether you have made a good or sufficient argument or whether you have mastered a particular topic.” That would be really intelligent scoring and is probably a few years off.

In addition to better tests and consistently high expectations, the requirements to administer the new tests online will accelerate the transition to personal digital learning.  While boosting computer access for testing, most states and districts will find it logical to shift from print to digital instructional resources.  And most digital content will include embedded assessment.

Online essay scoring will improve the quality of state testing, but the real benefit will be the weekly use in classrooms.  Teachers across the curriculum will be able to assign 1,500 words a week, not 1,500 words a semester, and know that students will receive frequent automated feedback as well as the all-important and incisive teacher feedback.

The Hewlett-sponsored Automated Student Assessment Prize (ASAP) will help states make informed decisions about testing.  It will also accelerate progress toward a more intelligent education system that benefits teachers and students.


Note: this post was modified on 2/1/12 to indicate that prizes for the open competition will be awarded to its top three participants.


  1. If such a tool were available (and cheap) when I was a student, I would use it on my work BEFORE I turned it in.

    Somewhere between my second and final drafts of a paper I would submit it to this tool – it would be web-based, right? – and I would see what the automated grader thought of the paper.

    The first rule of all writing is to keep the audience in mind. If this application were grading my paper then this application is my audience – not the teacher, and not any broader (human) audience.

    In school, I used to write really off-beat essays thinking that they would provide my teachers with a refreshing break from the monotony of my classmates’ papers, which, I was sure, were all of a type with only small variations. But I was writing for a human audience. My writing would have been much different – much less imaginative – if I had been writing for a computer that does not feel monotony and would not welcome the strange. The computer is likely to mark the paper down for being off-beat in its topic or style.

    Do you have any concern that training students to write for this application will deliver a generation of students who know how to write for computers? Do you have any concern that an application would punish the truly creative and innovative?

    • Great questions, Charlie.
      There are a couple of good products on the market now (Write to Learn from Pearson, and Writing Roadmap from CTB) that provide pretty good feedback. They could be used formatively, as you describe, like spell check. A teacher could also require a piece of writing every week, and the auto engines could deposit a grade in the grade book to be spot-checked by the teacher.
      Yes, there are potential unintended consequences of any practice. Even as these scoring engines get better, they won’t replace the coaching and feedback from a teacher. They may just allow teachers to spend more time working with students, armed with more data.

  2. I agree with Charlie. Writing assessments should be used as tools to support student writing before, not after, handing in work. More importantly, why do we need to spend billions on testing companies to assess student work? Isn’t that what teachers are supposed to do?

  3. I want to build even further on Charlie’s point. Our focus in education should be on supporting students to become better writers (or scientists, or historians). These tools we have become so excited about using to evaluate students really should be tools to empower students to work independently and evaluate themselves. Furthermore, we needn’t waste more money on these testing companies. These tools should and can be open source like the one featured here (http://theinnovativeeducator.blogspot.com/2012/01/get-on-demand-support-for-your-writing.html)

    When we shift our focus from testing and move it toward providing meaningful feedback for students with tools that are not just used for school data, but also for life success, we get closer to making learning meaningful and relevant. The reality when it comes to subjects like writing is that one teacher with 30 students is not necessarily the best way for students to get good at writing. Let them write for real audiences, get real feedback, and have that, not writing for a test prompt, become how we assess students.

    • Agree that peer review and audience review (e.g., CES demonstrations of learning) are great and should be part of regular feedback. I like the way NYCiSchool connects students to ‘clients’ for rich background, application, and feedback.

      As noted below, we’ll probably see auto essay scoring embedded in word processing tools, like a better grammar check.

      I think a lot of this will be facilitated by the shift to individual progress models supported by competency tracking, which David Coleman described as a collection of personal bests (http://www.gettingsmart.com/gettingsmart-staging/blog/2011/07/assessment-as-portfolio-of-personal-bests/). I like that idea–a frequently defended digital portfolio that guides progress.

