Hewlett Foundation Launches Prize for Short Answer Scoring

A $100,000 prize for scoring short answer student responses launches today at 10 a.m. PT on the Kaggle platform. The first phase of the Automated Student Assessment Prize (ASAP) focused on essay scoring. This second prize, also sponsored by The William and Flora Hewlett Foundation, takes on the more difficult challenge of scoring, with a high degree of accuracy, constructed responses of fewer than 150 words.
The Hewlett Foundation sponsored ASAP to address the longstanding problem of high cost and long turnaround times of current state tests. The goal is to shift testing away from standardized bubble tests to tests that evaluate critical thinking, problem solving and other 21st century skills.
Designed by The Common Pool and managed by Open Education Solutions, ASAP2 will test the capabilities of short answer scoring systems. ASAP aspires to inform key decision makers, who are already considering adopting these systems, by delivering a fair, impartial and open series of trials to test current capabilities and to drive greater awareness when outcomes warrant further consideration. If these systems prove accurate, states will likely incorporate more constructed response items into their end-of-year tests. It is also likely that, as student access to technology improves, online assessment systems will be used on a regular basis across the curriculum.
ASAP Phase 2: The Details. During ASAP2, competitors on the Kaggle platform will be given access to graded short answer responses and their corresponding prompts so they can build, train and test scoring engines. Success depends on how closely their scores align with those of expert human graders. The $100,000 prize purse will include $50,000 for 1st, $25,000 for 2nd, $15,000 for 3rd, $7,500 for 4th, and $2,500 for 5th place. Documentation of the winning submissions will be released under an open license, in hopes of elevating the field of automated assessment.
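To make "aligning scores with human graders" concrete, here is a minimal sketch of one commonly used agreement measure for ordinal scores, quadratic weighted kappa. The announcement does not name the metric the competition will use, so treat the choice of measure as an assumption; the function name, the score range, and the sample scores are hypothetical and for illustration only.

```python
from collections import Counter


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Agreement between two lists of integer scores.

    1.0 means perfect agreement, 0.0 means chance-level agreement,
    and negative values mean worse-than-chance agreement.
    Scores are assumed to lie within [min_score, max_score].
    """
    if not human or len(human) != len(machine):
        raise ValueError("score lists must be non-empty and of equal length")
    n_levels = max_score - min_score + 1
    if n_levels < 2:
        raise ValueError("need at least two possible score levels")

    n = len(human)
    observed = Counter(zip(human, machine))  # joint counts of (human, machine)
    human_hist = Counter(human)              # marginal counts per human score
    machine_hist = Counter(machine)          # marginal counts per machine score

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Penalty grows with the square of the score difference.
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            expected = human_hist[i] * machine_hist[j] / n
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        # Both raters gave every response the same single score.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Hypothetical scores on a 0-3 scale, for illustration only.
    human_scores = [0, 1, 2, 3, 2, 1]
    engine_scores = [0, 1, 2, 2, 2, 0]
    print(quadratic_weighted_kappa(human_scores, engine_scores, 0, 3))
```

Because the penalty is quadratic, an engine that misses a human score by one point is penalized far less than one that misses by three, which suits rubric-based scoring where near misses are more tolerable than large ones.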
ASAP2 graded content is selected according to specific characteristics. On average, each answer is approximately 50 words in length. Some are more dependent upon source materials than others, and the answers cover a broad range of disciplines (from English Language Arts to Science). The range of answer types is provided to develop a better understanding of the strengths of specific solutions.
Competitors will score responses to 10 prompts; each comes with a training set of about 1,800 responses. Following a 2.5-month training period, competitors will be provided with test data containing approximately 6,000 new responses (600 per data set), randomly selected for blind evaluation. Competitors will supply a score for each response. They will also be asked to submit a technical methods paper outlining their specific approach along with any known limitations.
The timeline for ASAP2 is as follows:

  • Monday, June 25, 2012: Launch of Public Competition; Release of Training and Validation Data Sets
  • Wednesday, August 29, 2012: Deadline to Submit Final Models
  • Thursday, August 30, 2012: Release of Test Set
  • Wednesday, September 5, 2012: Deadline to Submit Test Set Solutions
  • Thursday, September 6, 2012: Preliminary winners announced
  • Monday, September 17, 2012: Deadline for preliminary winners to open-source models and publish their methods papers
  • Monday, September 24, 2012: Deadline for public to submit objections (regarding cheating through manually labeling the test set, etc.) on competition results
  • Sunday, September 30, 2012: Deadline to address any objections and for the review committee to pick the best paper
  • Monday, October 1, 2012: Winners announced

Why Online Assessment? Written communication and critical reasoning are essential skills that all students must master, particularly with the widespread adoption of Common Core State Standards. The Hewlett Foundation makes grants in support of these skills, which it calls “deeper learning.” They include the mastery of core academic content, critical reasoning and problem solving, working collaboratively, communicating effectively, and learning how to learn independently. With ASAP, Hewlett is appealing to data scientists to help solve an important problem in the field of educational testing.
One of the key roadblocks to teaching and evaluating critical thinking and analytical skills is the expense associated with scoring tests to measure those abilities. For example, tests that require “constructed responses” (i.e., written answers) are useful tools, but they typically are hand scored, commanding considerable time and expense from public agencies. Because of those costs, standardized examinations have increasingly been limited to “bubble tests” (i.e., multiple choice questions) that deny us opportunities to challenge our students with more sophisticated measures of ability. Recent developments in innovative software to evaluate student written responses and other response types are promising, and states are showing increasing interest in them.
ASAP is designed to achieve the following goals:

  • Challenge developers of student assessment systems to demonstrate their current capabilities;
  • Reveal the efficacy and cost of alternative scoring systems to support teachers; and
  • Promote the capabilities of effective scoring systems to state departments of education and other key decision makers, when those advantages have been proven to support student and teacher interests.

Phase one began with a vendor demonstration in February to gauge the state of the field.  A study comparing the performance of nine existing testing vendors can be found at http://www.scoreright.org/NCME_2012_Paper3_29_12.pdf.
After the demonstration, an open competition was launched, drawing hundreds of submissions from teams around the world. In May of this year, $100,000 in prizes was awarded for ASAP Phase One. A British particle physicist and sports enthusiast, a data analyst for the National Weather Service in Washington, D.C., and a graduate student from Germany won the $60,000 first prize in a competition to design innovative software to help teachers and school systems assess their students’ writing.
The goal of phase one was to assess the ability of technology to assist in grading essays included in standardized tests. The contest revealed that software performed extremely well. This will pave the way for states to include more writing on standardized tests, which mostly consist of simple multiple choice questions.
Additional phases of ASAP will be launched in the months ahead, using other forms of graded student content.  Hewlett intends to drive innovation in this sector, at a time when state departments of education are working towards adopting new and more sophisticated student assessments.

Tom Vander Ark

Tom Vander Ark is the CEO of Getting Smart. He has written or co-authored more than 50 books and papers including Getting Smart, Smart Cities, Smart Parents, Better Together, The Power of Place and Difference Making. He served as a public school superintendent and the first Executive Director of Education for the Bill & Melinda Gates Foundation.
