Indianapolis, IN – The William and Flora Hewlett Foundation awarded $100,000 today in a competition to develop innovative software to help teachers score student-written responses to test questions. The prize was divided among five teams. The competition compared how closely software could match human graders when scoring short-answer student responses. The results showed that the software is not yet able to achieve the same scores as human graders.
“Giving school systems the tools to challenge students to develop critical reasoning skills is crucial to making those students competitive in the new century,” said Barbara Chow, Education Program Director at the Hewlett Foundation. “And critical reasoning is one of the capabilities, along with communicating clearly, working cooperatively, and learning independently, that we call Deeper Learning and would like to see broadly embraced throughout the country.”
The Hewlett Foundation sponsored the Automated Student Assessment Prize (ASAP) to address the need for high-quality standardized tests to replace many of the current ones, which test rote skills. The goal is to shift testing away from standardized bubble tests to tests that evaluate critical thinking, problem solving and other 21st century skills. To do so, it’s necessary to develop more sophisticated tests to evaluate these skills and reduce their cost so they can be adopted widely. Computer-aided scoring can play an important role in achieving this goal.
Participants in the competition had access to more than 27,000 hand-scored short-answer responses that varied in length, type and grading protocols. They were challenged to develop software designed to faithfully replicate the assessments of trained expert educators using multiple rubrics. The systems do not independently assess the merits of a response; instead, they predict how a person would have scored the response under optimal conditions.
“We knew from the beginning that the linguistic challenges would be greater for scoring the short-answer constructed responses than for the essay competition,” said Mark Shermis, the study’s principal investigator and a professor in the College of Education at The University of Akron. “However, we were impressed with the range and types of questions under which the scoring engines worked. With additional development, these engines might eventually be used as inexpensive second-readers or as screeners for off-topic responses.”
187 participants across 150 teams tackled the incredibly difficult challenge of developing new software that can score short-answer responses to questions on state standardized tests. Competing teams developed their systems over three months and shared their technical approaches through an active discussion board. Documentation of the winning submissions will be released, under an open license, to enable others to build on this competition’s success and advance the field of automated assessment.
“I am excited to win this contest because it gave me an opportunity to create a futuristic program that reads an essay, finds the answers being asked, and scores it as a human would do,” said Luis Tandalla, the Ecuadorian university student studying mechanical engineering who placed first. “I’m hopeful that my model will help advance the field of scoring software so that computers can assist teachers, who can then use the results to provide even more individualized instruction to their students.”
This is the second competition hosted by the Automated Student Assessment Prize (ASAP), which seeks to inform states as they explore using computers to grade new high-stakes tests. “If we can demonstrate, under fair and impartial conditions, that computers can grade written answers to test questions, then states can move beyond multiple-choice and afford measures of deeper learning,” says Jaison Morgan, Co-Director of ASAP.
The competition drew more than 1,800 entries, including those from two commercial vendors. Since the advent of ASAP nearly a year ago, it has inspired participants to develop innovative and accurate ways to improve on currently available scoring technologies. For this competition, Measurement Incorporated, a company that provides achievement tests and scoring services, partnered with the third place team from the first competition, allowing them to outperform all other teams.
“The competition showed that software used to score short answers has great potential. Use of these systems today could supplement and support the work of trained experts,” said Tom Vander Ark, Co-Director of ASAP. “Computers can be implemented to validate – not to replace – the work of teachers, lowering costs for school districts and offering better tests that can be graded faster and less expensively.”
ASAP was hosted on Kaggle (www.kaggle.com), the leading platform for predictive modeling competitions. Kaggle’s platform helps companies, governments, and researchers identify solutions to some of the world’s hardest problems by posting them as competitions to a community of more than 53,000 data scientists located around the world. Many of the competing teams had no direct connection to educational assessment and used a variety of data-driven approaches from multiple disciplines. These approaches proved highly effective at a time when innovation in machine learning is increasingly necessary.
ASAP aspires to inform key decision makers, who are already considering adopting these systems, by delivering a fair, impartial and open series of trials to test current capabilities and to drive greater awareness when outcomes warrant further consideration. If the systems prove accurate and affordable, states can use them to incorporate more questions that require written responses into standardized tests. And, by making it easier and more cost-effective to include those prompts on standardized tests, experts predict that students will spend more time writing and learning to write in the classroom.
In May of this year, ASAP announced the outcome of a similar prize competition that tested whether computers are capable of grading student-written essays. The results were presented at highly credible conferences across the country and triggered unprecedented interest in this low-cost and efficient method. Statistical measures developed for the competition have been widely cited (see http://www.scoreright.org/NCME_2012_Paper3_29_12.pdf). ASAP plans to launch other competitions to build on the discoveries introduced this year.
ASAP was designed by The Common Pool and managed by Open Education Solutions.
The 187 participants in the competition reside in countries around the world and work in diverse occupations. Competitors scored more than 22,000 responses to ten prompts from three different states. On average, each answer was approximately 50 words in length. Some responses were more dependent upon source materials than others, and the answers cover a broad range of disciplines (from English Language Arts to Science). The range of answer types was provided to develop a better understanding of the strengths of specific solutions. Technical methods papers outlining the winners’ specific approaches, along with any known limitations, were created and will be released to the public.
- Luis Tandalla, 1st place – Originally from Quito, Ecuador, Luis is currently a college student at the University of New Orleans, Louisiana, majoring in mechanical engineering. A newcomer to data science, Luis had his first experience one year ago, when he took a machine learning course from Dr. Andrew Ng. Luis also participated as part of a team in phase 1 of ASAP, placing 13th.
- Jure Zbontar, 2nd place – Jure lives and works in Ljubljana, Slovenia, where he is a teaching assistant at the Faculty of Computer and Information Science. He’s pursuing a PhD in computer science in the field of machine learning. Besides spending time behind his computer, he also enjoys rock climbing and curling.
- Xavier Conort, 3rd place – A French-born actuary, Xavier runs a consultancy in Singapore. Before becoming a data science enthusiast, Xavier held different roles (actuary, CFO, risk manager) in the life and non-life insurance industry in France, Brazil and China. Xavier holds two master’s degrees and is a Chartered Enterprise Risk Analyst.
- James Jesensky, 4th place – With more than 20 years’ experience as a software developer, James currently works in the field of financial services near Pittsburgh, PA. He enjoys these competitions because they allow him to combine his computer science expertise with his lifelong love of recreational mathematics.
The fifth place team is an international duo of data experts. Members include:
- Jonathan Peters, 5th place – Based in the United Kingdom, Jonathan works for the National Health Service as a public health analyst. He spends most of his time modeling death and disease; Kaggle competitions offer some light relief.
- Paweł Jankiewicz, 5th place – Paweł lives in Poland and works as a banking reporting specialist. His machine learning experience began when he attended Dr. Andrew Ng’s online Machine Learning class in 2011. Apart from Kaggle, he enjoys English audiobooks, especially the “Wheel of Time” series.