Automated Essay Scoring in a High Stakes Environment

One of our goals here is to begin to create the data abundance mindset in U.S. K-12–prepping for policies and practices informed by big data surrounding anywhere anytime learning. To that end, we like to highlight interesting projects and proposals–and we have a good one today.
Mark Shermis, Dean of Education at the University of Akron, is an academic advisor to an assessment innovation project that we’re working on and one of the most knowledgeable folks we’ve found when it comes to automated scoring. He contributed a chapter to an academic book called Innovative Assessment for the 21st Century: Supporting Educational Needs. He defines automated essay scoring (AES) as the evaluation of work via computer-based analysis.
Dr. Shermis laments the fact that US secondary students receive an average of 3 assessments per semester in a writing class. We’d both like to see students writing every day and getting feedback instantly.
Here’s Mark’s proposal:

Rather than administer a high-stakes writing test in the Spring of each year, administer about 15 AES-scored essays throughout the year (it could be more). The electronic portfolio could monitor student progress from the beginning of the year through to the end. Towards the end of the year, average the scores for the last three writing assignments and use that average as the accountability measure for the domain of writing. To keep the process secure, the topics for the last three prompts can be controlled by the state department of education and released on a strict schedule.
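
To make the arithmetic concrete, here is a minimal sketch of the averaging step in the proposal. The function name, the score scale, and the sample portfolio are all made up for illustration; the proposal itself only specifies "about 15" essays and an average of the last three.

```python
from statistics import mean


def accountability_score(essay_scores, final_count=3):
    """Average the last `final_count` scores in a year-long portfolio.

    `essay_scores` is assumed to be in chronological order. The default of
    three comes from the proposal; everything else here is an assumption.
    """
    if len(essay_scores) < final_count:
        raise ValueError("portfolio has fewer essays than final_count")
    return mean(essay_scores[-final_count:])


# Hypothetical portfolio of 15 AES-scored essays on an assumed 1-6 scale.
portfolio = [2, 2, 3, 3, 3, 4, 3, 4, 4, 4, 5, 4, 4, 5, 6]
print(accountability_score(portfolio))
```

Only the final three entries count toward the accountability measure; the earlier twelve serve as the progress record.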

Whether you like the policy proposal or not, AES is beginning to encourage a lot more writing with frequent entries into an electronic gradebook and detailed feedback (e.g., Pearson’s Write to Learn, CTB’s Writing Roadmap).*
In many cases, AES is as accurate as human grading. The Common Core’s demands for writing to text (e.g., narrative, expository, descriptive) will challenge the current generation of scoring engines. An upcoming demonstration will outline the contours of current capabilities.
Dr. Shermis points to a number of benefits of his proposal:

Integration, not competition, with instruction–assessment that informs instruction.
Students get instant feedback
Reliable picture of student abilities and a big trail of evidence
Realistic expectations of a no-surprises environment

We could add security to the list. It will be harder to tamper with AES results than with paper-and-pencil exams. With a big trail of evidence from weekly entries, a big change in score (up or down) on a high-stakes exam would be easy to spot.
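
As a rough illustration of how such a change might be spotted, the sketch below compares an exam score against the spread of a student's earlier entries. The z-score approach, the threshold of 3, and the sample scores are our own assumptions for the example, not part of Dr. Shermis's proposal or any vendor's product.

```python
from statistics import mean, stdev


def is_suspicious(history, exam_score, z_threshold=3.0):
    """Flag an exam score that sits far from a student's prior scores.

    `history` is the list of earlier AES scores; `z_threshold` is an
    arbitrary cutoff chosen for illustration only.
    """
    if len(history) < 2:
        raise ValueError("need at least two prior scores")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every prior score is identical, so any different score stands out.
        return exam_score != mu
    return abs(exam_score - mu) / sigma > z_threshold


# Hypothetical weekly scores on an assumed 1-6 scale.
weekly_scores = [3, 3, 4, 3, 4, 4, 3, 4]
print(is_suspicious(weekly_scores, 6))
print(is_suspicious(weekly_scores, 4))
```

A real system would need to account for genuine improvement over the year, for example by comparing against only the most recent entries, and a flag would be a prompt for human review rather than proof of tampering.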
The benefits of instructional tech that can double as a test are numerous. It’s great to have academics like Shermis pushing the boundaries of technology and policy and helping to create the big data future of learning.

* Disclosure: Pearson is an investor in Learn Capital where Tom is a partner

3 Comments

Randy Bennett

10/30/2011

Tom, It's an intriguing idea but when it comes to high-stakes assessments, it's probably not as simple as the posting makes it sound. For discussion of some of the issues, see the Gates Foundation-funded, "Automated Scoring of Constructed-Response Literacy and Mathematics Items" at http://www.acarseries.org/randy_bennett.php

Replies

Tom Vander Ark

Thanks Randy. Posted summary of the report here http://www.gettingsmart.com/gettingsmart-staging/edreformer/2011/10/ets-advice-on-automated-scoring/
Your advice seems appropriate for short term replacement/augmentation of human scoring but I'm curious about where you think AI scoring can push boundaries.
What role will innovative items/activities play (sims, games, experiments)?

11/9/2011

Tom, I wouldn't call it near-term advice, at least not in the context of tests to be used for high-stakes decisions. Regardless of what form those tests take, traditional or not (e.g., simulations, games, experiments), the way the scores are generated matters and evidence to support the validity and fairness of those scores is essential. In that respect, the recommendations below apply in the near-term, and I would argue, in the long-term too.
1. Design a computer-based assessment as an integrated system in which automated scoring is one in a series of interrelated parts
2. Encourage vendors to base the development of automated scoring approaches on construct understanding
3. Strengthen operational human scoring
4. Where automated systems are modeled on human scoring, or where agreement with human scores is used as a primary validity criterion, fund studies to better understand the bases upon which humans assign scores
5. Stipulate as a contractual requirement the disclosure by vendors of those automated scoring approaches being considered for operational use
6. Require a broad base of validity evidence similar to that needed to evaluate score meaning for any assessment
7. Unless the validity evidence assembled in Number 6 justifies the sole use of automated scoring, keep well-supervised human raters in the loop

Stated another way, games, simulations, and experiments may allow us to measure traditional competencies more effectively and to measure competencies we couldn't measure before. Automated scoring will play a key part in helping with that measurement. But evidence to support the validity and fairness of results, and processes to ensure quality, will continue to be necessary--as long as we use those results to make consequential decisions about individuals and institutions.


