We’re stuck, and $365 million may not help. The United States places an unusual degree of importance on the reliability of year-end standardized tests. These tests have been around for 15 years and, because we have so little performance data, we try to use them for a variety of purposes. For many reasons, the tests haven’t improved much. The new barrier is the dual fixation on cost and comparability.
Innovation occurs when markets are efficient: where supply meets demand, where consumers quickly (and often ruthlessly) express preferences, where risk is rewarded with return. Blockages can occur on either the buy or the sell side, but buyers and sellers often slump into complacency together.
In the case of educational testing, we have a set of complicated political problems resulting in weak demand for assessment innovation. The next generation of artificial intelligence will help produce better test items faster and make them cheaper to score. But this is more a political problem than a technical problem.
First and foremost, states are broke and can’t afford to spend more money on testing. What they spend on testing appears to vary from $11.50 per student for a cheap online multiple-choice test to more than $50 per student for hand-scored tests with lots of constructed-response items.
Most states have made a commitment to work cooperatively on the development of new assessments. All but six states have joined one of the two consortia funded by Race to the Top. But if that means buying a common test, the cost of these new tests will be bounded by the low end of the range, because steering committee members who represent states with cheap tests can’t go back to their legislatures with a test that costs three times as much. As a result, intense cost pressure will limit the number and type of constructed-response items.
The signatories to an Innosight letter think there’s a better approach. Rather than everyone buying the same cheap test, they advocate for state testing systems that customize assessment strategies by state, encourage innovation, utilize sampling and comparison techniques, but retain sufficient comparability.
We’re still new to using data in education. Hungry for actionable data, we quickly started trying to use state tests for everything: instructional improvement, school accountability, and teacher evaluations. We’ve been relying on a few dozen data points about each student for a lot of important decisions.
Testing politics have not yet been recalibrated for information abundance. With the shift to personal digital learning, we’ll soon have thousands of daily data points about every student. With a huge trail of evidence, it won’t be necessary to place so much weight on year-end (summative) assessments.
The era of big data will allow us to construct temporary agreements about how to interpret, use and compare assessment results. (They will have to be temporary, because the data will improve every year or two.) States that want rich performance-based feedback will get it. States that are only concerned about cost will get more for less.
I’ve interviewed many of the steering committee members for both consortia in the last three years. In every conversation, I feel the weight of the cost-comparability conundrum. If we are not thoughtful, decisions in the next 12 months will compound our anachronistic fixation on reliability with the new drive for comparability. We’ll end up with 20 states using one test, 25 states using another test, and neither test being as good as it could be. We’ll be trapped for at least five years with tests that reinforce the worst practices of the old system.
Following is a set of incomplete and partially informed, but directionally correct, recommendations for solving the cost-comparability conundrum.
- Think assessment frameworks, not tests
- Host a capabilities demonstration in 60 days to showcase state of the art assessment and online scoring capabilities (inside and outside K-12)
- Build comparability models using NAEP-like sampling and correlation strategies
- Correlate games, adaptive assessments, and performance tasks to a Common Core Lexile scale and encourage districts/networks to submit standards-based gradebooks as evidence of progress (i.e., a correlation service could verify, with a confidence interval, the correlation of any set of assessments)
- Encourage multiple collaborations: during the transition to personal digital learning, we could use 20 different testing systems, not just two (i.e., PARCC deployment could be groups of states working together in 8 state assessment collaboratives rather than one test)
- States should build in a strong waiver process for districts/networks that can show much stronger evidence models
- States should use a component strategy and plan on updating state testing systems at least every other year
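To make the correlation service idea from the list above a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `correlation_with_ci` and the sample scores are hypothetical, and the choice of a Pearson correlation with a Fisher z-transformation for the confidence interval is one common textbook approach, not a method specified by either consortium. A real service would also have to deal with sampling design, measurement error, and scale linking.

```python
import math
from statistics import NormalDist, mean


def correlation_with_ci(scores_a, scores_b, confidence=0.95):
    """Return (r, lower, upper): the Pearson correlation between two sets of
    paired student scores and an approximate confidence interval.

    Hypothetical helper for illustration only. The interval uses the
    Fisher z-transformation, which assumes roughly bivariate-normal scores.
    """
    n = len(scores_a)
    if n != len(scores_b):
        raise ValueError("each student needs a score on both assessments")
    if n < 4:
        raise ValueError("need at least 4 paired scores for the interval")

    # Deviations from each assessment's mean score.
    mean_a, mean_b = mean(scores_a), mean(scores_b)
    dev_a = [a - mean_a for a in scores_a]
    dev_b = [b - mean_b for b in scores_b]

    covariation = sum(da * db for da, db in zip(dev_a, dev_b))
    spread_a = sum(da * da for da in dev_a)
    spread_b = sum(db * db for db in dev_b)
    if spread_a == 0 or spread_b == 0:
        raise ValueError("scores on each assessment must vary")

    r = covariation / math.sqrt(spread_a * spread_b)
    r = max(-1.0, min(1.0, r))  # guard against floating-point overshoot

    # A perfect correlation has no finite Fisher z; report a zero-width interval.
    if abs(r) == 1.0:
        return r, r, r

    z = math.atanh(r)                  # Fisher z-transformation
    std_err = 1.0 / math.sqrt(n - 3)   # approximate standard error of z
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = math.tanh(z - critical * std_err)
    upper = math.tanh(z + critical * std_err)
    return r, lower, upper


if __name__ == "__main__":
    # Made-up scores for ten students: a learning game vs. a state test.
    game_scores = [52, 61, 47, 70, 66, 58, 75, 49, 63, 68]
    state_scores = [410, 455, 390, 500, 470, 440, 520, 400, 450, 485]
    r, lower, upper = correlation_with_ci(game_scores, state_scores)
    print(f"r = {r:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
```

The point of reporting an interval rather than a single number is that a district with a small sample would see a wide range, which signals that its evidence is not yet strong enough to stand in for the state test.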
Beyond these suggestions, it’s quite possible that several market-shaping strategies could be used to mobilize resources and accelerate innovation. Together, these strategies would give us the data our kids and teachers deserve.