By Chris Lozier
Stop me if you’ve seen this one. Tiger Woods hits a glorious second shot to within 5 feet of the pin on a par 5. His playing partner (for fun, let’s call him Phil Mickelson) pulls his approach long and right, and his ball settles about 60 feet from the hole. Phil plays first, and his lengthy putt rolls just past the hole, where he quickly taps in. Nice birdie, Phil! Tiger now has an eminently makeable eagle putt, which he proceeds to pull badly. He taps in for birdie, and he and Phil walk off the green with the same score. They also record the same “putts per green,” a variation of the metrics long used by the PGA to measure putting performance.
But wait! That doesn’t tell the whole story, right? Surely, we can all agree that Phil putted better than Tiger on that green. After all, he took the same number of strokes to get the ball in the hole from a much more difficult starting point.
By now, this may be starting to sound familiar to followers of education. There is almost universal agreement that learning challenges are not identical for any two students (putts) and, in turn, that no two teachers (golfers) face identical degrees of difficulty.
The PGA adopts a value-added model.
Some time ago, the PGA recognized that measures of putts per green or round failed to paint a complete picture of how well its players putted. Then, with the advent of a technology that enabled them to collect data on every shot, they created a new metric to account for the unique challenges (in terms of length, at least) of all putts. They came up with Strokes Gained-Putting (SGP), which the PGA describes like this:
Strokes Gained-Putting takes into account putting proficiency from various distances and computes the difference between a player’s performance on every green — the number of strokes needed to hole out — against the performance of the other players for each round. This ultimately shows how many strokes are gained or lost due to putting for a particular round, for a tournament and over the course of a year.
In our example above, SGP would work like so:
- Phil took two putts from 60 feet on that green. Even the pros need three putts from 60 feet more often than they need just one, so the average from that distance sits above two: in the season prior to our imaginary tournament, it was 2.2 putts. Since Phil two-putted, he gained 0.2 strokes (2.2 – 2.0) relative to what would have been considered average, or expected.
- As for those 5-foot putts that Tiger used to make in his sleep, all pros will make that putt more often than they will miss it. In this case, we can imagine that the tour-wide average from 5 feet in the prior season was 1.2 putts. Since Tiger required two putts, he lost 0.8 strokes (2.0 – 1.2).
In terms of SGP, therefore, Phil was a full stroke better than Tiger on that hole alone.
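The arithmetic above can be sketched in a few lines of code. This is a simplified illustration, not the PGA’s actual implementation: it assumes, as the example does, that the baseline is the prior season’s tour-wide average number of putts from a given distance, and the names `BASELINE_PUTTS`, `Green`, `strokes_gained`, and `total_strokes_gained` are hypothetical. Only the two baseline values from the example are filled in; a real baseline would cover every distance.

```python
from dataclasses import dataclass

# Hypothetical baseline: prior-season average putts to hole out, keyed by
# first-putt distance in feet. Only the two values from the example are included.
BASELINE_PUTTS: dict[int, float] = {60: 2.2, 5: 1.2}


@dataclass
class Green:
    player: str
    first_putt_feet: int
    putts_taken: int


def strokes_gained(green: Green, baseline: dict[int, float]) -> float:
    """Expected putts from this distance minus putts actually taken.

    Positive means better than the baseline average; negative means worse.
    """
    return baseline[green.first_putt_feet] - green.putts_taken


def total_strokes_gained(greens: list[Green], baseline: dict[int, float]) -> float:
    """Sum per-green values, as one would for a round, tournament, or season."""
    return sum(strokes_gained(g, baseline) for g in greens)


phil = Green("Phil", first_putt_feet=60, putts_taken=2)
tiger = Green("Tiger", first_putt_feet=5, putts_taken=2)

sg_phil = strokes_gained(phil, BASELINE_PUTTS)
sg_tiger = strokes_gained(tiger, BASELINE_PUTTS)
print(f"Phil: {sg_phil:+.1f}, Tiger: {sg_tiger:+.1f}, gap: {sg_phil - sg_tiger:.1f}")
```

Running it prints `Phil: +0.2, Tiger: -0.8, gap: 1.0`, matching the hand calculation. The `total_strokes_gained` helper reflects the point that the metric is meant to be accumulated over many greens, where the noise of any single putt matters less.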
Is it possible that Tiger might be unfairly measured in this instance relative to Phil?
The simple answer to this question is “yes.” What if Phil’s putt was a straight, flat 60-footer and Tiger’s was a steeply downhill, left-to-right breaking putt that would have required perfect speed and line to make? What if Phil’s putt was hit much too hard, hit the back of the cup, jumped three inches in the air, and very fortuitously dropped back down to settle next to the cup? Or what if Tiger’s putt, hit with nearly perfect line and speed, did a 180-degree lip-out before very unluckily staying out of the cup on its way straight back at him? And what if Phil had already fallen completely out of contention, so that his putt meant little compared with Tiger’s 5-footer, which might have been required to force a playoff? In other words, the SGP metric does not and cannot account for every variable affecting each putt.
On the other hand, over the course of a tournament or a season, SGP rankings can tell an interesting and helpful story. Over several seasons, SGP will give a very good indication of who the best and worst putters are. Players are interested not only in where they fall in the SGP rankings but also in what the top-ranked putters are doing differently. It likely will not matter that SGP is necessarily imperfect on a green-by-green (student-by-student) basis if it can explain part of the story to a player (superintendent, principal, teacher, parent) and shed some light on where to focus practice, whether to try switching to a belly putter, or perhaps just which peers might be the best source of advice.
Where is the lesson in this for teachers and education policymakers?
Value-added modeling, or VAM, is an analogous method of attempting to gauge teacher effectiveness. Technically, SGP is most similar to a variation of VAM known as Student Growth Percentiles (we’ll distinguish the two as SGPGolf and SGPEd). Like SGPGolf and unlike more traditional VAMs, SGPEd does not attempt to explicitly account for the myriad drivers of results such as peer effects, poverty, gang involvement, or home life. In golf, a list of analogous drivers would include competitive pressure, weather, drug usage, playing partner, and home life. Instead, both SGP metrics assume that with a large enough sample, something meaningful and valuable can be said about the effect of the one driver that stays constant. In golf, this is the player. In education, this can be a teacher, a school, or a district.
A growing number of states, districts, and schools are endeavoring to use data to improve learning. Most have started, or are starting, with the more straightforward numbers: absences, behavioral interventions, and grades, to name a few. But just as a large body of research identifies the teacher as the single most important in-school driver of learning, the most substantial impact that data will make in education will come from its ability to tell us stories about instruction.
How our use of data, and of VAM in particular, evolves will depend more on policymakers than on the academics doing productive and interesting work on the best way to measure teacher effectiveness. We urge policymakers and other stakeholders to consider the value of this data now rather than waiting (forever) for the academics to reach a consensus. To be useful, the data need not pinpoint what teachers are doing well or poorly, nor must they strip out the myriad non-instructional factors that can influence learning. Neither must they be used for high-stakes decisions, such as counting for half of a teacher’s evaluation. Rather, they may be most useful for shining a light on areas that merit further inspection. The statistician George E. P. Box may have put it best when he said, “All models are wrong, but some are useful.”
Chris Lozier is the Chief Operating Officer at Civitas Education Partners.