By Chris Lozier

Stop me if you’ve seen this one. Tiger Woods hits a glorious second shot to within 5 feet of the pin on a par 5. His playing partner—for fun, let’s call him Phil Mickelson—pulls his approach long and right and his ball settles about 60 feet from the hole. Phil plays first and his lengthy putt rolls just past the hole from where he quickly taps in. Nice birdie, Phil! Tiger now has an immensely makeable eagle putt which he proceeds to pull badly. He taps in for birdie and he and Phil walk off the green with the same score. They also record the same “putts per green,” a variation of metrics long used by the PGA to measure putting performance.

But wait! That doesn’t tell the whole story, right? Surely, we can all agree that Phil putted better than Tiger on that green. After all, he took the same number of strokes to get the ball in the hole from a much more difficult starting point.

By now, this may be starting to sound familiar to followers of education. There is almost universal agreement that learning challenges are not identical for any two students (putts) and, in turn, that no two teachers (golfers) face identical degrees of difficulty.

The PGA adopts a value-added model.

Some time ago, the PGA recognized that measures of putts per green or round failed to paint a complete picture of how well its players putted. Then, with the advent of a technology that enabled them to collect data on every shot, they created a new metric to account for the unique challenges (in terms of length, at least) of all putts. They came up with Strokes Gained-Putting (SGP), which the PGA describes like this:

Strokes Gained-Putting takes into account putting proficiency from various distances and computes the difference between a player’s performance on every green — the number of strokes needed to hole out — against the performance of the other players for each round. This ultimately shows how many strokes are gained or lost due to putting for a particular round, for a tournament and over the course of a year. 

In our example above, SGP would work like so:

  • Phil took two putts from 60 feet on that green. Even the pros three-putt from 60 feet more often than they one-putt, and the same held true in the season before our imaginary tournament, when the average number of putts required to hole out from 60 feet was 2.2. Since Phil two-putted, he gained 0.2 (2.2 – 2.0) strokes on what would have been considered average, or expected.
  • As for those 5-foot putts that Tiger used to make in his sleep, well, all pros will make that putt more often than they will miss it. In this case, we can imagine that the average across the tour in the prior season was 1.2. Since Tiger required two putts, he lost 0.8 strokes (1.2 – 2.0).

In terms of SGP, therefore, Phil was a full stroke better than Tiger on that hole alone.
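
For readers who like to see the arithmetic spelled out, the example can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not the PGA's actual method: the function name strokes_gained and the BASELINE_PUTTS table are hypothetical, and the baseline averages (2.2 putts from 60 feet, 1.2 from 5 feet) are the imagined figures from our example rather than real tour data.

```python
# Hypothetical baseline: average putts needed to hole out, keyed by
# starting distance in feet. These are the imagined figures from the
# example above, not real tour data.
BASELINE_PUTTS = {60: 2.2, 5: 1.2}


def strokes_gained(distance_ft, putts_taken, baseline=BASELINE_PUTTS):
    """Return expected putts minus actual putts.

    A positive result means the player did better than the baseline
    average from that distance; a negative result means worse.
    """
    return round(baseline[distance_ft] - putts_taken, 2)


phil = strokes_gained(60, 2)    # two putts from 60 feet
tiger = strokes_gained(5, 2)    # two putts from 5 feet
margin = round(phil - tiger, 2)

print(f"Phil: {phil:+.1f}, Tiger: {tiger:+.1f}, margin: {margin:.1f}")
```

Both players took two putts, so the only thing separating them is the baseline expectation for where each putt started. That is the whole idea of the metric.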

Is it possible that Tiger might be unfairly measured in this instance relative to Phil?

The simple answer to this question is “yes.” What if Phil’s putt was a straight, flat 60-footer and Tiger’s was a steeply downhill, left-to-right breaking putt that would have required perfect speed and line to make? What if Phil’s putt was hit much too hard, hit the back of the cup, jumped three inches in the air, and very fortuitously dropped back down to settle next to the cup? Or what if Tiger’s putt, hit with nearly perfect line and speed, did a 180-degree lip-out before very unluckily staying out of the cup on its way straight back at him? And what if Phil had already fallen completely out of contention, such that his putt meant little compared with Tiger’s 5-footer, which might have been required to force a playoff? In other words, the SGP metric does not and cannot account for every variable affecting each putt.

On the other hand, over the course of a tournament or a season, SGP rankings can tell an interesting and helpful story. Over several seasons, SGP will give a very good indication of who the best and worst putters are. Players are not only interested in seeing where they fall in the SGP rankings, but they may well look to understand what the top-ranked putters are doing differently. It will likely not matter that SGP is necessarily imperfect on a green-by-green (student-by-student) basis if it can explain part of the story to a player (superintendent, principal, teacher, parent) and shed some light on where to focus practice, whether to try switching to a belly putter, or perhaps just which peers might be the best source of advice.

Where is the lesson in this for teachers and education policymakers?

Value-added modeling, or VAM, is an analogous method of attempting to gauge teacher effectiveness. Technically, SGP is most similar to a variation of VAM known as Student Growth Percentiles (we’ll distinguish the two as SGPGolf and SGPEd). Like SGPGolf and unlike more traditional VAMs, SGPEd does not attempt to explicitly account for the myriad drivers of results such as peer effects, poverty, gang involvement, or home life. In golf, a list of analogous drivers would include competitive pressure, weather, drug usage, playing partner, and home life. Instead, both SGP metrics assume that with a large enough sample set, something meaningful and valuable can be said about the effects of the driver that is constant. In golf, this is the player. In education, this can be a teacher, a school, or a district.

A growing number of states, districts, and schools are endeavoring to use data to improve learning. Most have started or are starting with the more straightforward numbers: absences, behavioral interventions, grades, to name a few. But just as countless studies identify the teacher as the single most significant driver of learning, the most substantial impact that data will make in education will be from its ability to tell us stories about instruction.

How our use of data, and particularly VAM, evolves will depend more on policymakers than on the academics doing productive and interesting work on the best way to measure teacher effectiveness. We urge policymakers and other stakeholders to consider the value of this data now without waiting (forever) for the academics to reach a consensus. In order to be useful, the data need not pinpoint what teachers are doing well or poorly, nor must they strip out the myriad non-instructional factors that can influence learning. Neither must they be used for high-stakes decisions, such as counting for half of a teacher’s evaluation. Rather, they may be most useful for shining a light on areas that merit further inspection. The statistician George E.P. Box may have put it best when he said, “All models are wrong, but some are useful.”


Chris Lozier is the Chief Operating Officer at Civitas Education Partners.

3 COMMENTS

  1. Amusing story, but not a particularly good metaphor if we are trying to make a good case for or against VAM. A golfer’s relationship to the ball is nothing like a teacher’s relationship to a student. The golfer is 100% responsible for realizing the forces that challenge him/her such as slope and dampness of the green, wind, temperature, sand traps, etc. (even his/her emotional state due to home life). It is all in the way that he strikes the ball to adjust for these things. However, a teacher cannot make up for all the factors that he/she is faced with. She cannot be expected to adjust for student poverty, student inability to speak English well, and an overcrowded classroom in the same way that a golfer adjusts for his challenges. In golf, these things even out over time, and all golfers in a tournament are more or less faced with the same challenges. Not so with teachers. It doesn’t all even out over time. Some school districts and some classrooms have advantages year after year that others do not have.

    So taking these factors into account to level the playing field IS necessary when it comes to evaluating teachers. From recent articles that we have read, some seem to think that you need to choose between SGP and VAM. In fact, we need to incorporate them together and refine our imperfect models as best we can as the results of our models inform us from year to year.

  2. Bob – At a minimum, I’m happy you are amused!

    I happen to believe that good teachers are able to adjust for factors that may be beyond their control. I’m not saying they can control or even know all the forces that challenge them, pretty much exactly like a golfer cannot. However, I would argue that both teachers and golfers are 100% responsible for trying to recognize, evaluate, and adjust for these forces–I’m not sure what you mean by “100% responsible for realizing the forces.”

    As for golfers being “more or less faced with the same challenges,” several of these different challenges are precisely what SGP is trying to account for while recognizing that no measure can account for all differences. Perhaps I did not lay that out clearly enough or perhaps you are not a golfer.

    In any case, I do not at all object to VAM or the investment in continuing to refine such models. I also agree that there may be value to using VAM and SGP together.

    I am simply (and constantly) advocating for the employment of the least imperfect measurement of learning and teaching that can be afforded without taking too many dollars out of the classroom. In other words, we can increase our return on investment in such endeavors BOTH by increasing the return AND by managing our investment.

    Thanks for your thoughts.

    Chris
