In the work I do, I find little reason to estimate point spreads or to predict scores. This is because those methods are mostly used to try to win in Las Vegas, an objective I have never had. No, my goal is and always has been to figure out how to construct a good team, "basketball engineering," as I've pitched it to a few people.
This objective is carried out by studying how individuals work or by studying how teams work, with the intention that the two approaches will merge and lead to the same conclusions.
On the team side, one of my focuses has been to understand how points scored and points allowed relate to win/loss records: if a team averages 3 points per game more than their opponents, what is their expected winning percentage? I have, in fact, gained numerous insights through the development of the Correlated Gaussian method, which answers questions just like this. This method, in combination with matchup probabilities, also does a good job of prediction and has been used in one public study showing that there is no distinct added home court advantage in the playoffs. Methods like this, however, do not explicitly account for one piece of information that seems important: strength of schedule.
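Before getting to strength of schedule, here is a rough sketch of that points-to-wins relationship: if we assume single-game margins are roughly Gaussian around a team's average margin, an expected winning percentage falls out of the normal curve. The 12-point standard deviation below is an illustrative guess, not a fitted value, and this sketch is a simplification, not the method itself:

```python
import math

def expected_win_pct(avg_margin, margin_sd=12.0):
    """Winning percentage implied by an average point margin, assuming
    game margins are roughly Gaussian (sd of 12 is an illustrative guess)."""
    z = avg_margin / margin_sd
    # Standard normal CDF written with the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A team outscoring opponents by 3 points per game comes out
# close to a .600 team under these assumptions:
print(round(expected_win_pct(3.0), 3))  # about 0.6
```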
Let me first hedge a little and say: There is actually no hard evidence proving that a 10 point win over a strong opponent means anything more than a 10 point win over a weak opponent. Intuitively, we believe that this must be true and I have little doubt that an empirical study would show that it is true (something that you yourself can do). What I will present here relies on this unproven belief and, for those theoretical types out there, actually proves it if you read carefully and think about it.
In large studies, the strength of opponents balances out or can be factored out in some way. In small studies, the strength of opponents does not balance out, which is a primary motivation for this work. For example, during the 1992-93 season, Michael Jordan missed 4 games and the Bulls went 1-3 in those games. Given that the Bulls were 57-25 on the season, this immediately implies that Jordan is much better than his replacement, B.J. Armstrong. But does it? The Bulls' three losses were all to playoff teams, including one to the Knicks who had the best record in the regular season that year. In addition, none of these losses was by more than 6 points. The Bulls' one win was a 28 point blowout. How do we put all this together to create some picture of the relative value of Armstrong to Jordan? We can do it in our minds, but we wouldn't all agree. Or we can use a mathematical method whose basis we can agree upon.
Let me now take a quick diversion into the benefits of numerical methods whose results are consistent from person to person, not subjective. If you don't want to hear me sound off like Billy Graham after a physics class, I suggest you just leap past my preachings to the rest of the article.
Are you sure you want to read this? It could be worse than Billy Graham on physics. It could be Bill Clinton on health care! It could be Newt Gingrich on ethics! It could be Bob Dole on anything! Last chance. Click here or forever hold your peace...
For many issues, it is fine that we make subjective judgments and that we don't all agree on those judgments. Argument is underrated in this world, as long as it's rational and we don't start killing each other because you think Jordan is 20 points better than Armstrong and I think he's only 10 points better. (No subtle reference to the Middle East intended there.) I make my living off people disagreeing and, no, I am not a lawyer.
But when things have to get done, we have to agree on some basic rules. We have to have a consistent set of methods for characterizing the truth. Usually, if we cannot agree on the big picture, we can start by looking at the details. We may not agree how much better Jordan is than Armstrong, but we can agree that if Armstrong took Jordan's place for four games and the Bulls won all four, that is an indication that Armstrong isn't much worse than Jordan. We should also agree that if all four games were against very weak teams, then that indication isn't as strong. Finally, we should agree that if all four games were against strong teams, then we have reason to wonder what's going on -- Armstrong is beginning to look pretty good.
If we can find a mathematical method that adequately characterizes those details we agree upon, we have made a step towards agreeing upon the big picture. Often, there are several mathematical methods that can characterize agreed-upon details. Sometimes those methods disagree on the big picture. Many times they don't. The more information that they account for, the more likely they are to agree upon the big picture.
Of course, these methods can still be wrong. Some of the best models in environmental engineering agree on many things, but they can't predict real circumstances very well. It frustrates me to no end when people argue over methods whose predictions are all pretty close to one another but that are also all quite far from predicting reality. That's why I am not trying to place the method of this article in competition with other similar ones which use the same information to get similar results. All of the methods have about equal value for doing what we want: taking scores and assigning "ratings" based on those scores, the opponents, and whether the game was at home or on the road. They all make similar predictions. They all eliminate a large part of the subjectivity. Arguing over what is left of the subjectivity within the methods is foolish and left to people who like to call themselves fools.
Now back to our regularly scheduled article ...
On the individual side of my research, I also have never taken explicit account of the strength of opponents. Specifically, Michael Jordan drives and jukes against the toughest defenders every night, just as Joe Dumars tries to contain the toughest offensive players every night. Direct measurements of what they do therefore show Jordan and Dumars to be somewhat worse than they actually are. What I will present here is the skeleton of a method that can account for this bias.
The method I will present here to handle varying strengths of opponents is called a statistical filter. Statistical filters are methods for estimating unknown quantities from a sequence of noisy measurements. In this case, the filter can be used to estimate the strength of a team using a team's game-to-game progression of points scored, points allowed, whether they were at home or on the road, and who they played against. There are several methods out there that take only this information to produce rankings and/or predict scores -- Doug Norris used to have one but I can't find him on the web anymore, ESPNet has one, and World Wide Rankings and Ratings (WWRR) has four or five. I believe that Doug's method and the ones from WWRR are "original", meaning that they dreamed them up on their own, a feat for which I applaud them until my hands turn red. However, an "optimal" technique for using this information has been around for a long time. This technique is called a Kalman Filter. Even though I said the Kalman Filter is "optimal", I am not claiming a Kalman Filter is any "better" than anything else. Every paper that has ever been written about the Kalman Filter has stated that it is "optimal", so I'm just regurgitating.
Kalman filters are used by NASA to predict the path of missiles and planes. They are also used to predict weather. They are used on Wall Street. Recently, they were introduced to environmental law by yours truly. A Kalman filter is clearly a very practical tool and it only makes sense that it has applications in basketball. It was actually used in a football prediction program when it was introduced to me about five years ago.
As another illustration of how one might use the Kalman Filter, I present the following chicken scratch:
For this strip, I owe a debt of gratitude to Scott Adams, writer of Dilbert. No, he didn't draw this, nor did he write the dialog. Actually he didn't do diddley except get popular enough so that you could tell what I drew even though I am a lousy artist.
Conceptually, we know that a good offense on average will do relatively better against a poor defense and relatively worse against a good defense. Let's start with that concept and attach some numbers. Last year's Utah Jazz had a good offense on average with an offensive rating of 111.7 in a league where the average rating was 105.9. They played against the following "good defensive teams" a total of 18 times: Chicago (2), Miami (2), New York (2), Portland (4), San Antonio (4), and Seattle (4). These teams had a weighted average defensive rating of 101.5 (weighted by games played against Utah). The Jazz played against the following "poor defensive teams" a total of 16 times: Charlotte (2), Dallas (4), LA Clippers (4), Milwaukee (2), Philadelphia (2), and Toronto (2). These teams had a weighted average defensive rating of 109.6.
According to my methods, the Jazz offensive rating should have been about 107 against the good defensive teams (their rating dropped) and about 115 against the poor defensive teams. In actuality, the Jazz ratings were 109.3 and 112.0, respectively. The results aren't as good as I had hoped, but this sample was small and I have not done an extensive analysis to determine whether my methods work on larger samples than those here. I believe they will work, but I'd like to ultimately check... unless someone else would like to do it (Ask me!).
One of the methods says to predict the Jazz offense vs. Team B defense as
Jazz Off. Rating vs. Tm B Def. = (Jazz Off. Rtg × B Def. Rtg) / (League Avg. Rtg)     (1)
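Equation (1) is simple enough to check directly against the Jazz numbers above; this Python snippet is a straight transcription of the formula:

```python
def predicted_off_rating(off_rtg, opp_def_rtg, league_avg=105.9):
    """Equation (1): scale an offense's average rating up or down by
    the opposing defense, relative to the league average."""
    return off_rtg * opp_def_rtg / league_avg

# The Jazz offense (111.7) against the weighted-average good defenses
# (101.5) and poor defenses (109.6) discussed above:
print(round(predicted_off_rating(111.7, 101.5), 1))  # 107.1
print(round(predicted_off_rating(111.7, 109.6), 1))  # 115.6
```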
Technical note: Mathematically, this relationship says that Utah's offensive performance is linearly related to both the average Utah offense and to the opposing defense. "Linearly" means that if the average Jazz offense improves by 10% then the Jazz offense vs. Team B's defense also improves by 10%, not 8% or 50%. I only introduce this because a Kalman Filter is strictly only "optimal" if this relationship is linear. The second method I introduce is not linear. (For people with a statistics background, note that a Kalman Filter is optimal only if offensive ratings and defensive ratings are Gaussian distributed. As I showed in Basketball's Bell Curve, this is also essentially true. The success of the Correlated Gaussian method substantiates this.)
Here is how the Kalman Filter will work for a game where team A plays at team B:
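In outline, each game nudges our prior estimate of a team's rating toward what that game implied, with the size of the nudge set by how much we trust the prior versus a single game. Here is a minimal scalar sketch in Python of one such update, using equation (1) as the measurement model; the structure is standard Kalman filtering, but a full implementation would carry home/road splits and both teams' offensive and defensive ratings, and the example numbers below are hypothetical:

```python
def kalman_game_update(rating, prior_var, observed_game_rating,
                       opp_rating, league_avg, game_var):
    """One scalar Kalman-style update of a team's rating from one game.

    'implied' inverts equation (1): it is the season-average rating
    that would, on average, produce this game rating against this
    particular opponent.
    """
    implied = observed_game_rating * league_avg / opp_rating
    # Kalman gain: how heavily to weight this one game vs. the prior
    gain = prior_var / (prior_var + game_var)
    new_rating = rating + gain * (implied - rating)
    new_var = (1.0 - gain) * prior_var
    return new_rating, new_var

# Hypothetical example: a team rated 108 on offense meets a defense
# rated 102 in a 105.9 league and produces a game rating of 112.
r, v = kalman_game_update(108.0, 100.0, 112.0, 102.0, 105.9, 100.0)
print(round(r, 1), round(v, 1))  # 112.1 50.0
```

With both variances at 100, the gain is one half: the rating moves halfway toward what the game implied, and our uncertainty about the rating is halved.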
(Points can be used in place of ratings above. I like to use ratings rather than points because they do not fluctuate as much as points. But, in terms of ease of use, points are preferable because no calculation of possessions is necessary. Specifically, we could have used Utah's points per game on the road, Seattle's points per game at home, the league average of points per game, and the final score of the game to replace Utah's road ratings, Seattle's home ratings, the league average rating, and the final ratings of the game.)
As an example of this entire method, let's return to the four Bulls games that Jordan missed where they went 1-3 against fairly tough competition. (Note of 11/16/97: The numbers have been revised below due to a fix in the variance of the predicted rating.)
There are two somewhat subjective parameters in this procedure: the variance in the prior estimate of all ratings (which I set to 100) and the variance of the game ratings (which I also set to 100). The variance of the game ratings is quite consistent with my records. The variance of the prior estimates states how sure we are of those estimates; since we are not sure of these estimates due to Jordan's absence, I set it relatively high. Feel free to vary these parameters to see the effects.
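To see how these two variances trade off, note that in a scalar filter the weight given to each new game (the Kalman gain) is just the prior variance divided by the sum of the two:

```python
def kalman_gain(prior_var, game_var):
    """Weight given to a single new game relative to the prior estimate."""
    return prior_var / (prior_var + game_var)

# The settings used here: both variances at 100, so each game and
# the prior estimate are trusted equally.
print(kalman_gain(100.0, 100.0))  # 0.5
# A more confident prior (variance 25) leans on the prior instead:
print(kalman_gain(25.0, 100.0))   # 0.2
```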
Even though not many games were played, we can already get some idea that the Bulls were not as good without Jordan. The offense went up slightly, but not enough to be certain about, and the defense went down quite a bit. For that season, my numbers had Armstrong's offense being just as efficient as Jordan's, but his defense being considerably worse, so this Kalman result is consistent with that. Overall, these few games indicated that the Bulls' expected winning percentage went from about 0.762 to about , or a loss of an additional games over the course of a season. This seems small to me based on the difference in talent between Jordan and Armstrong, but seems about right given the Bulls' performances in the games he missed, which is the only information the method uses.
This raises the issue of uncertainty. These four games cannot present a perfect picture of the difference between Armstrong and Jordan. Just the noise of basketball -- players getting hurt, teams playing back to back nights, Dennis Rodman "not being interested" -- prevents us from being sure about any rating. The Kalman Filter "knows this" and tells us roughly how sure we should be with the ratings it gives us. With the parameters above, our final variances in the offensive and defensive ratings of the Bulls at home are and , respectively. These have gone down about from our prior estimate, so we feel only somewhat more confident about the estimate than before. But it gives us a foothold for other comparisons we might make between Jordan and Armstrong.
(Technical remark: I could have estimated the prior Bulls' offensive and defensive ratings differently, for instance, by just using those games in which Jordan played. The prior ratings are the 'null hypothesis' we are testing against, as traditional statisticians phrase it; our null hypothesis would then have been that not having Jordan made no difference in the Bulls and we were seeing if we could disprove this hypothesis.)
This Kalman Filter is a powerful tool for evaluating situations where strength of opponents is important. Such situations are quite common in basketball, where schedules are not fully balanced: some teams certainly play a more difficult schedule than others, even over the course of an entire season. I hope to use it quite a bit, though it is still a little labor intensive for me to implement.
There are a couple of weaknesses of the filter that I will mention here at the end. First, the reason I hadn't really introduced it before was that I never saw it as a very good predictor when someone like Jordan was missing from a team. Because the filter looks only at teams, it cannot account for teams that change, as when a player is injured. When people put out team ratings, what are they really measuring if significant players miss a few games? We know that significant players make a difference in our predictions, but methods like this don't explicitly account for those players' absence or presence. I took advantage of this "weakness" above by turning it around and using the method to identify the difference between the Bulls with Jordan and the Bulls without Jordan.
A second weakness is also a strength. The Kalman Filter's generality of applicability (to other fields) is great, but it also implies that it doesn't have built in a lot of the details of those fields. I had to build a simple model to "predict" basketball games to use in the Kalman Filter. This simple model is not precisely what happens in basketball; a more complex model may be more accurate, but then it becomes much more difficult to implement in a Kalman Filter.
Finally, this method has the weakness that it says that a team that blows out another team always improves its overall rating. Unless you read Can the Bulls Be Perfect?, you are probably wondering "How is that a weakness?". A recent finding I made in writing that article indicated that a blowout doesn't necessarily make you a better team and can actually imply that you're not as good. This was a very unusual result, but one that I cannot dismiss. I also think that it can be built into the Kalman Filter. The thoughts on how to do that will have to wait a while because they are technical enough that most people won't want to hear them. Besides, this article is long enough.
In trying to end on a positive note, I want to mention that this method holds a key to defensive ratings. Because good defensive players are often assigned to guard the best players, their defensive numbers may not look very good unless we take into account the quality of the players they have to guard. Doug Steele does something like this in his defensive ratings, but he has indicated to me that it is a lot of work. Hopefully, this is an easier way to do it.
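In the spirit of equation (1), here is one rough sketch of such an adjustment (my own illustration, not Doug Steele's procedure): scale a defender's raw defensive rating by the average quality of the offensive players he had to guard.

```python
def adjusted_def_rating(raw_def_rtg, avg_opp_off_rtg, league_avg=105.9):
    """Credit defenders who face tough assignments: allowing a rating
    of 106 to elite offensive players is better than allowing 106 to
    weak ones. All numbers here are illustrative."""
    return raw_def_rtg * league_avg / avg_opp_off_rtg

# A defender allowing a raw rating of 106 while guarding players who
# average an offensive rating of 110:
print(round(adjusted_def_rating(106.0, 110.0), 1))  # 102.0
```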
A reference on the history of the Kalman Filter is this military page. The military does use Kalman Filters for a lot, so they should know about it.
Another reference for the Kalman Filter is this fairly technical paper by two people from North Carolina. I found this paper to be very useful to refresh my memory on this topic. If you know the Kalman Filter well, this paper is too trivial for you. If you don't know it and are not technically inclined, this paper is probably too advanced, but the example is still pretty good.
Most importantly, I owe thanks to Dick Donald for introducing me to this topic many years ago. Second, I want to thank George Pinder for reviving my need to know this stuff and to one of his students, Graciela Herrera, for helping me to relearn it quickly. Finally, I make a second mention of this University of North Carolina paper, by Welch and Bishop, who did a good job with it. I hope that this work adequately reflects these people's abilities to teach it.