In the work I do, I find little reason to estimate point spreads
or to predict scores. This is because those methods are mostly
used to try to win in Las Vegas, an objective I have never had. No, my goal is
and always has
been to figure out how to construct a good team, "basketball engineering,"
as I've pitched it to a few people.
This objective is carried out by studying how individuals work
or by studying how teams work, with the intention that the two approaches will
merge and lead to the same conclusions.
On the team side, one of my focuses has
been to understand how points scored and points allowed relate to win/loss records:
if a team averages 3 points per game more than their opponents, what
is their expected winning percentage?
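To give a feel for the kind of answer involved, here is a quick, generic normal approximation in Python. This is just a sketch, not necessarily the method I describe next, and the 12-point standard deviation of game margins is an assumed ballpark figure:

```python
import math

# Back-of-the-envelope: if single-game scoring margins scatter roughly
# normally around a team's average margin, the chance of a positive
# margin (a win) is a normal tail probability. The 12-point standard
# deviation is an assumed ballpark, not a fitted value.
def expected_win_pct(avg_margin, margin_sd=12.0):
    z = avg_margin / margin_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(expected_win_pct(3.0), 3))   # a +3 team comes out near .600
```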
I have, in fact, gained numerous insights through the development of the
Correlated Gaussian method, which answers questions just like this. This method, in combination
with matchup probabilities,
also does a good job of prediction and has been
used in one public study showing that there is no distinct added home
court advantage in the playoffs.
Methods like this, however, do not explicitly account for one
piece of information that
seems important: strength of schedule.
Let me first hedge a little and say:
There is actually no hard evidence proving that a 10-point
win over a strong opponent means anything more than a 10-point win over a weak
opponent. Intuitively, we believe that this must be true and I have little
doubt that an empirical study would show that it is true (something that you yourself
can do). What I will present here relies on this unproven
belief and, for those theoretical types out there, actually proves
it if you read carefully and think about it.
In large studies, the strength of opponents balances out or
can be factored
out in some way. In small studies, the strength of opponents
does not balance out, which is a primary motivation for this work.
For example, during the 1992-93 season, Michael Jordan missed 4 games and the Bulls
went 1-3 in those games. Given that the Bulls were 57-25 on the
season, this immediately implies that Jordan is much better
than his replacement, B.J. Armstrong. But does it? The Bulls' three losses
were all to playoff teams, including one to the Knicks, who had the best
record in the Eastern Conference that year. In addition, none of
these losses was by more than 6 points. The Bulls' one win was a 28 point
blowout. How do we put all this together to create some picture
of the relative value of Armstrong to Jordan? We can do it in our
minds, but we wouldn't all agree. Or we can use a mathematical
method whose basis we can agree upon.
Let me now take a quick
diversion into the benefits of numerical methods whose results
are consistent from person to person, not subjective. If you don't
want to hear me sound off like Billy Graham after a physics class,
I suggest you just leap past my preachings to the
rest of the article.
Are you sure you want to read this? It could be worse than
Billy Graham on physics. It could be Bill Clinton on health care!
It could be Newt Gingrich on ethics! It could be Bob Dole on anything!
Last chance. Skip ahead now or forever hold your peace...
For many issues, it is fine that we make subjective judgments and
don't all agree on those judgments. Argument is underrated in this world, as long
as it's rational and we don't start killing each other because
you think Jordan is 20 points better than Armstrong and
I think he's only 10 points better. (No subtle reference to the
Middle East intended there.) I make my living off people
disagreeing and, no, I am not a lawyer.
But when things have to get done, we have to agree on
some basic rules. We have to have a consistent set of methods for
characterizing the truth. Usually, if we cannot agree on the big
picture, we can start by looking at the details. We may not agree
how much better Jordan is than Armstrong, but we can agree that
if Armstrong took Jordan's place for four games and the Bulls won
all four, that is an indication that Armstrong isn't as much worse
than Jordan as we thought. We should also agree that if all four games were against
very weak teams, then
that indication isn't as strong. Finally, we should agree that if
all four games were against
strong teams, then we have reason to wonder what's going on -- Armstrong
is beginning to look pretty good.
If we can find a mathematical method that adequately characterizes
those details we agree upon, we have made a step towards
agreeing upon the big picture. Often, there are several mathematical
methods that can characterize agreed-upon details. Sometimes those
methods disagree on the big picture. Many times they don't.
The more information that they account for, the more likely they are
to agree upon the big picture.
Of course, these methods can still be wrong. Some of the best models
in environmental engineering agree on many things, but they can't
predict real circumstances very well. It frustrates me to no end
when people argue over methods whose predictions are all pretty
close to one another but that are also all quite far
from predicting reality. That's why I am not trying to
place the method of this article in competition with other similar
ones which use the same information to get similar results.
All of the methods have about equal value for doing what
we want: taking scores and assigning "ratings" based on
those scores, the opponents, and whether the game was at home or
on the road. They all make similar predictions. They all
eliminate a large part of the subjectivity. Arguing over
what is left of the subjectivity within the methods is foolish
and left to people who like to call themselves fools.
Now back to our regularly scheduled article ...
On the individual side of my research,
I also have never taken
explicit account of the strength of opponents. Specifically, Michael
Jordan drives and jukes against the toughest defenders every night, just as
Joe Dumars tries to contain the toughest offensive players every night. Direct
measurements of what they do then make Jordan and Dumars look
somewhat worse than they actually are. What
I will present here is the skeleton of a method that can account
for this bias.
The method I will present here to handle varying strengths of
opponents is called a statistical filter. Statistical filters
are methods for estimating something using a sequence of noisy measurements.
In this case, the filters can be used to estimate the strength
of a team using a team's game-to-game progression of points scored, points
allowed, whether they were at home or on the road, and who they played
against. There are several methods out there that take only this
information to produce rankings and/or predict scores -- Doug Norris
used to have one but I can't find him on the web anymore,
ESPNet has one, and
World Wide Rankings and Ratings
(WWRR) has four or five. I believe that Doug's method and the ones from WWRR
are "original", meaning that they dreamed them up on their own, a feat
for which I applaud them until my hands turn red.
However, an "optimal" technique for using this information has been around for a long
time. This technique is called a Kalman Filter. Even
though I said the Kalman Filter is "optimal", I am not
claiming a Kalman Filter is any "better" than anything else.
Every paper that has ever been written
about the Kalman Filter has stated that it is "optimal",
so I'm just regurgitating.
Kalman filters are used by NASA to predict the path of missiles and
planes. They are also used to predict weather. They are used on Wall
Street. Recently, they were introduced to environmental law by yours truly.
A Kalman filter is clearly a very practical tool and it only makes sense
that it has applications in basketball. It was actually used in a football
prediction program when it was introduced to me about five years ago.
As another illustration of how one might use the Kalman Filter, I present
the following chicken scratch:

[a hand-drawn, Dilbert-style strip]
For this strip, I owe a debt of gratitude
to Scott Adams, writer of Dilbert.
No, he didn't draw this, nor did he write
the dialog. Actually he didn't do diddley except get popular enough
so that you could tell what I drew even though I am a lousy artist.
Conceptually, we know that a good offense on average will do relatively
better against a poor defense and relatively worse against a good defense.
Let's start with that concept and attach some numbers. Last year's
Utah Jazz had a good offense on average, with an offensive rating
of 111.7 in a league where the average rating was 105.9. They played
against the following "good defensive teams" a total of 18 times:
Chicago (2), Miami (2), New York (2), Portland (4), San Antonio (4),
and Seattle (4). These teams had a weighted average defensive
rating of 101.5 (weighted by games played against
Utah). The Jazz played against the following "poor defensive
teams" a total of 16 times: Charlotte (2), Dallas (4),
LA Clippers (4), Milwaukee (2), Philadelphia (2), and Toronto (2).
These teams had a weighted average defensive rating of 109.6.
According to my methods, the
Jazz offensive rating should have been about 107 against the
good defensive teams (their rating dropped) and
about 115 against the poor defensive teams. In actuality,
the Jazz ratings were 109.3 and 112.0, respectively. The results aren't as good
as I had hoped, but this sample was small and I have not done an extensive
analysis to determine whether my methods work on larger samples than
those here. I believe they will work, but I'd like to ultimately
check... unless someone else would like to do it.
One of the methods says to predict the Jazz offense vs. Team B defense as

                                       (Jazz Off. Rtg) x (B Def. Rtg)
    Jazz Off. Rtg vs. Tm B Def.   =   --------------------------------     (1)
                                             League Avg. Rtg.
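If you want to try this yourself, here is a minimal sketch in Python (the function name is mine; the numbers are the ones quoted above for the Jazz):

```python
# Equation (1): predicted offensive rating against a particular defense.
def predicted_off_rtg(off_rtg, opp_def_rtg, league_avg=105.9):
    return off_rtg * opp_def_rtg / league_avg

# The Jazz (offensive rating 111.7) against the two groups of defenses:
print(round(predicted_off_rtg(111.7, 101.5), 1))   # good defenses -> about 107.1
print(round(predicted_off_rtg(111.7, 109.6), 1))   # poor defenses -> about 115.6
```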
Technical note: Mathematically, this relationship says that Utah's offensive performance
is linearly related to both the average Utah offense and to the opposing
defense. "Linearly" means that if the average Jazz offense improves by 10%
then the Jazz offense vs. Team B's defense also improves by 10%, not 8% or 50%.
I only introduce this because a Kalman Filter is strictly only "optimal" if this
relationship is linear. The second
method I introduce is not linear. (For people with a statistics background,
note that a Kalman Filter is optimal only if offensive ratings and defensive
ratings are Gaussian distributed. As I showed in
Basketball's Bell Curve, this is also essentially true. The success of
the Correlated Gaussian method substantiates this.)
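A quick numerical check of that linearity, using the same hypothetical helper as above:

```python
def predicted_off_rtg(off_rtg, opp_def_rtg, league_avg=105.9):
    return off_rtg * opp_def_rtg / league_avg

# Improve the Jazz offense by 10%: the prediction against any fixed defense
# improves by exactly 10% -- that is the linearity the Kalman Filter wants.
print(predicted_off_rtg(111.7 * 1.10, 101.5) / predicted_off_rtg(111.7, 101.5))
# prints 1.1 (up to floating-point rounding), whatever defense is plugged in
```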
Here is how the Kalman Filter will work for a game where
team A plays at team B:
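In rough outline, and only as a minimal scalar sketch of the idea (my own simplification, which ignores the home/road adjustment): each team's offensive and defensive ratings are carried as an estimate plus a variance; equation (1) is inverted to strip the opponent's influence out of the game result; and the standard Kalman update blends that adjusted result with the prior estimate, weighting by the two variances.

```python
LEAGUE_AVG = 105.9   # league average rating

def kalman_update(estimate, variance, measurement, meas_variance):
    # Standard scalar Kalman update: the gain says how far to move the
    # prior estimate toward the new measurement.
    gain = variance / (variance + meas_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance

def update_after_game(a_off, a_off_var, b_def, b_def_var,
                      game_off_rtg, game_var, league_avg=LEAGUE_AVG):
    # Invert equation (1): scale the observed game rating to what it would
    # have been against an average opponent, then treat that as a direct,
    # noisy measurement of each team's rating.
    meas_a_off = game_off_rtg * league_avg / b_def
    meas_b_def = game_off_rtg * league_avg / a_off
    a_off, a_off_var = kalman_update(a_off, a_off_var, meas_a_off, game_var)
    b_def, b_def_var = kalman_update(b_def, b_def_var, meas_b_def, game_var)
    return a_off, a_off_var, b_def, b_def_var
```

The same kind of update would then be run for team B's offense against team A's defense; I leave that half out to keep the sketch short.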
(Points can be used in place of ratings above.
I like to use ratings rather than points because they do not fluctuate
as much as points. But, in terms of ease of use, points are
preferable because no calculation of possessions is necessary.
Specifically, we could have used Utah's points per game on the road,
Seattle's points per game at home, the league average of points
per game, and the final score of the game to replace Utah's road
ratings, Seattle's home ratings, the league average rating, and
the final ratings of the game.)
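For instance, reusing update_after_game from the sketch above with points in place of ratings (every number here is made up purely for illustration):

```python
# Hypothetical priors: Utah's road scoring, Seattle's home defense, both
# with variance 100, and a league scoring average of 99.5 points per game.
utah_off, utah_var = 103.0, 100.0
sea_def, sea_var = 95.0, 100.0

# One (invented) game in Seattle in which Utah scores 98 points.
utah_off, utah_var, sea_def, sea_var = update_after_game(
    utah_off, utah_var, sea_def, sea_var,
    game_off_rtg=98.0, game_var=100.0, league_avg=99.5)
print(round(utah_off, 1), round(sea_def, 1))   # both estimates shift toward the evidence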
As an example of this entire method, let's return to the four
Bulls games that Jordan missed where they went 1-3 against fairly tough
competition. (Note of 11/16/97: The numbers have been revised below
due to a fix in the variance of the predicted rating.)
There are two somewhat subjective parameters in the procedure: the
variance in the prior estimate of all ratings (which I set to
100) and the variance of the game ratings (which I also set to 100).
The variance of the game ratings is quite consistent with
my records. The variance of the prior estimates states how sure
we are with those estimates; since we are not sure of these estimates
due to Jordan's absence, I set these relatively high. Feel free to vary
these parameters to see the effects.
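Here is the shape of that experiment in Python, reusing update_after_game from above; the two variances are the parameters to play with, and the opponent ratings and game results are placeholders rather than the actual numbers from those four games:

```python
PRIOR_VAR = 100.0   # variance of the prior rating estimates -- vary me
GAME_VAR = 100.0    # variance of a single game's rating -- vary me

bulls_off, bulls_off_var = 108.0, PRIOR_VAR   # placeholder prior rating
games = [(104.0, 106.0), (102.0, 103.0),      # (opponent defense, Bulls'
         (105.0, 110.0), (103.0, 101.0)]      #  game rating) -- placeholders
for opp_def, game_rtg in games:
    bulls_off, bulls_off_var, _, _ = update_after_game(
        bulls_off, bulls_off_var, opp_def, PRIOR_VAR,
        game_off_rtg=game_rtg, game_var=GAME_VAR)
    print(round(bulls_off, 1), round(bulls_off_var, 1))

# With both variances at 100, the variance falls 100 -> 50 -> 33 -> 25 -> 20,
# so four games still leave a fairly uncertain estimate.
```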
Even though not many games were played, we can already get
some idea that the Bulls were not as good without Jordan. The
offense went up slightly, but not enough to be certain about,
and the defense went down quite a bit. For that season,
my numbers had Armstrong's offense being just as efficient as
Jordan's, but his defense being considerably worse, so this Kalman
result is consistent with that. Overall, these few games
indicated that the Bulls' expected winning percentage
went from about 0.762 to a somewhat lower figure, a loss of a few
games over the course of a season. This seems
small to me based on the difference in talent between
Jordan and Armstrong, but seems about right given the Bulls'
performances in the games he missed, which is the only
information the method uses.
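For scale, the arithmetic from winning percentage to games over an 82-game schedule looks like this (the "without Jordan" figure is a made-up stand-in, not the number from the calculation above):

```python
GAMES = 82

with_jordan = 0.762 * GAMES       # about 62.5 expected wins
without_jordan = 0.720 * GAMES    # hypothetical stand-in percentage
print(round(with_jordan - without_jordan, 1))   # about 3.4 games lost
```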
This raises the issue of uncertainty. These four games
cannot present a perfect picture of the difference between
Armstrong and Jordan. Just the noise of basketball --
players getting hurt, teams playing back to back nights,
Dennis Rodman "not being interested" -- prevents us from being sure about
any rating. The Kalman Filter "knows this"
and tells us roughly how sure we should be with the
ratings it gives us. With the parameters above, the final
variances in the offensive and defensive ratings of the Bulls
at home come down somewhat from our prior estimates, so we feel
only somewhat more confident about the estimates than before.
But it gives us a foothold for other comparisons we might
make between Jordan and Armstrong.
(Technical remark: I could have estimated the prior Bulls' offensive
and defensive ratings differently, for instance, by just using those games
in which Jordan played. The prior ratings are the 'null hypothesis' we
are testing against, as traditional statisticians phrase it;
our null hypothesis would then have been that not having Jordan made
no difference in the Bulls, and we were seeing if we could disprove it.)
This Kalman Filter is a powerful tool for evaluating situations
where strength of opponents is important. This is actually quite
common in basketball, where teams don't play a fully balanced schedule,
with some teams certainly playing a more difficult schedule than others,
even over the course of an entire season. I hope to use it quite
a bit, though it is still a little labor intensive for me to implement.
There are a couple weaknesses of the filter that I will mention here
at the end. First, the reason I hadn't really introduced it before
was that I never saw it as a very good predictor when someone like
Jordan was missing from a team. Because the filter looks only at teams,
it cannot account for teams that change, like when a player is injured.
When people put out team ratings, what are they really measuring
if significant players miss a few games? We know that significant
players make a difference in our predictions, but methods like
this don't explicitly account for those players' absence or
presence. I took advantage of this "weakness" above by turning
it around and using the method to identify the difference between
the Bulls with Jordan and the Bulls without Jordan.
A second weakness is also a strength. The Kalman Filter's
generality of applicability (to other fields) is great, but it
also implies that it doesn't have built in a lot of the details
of those fields. I had to build a simple model of basketball
games to use in the Kalman Filter. This simple model is not
precisely what happens in basketball; a more complex model may be
more accurate, but then it becomes much more difficult to implement
in a Kalman Filter.
Finally, this method has the weakness that it
says that a team that blows out another
team always improves its overall rating. Unless you
read Can the Bulls Be Perfect?,
you are probably wondering "How is that a weakness?". A
recent finding I made in writing that
article indicated that
a blowout doesn't necessarily make you a better team and can
actually imply that you're not as good. This was a very unusual
result, but one that I cannot dismiss. I also think that it can
be built into the Kalman Filter. The thoughts on how to do that will have to wait
a while because they are technical enough that most people won't want
to hear them. Besides, this article is long enough.
In trying to end on a positive note, I want to mention that
this method holds a key to defensive ratings. Because good defensive
players are often assigned to guard the best players, their
defensive numbers may not look very good unless we take into
account the quality of the players they have to guard. At least one analyst
does something like this in his defensive ratings, but he has indicated to me
that it is a lot of work. Hopefully, this is an easier way to do it.
A reference on the history of the Kalman Filter is a
military page. The military does use Kalman Filters
for a lot, so they should know about it.
Another reference for the Kalman Filter is a
fairly technical paper by two people from the University of North Carolina.
I found this paper to be very useful to refresh my memory
on this topic. If you know the Kalman Filter well, this paper
is too trivial for you. If you don't know it and are
not technically inclined, this paper is
probably too advanced, but the example is still pretty good.
Most importantly, I owe thanks to Dick Donald for introducing
me to this topic many years ago. Second, I want to thank
Pinder for reviving my need to know this stuff and one of his students,
Graciela Herrera, for helping me to relearn it quickly.
Finally, I make a second mention of
this University of North Carolina paper, by Welch and
Bishop, who did a good job with it. I hope that
this work adequately reflects these people's abilities to teach.