Basketball's Bell Curve

1 December, 1995

No, not that Bell Curve. Before that book, "bell curve" was just a name for a shape that turns up all over statistics. That shape is especially important in basketball, where it has a lot to say about predicting a team's success and choosing its strategy.

In basketball, the number of points scored by a team in a game takes a different value from night to night. For example, if you plot the number of times the road team scored <65 pts in a game, 65-70 pts in a game, 70-75 pts, ..., 140-145 pts in a game in 1994-95, you end up with a distribution that looks like a bell curve, with a peak near the mean of 100 ppg and small tails on either end. This is approximately a Gaussian distribution, as statisticians formally call the bell curve. You can do the same thing for the home team scores from 1994-95 and end up with a similar Gaussian distribution, but its peak is near a mean of about 103 ppg and the distribution is spread out more.

If you have a browser that can handle beta applets (like Netscape 2.0b2 or 2.0b3), you can see these Bell Curves in the plots below (applet produced by Sun):

Knowing these distributions provides us with valuable predictive information. Namely, how much these two distributions overlap indicates something about how frequently the home or road team wins a game. In other words, we can estimate the home court advantage based on the above distributions. This comes about because a win by the home team means only that the home team's score is greater than the road team's score. So, if you pick a random point in the home team's point distribution, what is the chance that it is greater than a random point in the road team's point distribution? The answer gives you an estimate of the home court advantage.
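To make the "pick a random point from each distribution" idea concrete, here is a minimal simulation sketch. It assumes both score distributions are exactly Gaussian and independent of each other. The means of 103 and 100 come from the text above, but the standard deviations of 12 and 11 are placeholder guesses (the text says only that the home distribution is more spread out), so the output is illustrative rather than a reproduction of the actual estimate.

```python
import random


def simulate_home_win_rate(home_mean, home_sd, road_mean, road_sd,
                           trials=200_000, seed=0):
    """Estimate P(home score > road score) by drawing independent pairs."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    home_wins = 0
    for _ in range(trials):
        home_pts = rng.gauss(home_mean, home_sd)
        road_pts = rng.gauss(road_mean, road_sd)
        if home_pts > road_pts:
            home_wins += 1
    return home_wins / trials


# Means are from the article; both standard deviations are assumed values.
rate = simulate_home_win_rate(103.0, 12.0, 100.0, 11.0)
print(f"estimated home winning percentage: {rate:.3f}")
```

Ties are ignored here because continuous Gaussian draws essentially never tie; real games settle ties in overtime.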

Actually, this is only mostly true. You can believe it if you want to avoid a little statistical complication that I'll describe in this paragraph; most of the implications for strategy and prediction are understandable without it. The complication that arises in basketball (and all sports, I'm sure) is that the number of points scored by a team (home or road) is correlated with the number of points allowed by that team. In other words, teams play up or down to their competition. Every team does it to some degree, and some definitely more than others. The correlation also comes about because of "garbage time", which allows a trailing team to close the gap without changing the winner of the game. What this means for the analogy in the previous paragraph is that if you pick the points in the home and road distributions with that correlation built in, you estimate the home court advantage a little better than if you pick them independently. The correlation is relatively small (but not insignificant), which is why the simpler analogy works.
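For readers who want the formula, here is one way to write the estimate down. It assumes the home score H and road score R are jointly Gaussian, which is stronger than the "approximately Gaussian" claim above. The symbols are my own notation: \mu and \sigma are the means and standard deviations, \sigma_{HR} is the covariance between the two scores, and \Phi is the standard normal cumulative distribution.

```latex
% Margin of victory for the home team
D = H - R, \qquad
\mathrm{E}[D] = \mu_H - \mu_R, \qquad
\mathrm{Var}[D] = \sigma_H^2 + \sigma_R^2 - 2\,\sigma_{HR}

% Estimated home winning percentage
P(H > R) = P(D > 0)
         = \Phi\!\left( \frac{\mu_H - \mu_R}
                             {\sqrt{\sigma_H^2 + \sigma_R^2 - 2\,\sigma_{HR}}} \right)
```

Setting \sigma_{HR} = 0 gives the independent "random pick" version. A positive covariance shrinks the spread of the margin, which moves the estimate further from 0.500 in favor of whichever side has the higher mean.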

The result of applying this method to last season's point distributions for the home and road teams is a predicted home court winning percentage of 0.587. The actual home court winning percentage was 0.597. The estimate is off by 0.010, or one percentage point, which is pretty good.

We already know the home court advantage, though, so this isn't a prediction but a confirmation that the method is accurate. The predictive ability comes when applying the method to individual basketball teams, like the Suns. The Phoenix Suns won 59 games in 1994-95, but if you look at their offensive and defensive point distributions, they were estimated to win only 52. Such a large difference means Phoenix got a bit lucky to win those other 7 games. It's normal for teams to have a difference of up to about four or five games, but seven is pretty large. Part of the Suns' difference is the fact that they were 4-1 in games decided by two points or less and 8-2 in games decided by three points or less, and games that close usually involve a bit of luck. What this implies for the 1995-96 season is that the Suns won't win as many as 59 games again. It's rare that teams get that lucky two seasons in a row. Through games of 11/27/95, the Suns are at a winning % of 0.500, well below last season's winning % of 0.720. The only other team with as large a difference between estimated win% and actual win% last season was the Lakers, who are also doing worse than they did last season.

A very important aspect of this estimation method is that it incorporates the variability in a team's scoring. A team that is inconsistent in how much it scores or how much it allows has a larger standard deviation (to use statspeak) than a team that is consistent. The spread of the bell curve is larger for an inconsistent team than for a consistent team. By spreading out the distributions, you increase the amount of overlap of the points scored and points allowed, which reflects how often the team wins. What this means for good teams is that they win less than they should. What this means for bad teams is that they win more than they should. In other words, being more inconsistent brings a team toward 0.500, toward mediocrity. A consistent team that averages 106 points offensively and 103 points defensively wins more than an inconsistent team with the same offensive and defensive averages. At the extreme, the ultimate consistent team scores 106 points every game and allows 103 points every game and, of course, wins every game. On the other end, a consistent team that averages 103 points offensively and 106 points defensively wins less than an inconsistent team with the same averages.
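The pull toward 0.500 can be checked with a short closed-form sketch. It assumes Gaussian, uncorrelated points scored and points allowed. The 106 and 103 averages are the ones used above, while the standard deviations being swept are arbitrary illustrative values, and `win_probability` is a hypothetical helper name.

```python
from math import erf, sqrt


def win_probability(mean_for, mean_against, sd_for, sd_against):
    """P(points scored > points allowed) for independent Gaussian scores."""
    margin_sd = sqrt(sd_for ** 2 + sd_against ** 2)
    z = (mean_for - mean_against) / margin_sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z


# Sweep assumed standard deviations from consistent to inconsistent.
for sd in (4.0, 8.0, 12.0, 16.0):
    good = win_probability(106.0, 103.0, sd, sd)
    bad = win_probability(103.0, 106.0, sd, sd)
    print(f"sd={sd:4.1f}  106/103 team: {good:.3f}  103/106 team: {bad:.3f}")
```

As the standard deviation grows, the winning team's figure falls toward 0.500 and the losing team's figure rises toward it.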

You can graphically see this point in the interactive plots shown below. The amount of overlap of the Blue curve, representing the distribution of points scored, with the Green curve, representing the distribution of points allowed, gives a graphical view of how much the team wins. Leave one plot at its default values, but play around with the standard deviations in the other plot. You will see that the overlap goes down when you lower the standard deviations, meaning there are fewer games in which the team allows more points than it scores. (Note that the winning percentages shown are approximate at this point. I am developing the code further to use a better integration method.) There are other ways of affecting the team's winning percentage, of course. The obvious one is to change the means: go from 102 and 96 to, say, 96 and 102, and you go from a winning team to a losing team. The second way is to vary the covariance (which is supposed to be in red, but isn't in my browser). Increasing the covariance effectively makes a team more consistent, whether that be in losing or winning. (NOTE: Do not increase the covariance to greater than half the sum of the offensive and defensive variances or the winning % is wrong -- another bug I have to fix.)
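The covariance limit in that note has a simple explanation under a Gaussian model: the variance of the scoring margin is the sum of the two variances minus twice the covariance, and it stops being positive once the covariance reaches half that sum. The sketch below is not the applet's code. It is a hypothetical stand-in that uses the closed-form normal formula and rejects the out-of-range case. The means of 102 and 96 are the ones mentioned above; the standard deviations of 10 are assumed.

```python
from math import erf, sqrt


def win_probability(mean_for, mean_against, sd_for, sd_against, cov=0.0):
    """P(points scored > points allowed) for correlated Gaussian scores."""
    margin_var = sd_for ** 2 + sd_against ** 2 - 2.0 * cov
    if margin_var <= 0.0:
        # The margin would have zero or negative variance: not a usable input.
        raise ValueError("covariance must be below half the sum of the variances")
    z = (mean_for - mean_against) / sqrt(margin_var)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Standard deviations of 10 are placeholders; the limit here is (100 + 100) / 2 = 100.
for cov in (0.0, 40.0, 80.0):
    print(f"cov={cov:5.1f}  win%={win_probability(102.0, 96.0, 10.0, 10.0, cov):.3f}")
```

For the winning team in this example, raising the covariance raises the winning percentage, which matches the "more consistent" description.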

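[Interactive plots (Java applet): distribution of points scored in blue and points allowed in green, with adjustable means, standard deviations, and covariance]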

Now let's look at the strategy this implies in a game. Using this rationale, an underdog should go for the "inconsistent" or the "high variance" or the "risky" strategy against the favorite. So if, based upon points, the underdog is "expected to lose" 104 to 106 (a 2 point spread), it makes sense for that team to increase its chances by increasing the variance of its offensive and defensive distributions, for instance, by pressing defensively and by shooting three pointers. Even if the underdog doesn't execute these strategies on average as well as it executes its normal offense and defense, it can still pay off. Numerically, if the team's expected number of points scored goes down to 103 and the expected number of points allowed goes up to 107, the team may still actually improve its chances of winning if it increases the variability with which it scores and stops opponents by enough.
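How much extra variability is "enough" can be sketched under the same Gaussian assumption. In that model the chance of winning depends only on the expected margin divided by the standard deviation of the margin, so doubling the expected deficit from 2 to 4 points pays off only if the margin's standard deviation more than doubles. The score pairs below are from the paragraph above; the baseline margin standard deviation of 14 and the candidate risky values are assumptions for illustration.

```python
from math import erf, sqrt


def win_probability(mean_for, mean_against, margin_sd):
    """P(win) when the scoring margin is Gaussian with the given spread."""
    z = (mean_for - mean_against) / margin_sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


normal_margin_sd = 14.0  # assumed spread of the margin under the normal game plan
baseline = win_probability(104.0, 106.0, normal_margin_sd)
print(f"normal game plan (104-106): {baseline:.3f}")

# The risky plan costs two more points of expected margin (103-107).
for risky_margin_sd in (14.0, 21.0, 28.0, 35.0):
    risky = win_probability(103.0, 107.0, risky_margin_sd)
    verdict = "better" if risky > baseline else "not better"
    print(f"risky plan, margin sd {risky_margin_sd:4.1f}: {risky:.3f} ({verdict})")
```

Even in the best case the underdog stays below 0.500; the risky plan only narrows the gap.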

This is very important for a coach like Kentucky's Rick Pitino. His game plan of three pointers and pressing defense is a high variance strategy, one that an underdog should take, not a favorite. This high variance strategy is how he got his unknown Providence team to the Final Four in 1987. This is how his Kentucky team came back from a record 31 point deficit. But continuously applying this high variance strategy on a team with great talent like Kentucky is asking for an upset. Kentucky has been among the favorites to win the NCAA title two out of the past three years, only to fall earlier than expected. Again this year, they were favorites, being preseason #1. But their high variance game plan cost them last night against Massachusetts. And it will likely cost them later on this season. Despite Kentucky's immense talent, coach Rick Pitino's risky game plan makes the team more susceptible to upsets.

There are two other related consequences of this statistical phenomenon. First of all, it helps explain one way a trade can be good (or bad) for both teams involved. If the talent being traded is on average equivalent, but the more consistent talent (e.g., a steady power forward) is going from a bad team to a good team and the more highly variable talent (e.g., a "streaky" three point shooter) is going from a good team to a bad team, the good team gets farther away from 0.500 (gets better) and the bad team gets closer to 0.500 (gets better). Of course, the other aspects of player talents can outweigh this effect, but it is one that should be considered in any trade.

Finally, this statistical result also validates the strategy of going for the win at the end of a tight game when on the road and going for the tie when at home. A three point shot to win is a higher variance strategy than a two point shot to tie. If your chance of winning is less than 50%, based simply upon the fact that you are a road underdog, it makes sense to use a higher variance strategy like the three point shot. In other words, this ad-hoc rule has a firm statistical foundation. Betcha you thought you didn't know any math....
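One simple way to put numbers on the end-of-game rule is to compare the two shots directly. It assumes the team trails by two on the final possession, that a made three wins the game, and that a made two sends it to overtime. All the percentages below are hypothetical, and `better_last_shot` is an illustrative helper rather than anything from the analysis above.

```python
def better_last_shot(p_three, p_two, p_overtime_win):
    """Compare shooting a three to win with shooting a two to force overtime."""
    win_with_three = p_three                # a make ends the game
    win_with_two = p_two * p_overtime_win   # a make only earns an overtime
    if win_with_three > win_with_two:
        return "three", win_with_three
    return "two", win_with_two


# Hypothetical shooting percentages: 30% on threes, 55% on twos.
# Hypothetical overtime chances: 40% for a road underdog, 60% for a home favorite.
print("road underdog:", better_last_shot(0.30, 0.55, 0.40))
print("home favorite:", better_last_shot(0.30, 0.55, 0.60))
```

With these made-up numbers the road team should shoot the three and the home team should take the two, but the crossover point depends entirely on the percentages plugged in.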