Improving the Strength of Schedule Model, and Week 4 NFL Picks

I promised you all that before Sunday, I’d try to get you the offensive and defensive ratings of the 32 NFL teams, as computed by the strength-of-schedule model I introduced in my last post.

Well, it’s Saturday, and I’m making good on that promise. Here they are, in order from strongest overall to weakest overall:

Offense/Defense NFL Strength Ratings

Team Offense Defense Total (=Off-Def)
Pittsburgh 9.3454 -12.8488 22.1942
Atlanta 7.998 -9.2423 17.2403
Green Bay 9.3868 -6.5442 15.931
Tennessee 3.4918 -8.0904 11.5822
Chicago 3.6899 -6.6026 10.2925
NY Jets -1.0105 -9.9763 8.9658
Philadelphia 7.0592 -1.5056 8.5648
Miami 5.4238 1.781 3.6428
Baltimore -1.2518 -4.5747 3.3229
Kansas City -3.0222 -6.1722 3.15
Minnesota -9.2154 -12.2434 3.028
Indianapolis 1.3655 -1.3805 2.746
New Orleans 4.4175 1.8985 2.519
Dallas -3.4686 -5.4919 2.0233
New England 13.5998 11.6069 1.9929
Detroit 5.7315 4.7494 0.9821
San Diego 4.8438 5.4278 -0.584
Houston 5.8486 7.742 -1.8934
Cleveland -1.8498 0.4057 -2.2555
Cincinnati -4.6325 -1.8317 -2.8008
Seattle -3.6599 0.0247 -3.6846
Tampa Bay -2.1605 2.139 -4.2995
Buffalo -6.334 -1.2093 -5.1247
St. Louis -6.8654 -1.6255 -5.2399
Washington -2.2807 4.1093 -6.39
Arizona -2.2533 4.7989 -7.0522
Oakland -0.7469 6.4701 -7.217
Denver -1.4403 6.3187 -7.759
NY Giants -1.4845 11.6852 -13.1697
San Francisco -5.6359 9.0157 -14.6516
Jacksonville -10.8193 4.4598 -15.2791
Carolina -14.07 6.7069 -20.7769

A few things to note about these ratings. First, the numbers are in points scored (or allowed) above average per game. So for offenses, positive numbers are good. For defenses, negative numbers are good. And when you subtract the defensive points from the offensive points, you get total point differential per game, or what I’m calling the strength rating.
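If you’re curious roughly how numbers like these come out of the raw scores, here’s a minimal sketch, assuming the model treats each team-game score as a league-average baseline plus the offense’s rating plus the opposing defense’s rating, fit by least squares. The made-up game scores, team abbreviations, and the use of numpy’s solver are my illustration, not necessarily the exact routine behind the table.

```python
# Minimal sketch (assumptions, not the exact model): fit per-team offense and
# defense ratings by least squares, treating each team-game score as
#   points = league_average + off[offense] + def[defense]
# so good defenses come out negative, and a team's total is off - def,
# just like the table above.
import numpy as np

# Hypothetical rows: (offense, defense faced, points scored) -- made-up scores.
games = [
    ("PIT", "BAL", 38), ("BAL", "PIT", 24),
    ("ATL", "NO", 27), ("NO", "ATL", 24),
]

teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

A = np.zeros((len(games), 2 * n + 1))  # columns: n offenses, n defenses, baseline
b = np.zeros(len(games))
for row, (off, dfn, pts) in enumerate(games):
    A[row, idx[off]] = 1.0        # offense column
    A[row, n + idx[dfn]] = 1.0    # defense column (negative rating = fewer points allowed)
    A[row, -1] = 1.0              # league-average scoring baseline
    b[row] = pts

x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Center both sets of ratings at zero, as the post requires of the solution.
off_r = x[:n] - x[:n].mean()
def_r = x[n:2 * n] - x[n:2 * n].mean()

ratings = {t: (off_r[idx[t]], def_r[idx[t]], off_r[idx[t]] - def_r[idx[t]]) for t in teams}
```

With a full slate of real games this system is well determined; with the four made-up rows above, it’s only meant to show the shape of the fit.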

If you’re observant, you’ll notice something else. The total numbers don’t match the numbers I computed in my last post, when the model didn’t break things down into offense and defense. So what gives?

Well, I discovered a minor coding error in what I did last time. I fixed it, so now when we compute single strength ratings and compare them to the offense/defense totals, we get:

Team Single Rating O-D Total (from above)
Pittsburgh 22.0813 22.1942
Atlanta 16.9875 17.2403
Green Bay 15.4567 15.931
Tennessee 11.8251 11.5822
Chicago 10.2158 10.2925
NY Jets 8.8681 8.9658
Philadelphia 8.1996 8.5648
Miami 3.2691 3.6428
Baltimore 2.9968 3.3229
Kansas City 3.2809 3.15
Minnesota 2.9843 3.028
Indianapolis 3.0042 2.746
New Orleans 2.5907 2.519
Dallas 2.0974 2.0233
New England 1.8985 1.9929
Detroit 0.6533 0.9821
San Diego -0.5271 -0.584
Houston -1.4855 -1.8934
Cleveland -2.4303 -2.2555
Cincinnati -3.0073 -2.8008
Seattle -3.3511 -3.6846
Tampa Bay -4.1288 -4.2995
Buffalo -5.6052 -5.1247
St. Louis -5.0459 -5.2399
Washington -5.998 -6.39
Arizona -7.2695 -7.0522
Oakland -7.3101 -7.217
Denver -7.3261 -7.759
NY Giants -12.736 -13.1697
San Francisco -14.6399 -14.6516
Jacksonville -15.0712 -15.2791
Carolina -20.4773 -20.7769

The numbers very nearly match, which they should. The difference can be attributed to randomness in the optimization routine used to solve the system of equations, and its effect on bets will be negligible. (In other words, we’d never bet a game where the line was close enough to ours that this small difference affected the side we took.)

Using the Model to Pick Games

So I’ll show you how we’d use these strength ratings to pick a few games.  First, notice that when we combine offense and defense ratings into a single number, we’re throwing away some good information, since we could have gotten that single number without breaking it down.  For now, that’s ok, but in the future, we’ll want to use that extra information to possibly say something about matchups.

Maybe, for example, when a bad offense plays a bad defense, a lot of points get scored, but when a great offense plays a great defense, scoring stays low.  This type of information would certainly be useful for choosing bets, but we can’t use it without significant backtesting.  So for now, we just ignore the offense/defense breakdown.  But even using just the single strength rating, there’s still one more thing we need to do before we can really interpret these as meaningful values.

A 19-Point Favorite?  We Need to Fix That…

Notice how large some of the values are. For example, Pittsburgh’s rating is a whopping 22 points per game, while their opponent Baltimore’s is a more reasonable 3.

Interpreting the ratings as points per game, as is natural, this tells us the Steelers should beat the Ravens by 19 points, before we even consider their home-field advantage.  19 POINTS??  They’re only a 2-point favorite, at home!

So what’s the problem?  Well, remember how last time I mentioned that we haven’t yet accounted for reversion to the mean?  In general, teams that look really good right now aren’t going to be this good all year, and teams that look really bad aren’t going to be quite this bad all year.  As the data piles up, teams will all tend to look more average than they do now, as a rule.

Let’s see what we can do to quantify this and make an adjustment.  Check out this histogram of NFL teams’ strength ratings going into Week 4, as computed by our model, from 2002-2007:

You can see that most of the teams’ ratings fall between -10 and 10, but with some in the -30 to -40 range on the low end and the 20 to 30 range on the high end.  Compare this to the same graph, but of teams’ ratings heading into Week 16:

Ahh, a much tighter distribution!  There’s just one amazing outlier, a team that managed to be 20 points better than most of the league for the entire season. Three guesses as to which model-dating, prettyboy quarterback engineered that one.  (Of course, we all know their coach was cheating.)

Jealousy aside, we need to account for this somehow.  We do it by looking at variance, a measure of how widely spread around the mean a set of datapoints is.  What we find is this: The variance of the Week 4 ratings is very large, about 148.3, while the variance of the Week 16 ratings is only about 41.9.

This means that in order to compare teams in Week 4, we need to shrink the variance of our Week 4 ratings by a factor of $$148.3/41.9\approx 3.5$$.

In this case, since the mean is zero (remember, we required that of the ratings when we solved the system), the variance $$\sigma^2$$ can be expressed $$\sigma^2 = E(X^2).$$ And since $$E((X/c)^2) = E(X^2)/c^2$$, shrinking the variance by 3.5 means dividing the X’s (our Week 4 strength ratings) by a factor of $$\sqrt{3.5}$$. That makes the spread of our team abilities match what it should look like once lots of games have revealed teams’ true abilities.  These “variance-adjusted strengths” appear below.
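To make the adjustment concrete, here’s a quick sketch of that division, using a few of the single ratings from the comparison table (the remaining teams work exactly the same way); every number in it is one already quoted above.

```python
# Shrink the Week 4 ratings so their spread matches a late-season distribution:
# divide by sqrt(variance_week4 / variance_week16), which cuts the variance by ~3.5x.
import math

week4 = {                      # single strength ratings from the table above (partial list)
    "Pittsburgh": 22.0813,
    "Tennessee": 11.8251,
    "Baltimore": 2.9968,
    "Denver": -7.3261,
    "Carolina": -20.4773,
}

var_week4, var_week16 = 148.3, 41.9
shrink = math.sqrt(var_week4 / var_week16)   # about 1.88, i.e. sqrt(3.5)

adjusted = {team: rating / shrink for team, rating in week4.items()}
print(round(adjusted["Pittsburgh"], 2))      # ~11.74, in line with the table below
```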

Reduced-Variance NFL Strength Ratings

Team Reduced-Variance Strength
Pittsburgh 11.7381
Atlanta 9.0303
Green Bay 8.2165
Tennessee 6.2860
Chicago 5.4306
NY Jets 4.7141
Philadelphia 4.3588
Miami 1.7378
Baltimore 1.5930
Kansas City 1.7441
Minnesota 1.5864
Indianapolis 1.5970
New Orleans 1.3772
Dallas 1.1149
New England 1.0092
Detroit 0.3473
San Diego -0.2802
Houston -0.7897
Cleveland -1.2919
Cincinnati -1.5986
Seattle -1.7814
Tampa Bay -2.1948
Buffalo -2.9796
St. Louis -2.6823
Washington -3.1884
Arizona -3.8643
Oakland -3.8859
Denver -3.8944
NY Giants -6.7702
San Francisco -7.7823
Jacksonville -8.0116
Carolina -10.8854

It’s these numbers that we should use to make our picks. I’ll emphasize again that this is a simplified version of the model, so I wouldn’t advise making any bets with it yet (unless strictly for fun).  We’re not even accounting for any injuries yet!

But for the fun of it, here’s what we’d do if we were betting.

The System’s Week 4 NFL Picks

Look at the Baltimore-Pittsburgh game.  The Steelers are laying 2 points.  We have the Ravens at 1.59; the Steelers are the best team in the league at 11.73, so the difference is $$11.73-1.59=10.14.$$ Add to that 2.5 for the Steelers’ home-field advantage, and they should be about a 12.5-point favorite.  (Yeah, I know that’s huge.  That happens early in the year, because the model uses no preconceived notions about team abilities, only what’s in the data.)

So in this case, much as I hate to say it, the model would take the Steelers.  I’ll do one more, then you’re on your own:

  • Denver +7 at Tennessee. Denver = -3.89, Tennessee = 6.28.  So the difference is greater than 10, in favor of Tennessee, plus home field for them.  Way more than 7, so we take Tennessee, laying the points.

Get the idea?
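If it helps, here’s that arithmetic as a little sketch, using the variance-adjusted ratings from the table and the 2.5-point home-field bump from the Steelers example. The 3-point “too close to bet” cutoff is just a placeholder of mine; the post doesn’t pin that threshold down.

```python
# Sketch of the pick logic: predicted margin = (home rating - away rating) + home field,
# then compare against the posted line and only bet when the edge is big enough.
HOME_FIELD = 2.5   # home-field advantage in points, as used above
MIN_EDGE = 3.0     # hypothetical "don't bet if it's closer than this" cutoff

def pick(home, away, ratings, home_line):
    """home_line is the spread from the home team's side (negative = home favored)."""
    predicted_margin = ratings[home] - ratings[away] + HOME_FIELD
    edge = predicted_margin + home_line   # how far our number is from the market's
    if abs(edge) < MIN_EDGE:
        return "no bet"
    return f"{home} {home_line:+g}" if edge > 0 else f"{away} {-home_line:+g}"

ratings = {"Pittsburgh": 11.7381, "Baltimore": 1.5930,
           "Tennessee": 6.2860, "Denver": -3.8944}

print(pick("Pittsburgh", "Baltimore", ratings, -2))  # -> "Pittsburgh -2" (lay the 2)
print(pick("Tennessee", "Denver", ratings, -7))      # -> "Tennessee -7" (lay the 7)
```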

Continuing in this way, the model would pick:

  • Pittsburgh minus 2
  • Denver plus 7 [This is a typo. According to the above, the pick should have been Tennessee minus 7 (which is a loser).]
  • Cleveland plus 3
  • Atlanta minus 7
  • Philadelphia minus 6
  • Chicago plus 3.5
  • Miami plus 1

The bold are the most favorable bets, according to the model.  The other games listed are still good, and for games not listed, the model’s prediction was too close to the actual line to make a bet.  (For the record, JMS, it likes your Lions getting 14, but just barely.)

In the future, I’d like to find a better way—possibly using pattern-recognition or other machine-learning software—to recognize favorable bets, given the offense/defense ratings, week number, and spread.  That’s something I didn’t get anywhere with when I tried it before, but I’m hoping a few bright minds out there will have some suggestions.
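For what it’s worth, here’s a sketch of what that might look like, using scikit-learn’s logistic regression as one stand-in for “pattern recognition.” The choice of library is mine, the features are just the ones mentioned above, and the tiny training set is entirely made up to show the shape of the data, not real results.

```python
# Hypothetical sketch: learn which bets are favorable from historical outcomes.
# Features: [offense-rating difference, defense-rating difference, week, spread];
# label: 1 if the model's side covered. All rows below are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([
    [10.6, -8.3, 4, -2.0],
    [ 4.9,  2.5, 4, -7.0],
    [-3.1,  5.7, 5,  3.5],
    [ 1.2, -1.1, 6, -1.0],
])
y = np.array([1, 0, 1, 0])

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)

# Estimated probability that a prospective Week 4 bet with these features covers:
print(clf.predict_proba([[10.1, -9.2, 4, -2.0]])[0, 1])
```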

So there you go.  Week 4 NFL picks, from our very (right now) basic model.  I’m planning to introduce a little more of the model each week, along with an additional post each week on another topic, just to lighten things up.  Sound good?

If you’re on board for that, subscribe to Thinking Bettor to make sure you never miss a post.  Come on, it’s free!

2 Responses to Improving the Strength of Schedule Model, and Week 4 NFL Picks
  1. Chad Kettner
    October 4, 2010 | 11:02 am

    And now the post that every bettor hates…

    … any. given. sunday.

    What a bizarre week!

    • Matt
      October 4, 2010 | 11:37 am

      Yeah, not a great start for the model, was it? These picks weren’t from the full version of it and were more to illustrate how it works, but I think that even the full version would have been well under .500, because this basic model is still at the heart of it.

      The injuries to 2 QBs on teams the model picked got me thinking, though. I have a decent method of accounting for injuries that’s a pain in the ass to actually implement, but what I’m wondering is whether picks should be adjusted because of the mere possibility of injury.

      For example, if bad conditions are likely, the spread gets smaller because anything can happen. This isn’t quite that, but how about this argument: if each team has an equal chance of sustaining a big injury during a game, does that favor the underdog for the same reason?

      Or, another way of thinking about it: a big favorite likely has more stars on the team. If one gets hurt, they lose a lot by having to go to the bench. The big dogs will have a smaller difference between starters and bench players, so an injury probably won’t hurt them as much.

      Definitely something to think about more…this is the type of thing that machine-learning tools are good at figuring out and accounting for. Which is why that version of the model is better.