Improving the Strength of Schedule Model, and Week 4 NFL Picks

I promised you all that before Sunday, I’d try to get you the offensive and defensive ratings of the 32 NFL teams, as computed by the strength-of-schedule model I introduced in my last post.

Well, it’s Saturday, and I’m making good on that promise. Here they are, in order from strongest overall to weakest overall:

Offense/Defense NFL Strength Ratings

Team            Offense    Defense    Total (=Off-Def)
Pittsburgh       9.3454   -12.8488    22.1942
Atlanta          7.9980    -9.2423    17.2403
Green Bay        9.3868    -6.5442    15.9310
Tennessee        3.4918    -8.0904    11.5822
Chicago          3.6899    -6.6026    10.2925
NY Jets         -1.0105    -9.9763     8.9658
Philadelphia     7.0592    -1.5056     8.5648
Miami            5.4238     1.7810     3.6428
Baltimore       -1.2518    -4.5747     3.3229
Kansas City     -3.0222    -6.1722     3.1500
Minnesota       -9.2154   -12.2434     3.0280
Indianapolis     1.3655    -1.3805     2.7460
New Orleans      4.4175     1.8985     2.5190
Dallas          -3.4686    -5.4919     2.0233
New England     13.5998    11.6069     1.9929
Detroit          5.7315     4.7494     0.9821
San Diego        4.8438     5.4278    -0.5840
Houston          5.8486     7.7420    -1.8934
Cleveland       -1.8498     0.4057    -2.2555
Cincinnati      -4.6325    -1.8317    -2.8008
Seattle         -3.6599     0.0247    -3.6846
Tampa Bay       -2.1605     2.1390    -4.2995
Buffalo         -6.3340    -1.2093    -5.1247
St. Louis       -6.8654    -1.6255    -5.2399
Washington      -2.2807     4.1093    -6.3900
Arizona         -2.2533     4.7989    -7.0522
Oakland         -0.7469     6.4701    -7.2170
Denver          -1.4403     6.3187    -7.7590
NY Giants       -1.4845    11.6852   -13.1697
San Francisco   -5.6359     9.0157   -14.6516
Jacksonville   -10.8193     4.4598   -15.2791
Carolina       -14.0700     6.7069   -20.7769

A few things to note about these ratings. First, the numbers are in points scored (or allowed) above average per game. So for offenses, positive numbers are good. For defenses, negative numbers are good. And when you subtract the defensive points from the offensive points, you get total point differential, or what I’m calling the strength rating.
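For concreteness, here’s that arithmetic checked in a few lines of Python, using Pittsburgh’s row from the table above:

```python
# Total strength rating = offense rating minus defense rating,
# both in points above average per game (Pittsburgh's row above).
offense = 9.3454    # points scored above average per game
defense = -12.8488  # points allowed above average per game (negative is good)

total = offense - defense  # subtracting a good (negative) defense adds points
print(round(total, 4))     # 22.1942
```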

If you’re observant, you’ll notice something else. The total numbers don’t match the numbers I computed in my last post, when the model didn’t break things down into offense and defense. So what gives?

Well, I discovered a minor coding error in what I did last time. I fixed it, so now when we compute single strength ratings and compare them to the offense/defense totals, we get:

Team            Single Output   O-D Total (from above)
Pittsburgh           22.0813        22.1942
Atlanta              16.9875        17.2403
Green Bay            15.4567        15.9310
Tennessee            11.8251        11.5822
Chicago              10.2158        10.2925
NY Jets               8.8681         8.9658
Philadelphia          8.1996         8.5648
Miami                 3.2691         3.6428
Baltimore             2.9968         3.3229
Kansas City           3.2809         3.1500
Minnesota             2.9843         3.0280
Indianapolis          3.0042         2.7460
New Orleans           2.5907         2.5190
Dallas                2.0974         2.0233
New England           1.8985         1.9929
Detroit               0.6533         0.9821
San Diego            -0.5271        -0.5840
Houston              -1.4855        -1.8934
Cleveland            -2.4303        -2.2555
Cincinnati           -3.0073        -2.8008
Seattle              -3.3511        -3.6846
Tampa Bay            -4.1288        -4.2995
Buffalo              -5.6052        -5.1247
St. Louis            -5.0459        -5.2399
Washington           -5.9980        -6.3900
Arizona              -7.2695        -7.0522
Oakland              -7.3101        -7.2170
Denver               -7.3261        -7.7590
NY Giants           -12.7360       -13.1697
San Francisco       -14.6399       -14.6516
Jacksonville        -15.0712       -15.2791
Carolina            -20.4773       -20.7769

The numbers very nearly match, which they should. The difference can be attributed to randomness in the optimization routine used to solve the system of equations, and its effect on bets will be negligible. (In other words, we’d never bet a game where the line was close enough to ours that this small difference affected the side we took.)

Using the Model to Pick Games

So I’ll show you how we’d use these strength ratings to pick a few games.  First, notice that when we combine offense and defense ratings into a single number, we’re throwing away some good information, since we could have gotten that single number without breaking it down.  For now, that’s ok, but in the future, we’ll want to use that extra information to possibly say something about matchups.

Maybe, for example, when a bad offense plays against a bad defense, they often score a lot of points, but when a great offense plays a great defense, they don’t score much, in general.  This type of information would certainly be useful for choosing bets, but we can’t use it without significant backtesting.  So for now, we just ignore the offense/defense breakdown.  But even using just the single strength rating, there’s still one more thing we need to do before we can really interpret these as meaningful values.

A 19-Point Favorite?  We Need to Fix That…

Notice how large some of the values are. For example, Pittsburgh’s rating is a whopping 22 points per game, while their opponent Baltimore’s is a more reasonable 3.

Interpreting the ratings as points per game, as is natural, this tells us the Steelers should beat the Ravens by 19 points, before we even consider their home-field advantage.  19 POINTS??  They’re only a 2-point favorite, at home!

So what’s the problem?  Well, remember how last time I mentioned that we haven’t yet accounted for reversion to the mean?  In general, teams that look really good right now aren’t going to be this good all year, and teams that look really bad aren’t going to be quite this bad all year.  As the data piles up, teams will all tend to look more average than they do now, as a rule.

Let’s see what we can do to quantify this and make an adjustment.  Check out this histogram of NFL teams’ strength ratings going into Week 4, as computed by our model, from 2002-2007:

You can see that most of the teams’ ratings fall between -10 and 10, but with some in the -30 to -40 range on the low end, and the 20 to 30 range on the high end.  Compare this to the same graph, but of teams’ ratings heading into Week 16:

Ahh, a much tighter distribution!  There’s just one amazing outlier, a team that managed to be 20 points better than most of the league for the entire season. Three guesses as to which model-dating, prettyboy quarterback engineered that one.  (Of course, we all know their coach was cheating.)

Jealousy aside, we need to account for this somehow.  We do it by looking at variance, a measure of how widely spread around the mean a set of datapoints is.  What we find is this: The variance of the Week 4 ratings is very large, about 148.3, while the variance of the Week 16 ratings is only about 41.9.

This means that in order to compare teams in Week 4, we need to shrink the variance of our Week 4 ratings by a factor of $$148.3/41.9\approx 3.5$$.

In this case, since the mean is zero (remember, we required that of the ratings when we solved the system), the variance $$\sigma^2$$ can be expressed as $$\sigma^2 = E(X^2).$$ Since we want to shrink the variance by a factor of 3.5, we need to divide the X’s (our Week 4 strength ratings) by $$\sqrt{3.5}$$ in order to make the spread of our team abilities match what we know it should look like after lots of games have revealed teams’ true abilities.  These “variance-adjusted strengths” appear below.
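Here’s a sketch of that adjustment in Python, using the single-output ratings from the second table and the two variances quoted above. (The variable names are mine; because of rounding in the quoted variances, the results match the table below only approximately.)

```python
import math

# Empirical variances of the model's ratings (2002-2007, from the histograms)
var_week4 = 148.3
var_week16 = 41.9

# Dividing each rating by sqrt(ratio) shrinks the variance by that ratio
shrink = math.sqrt(var_week4 / var_week16)  # ~1.88, i.e. sqrt of ~3.5

# A few single-output Week 4 ratings from the table above
ratings = {"Pittsburgh": 22.0813, "Baltimore": 2.9968, "Carolina": -20.4773}

adjusted = {team: r / shrink for team, r in ratings.items()}
for team, r in sorted(adjusted.items(), key=lambda kv: -kv[1]):
    print(f"{team:12s} {r:9.4f}")
```

Running this reproduces the reduced-variance numbers to within a few thousandths of a point.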

Reduced-Variance NFL Strength Ratings

Team            Reduced-Variance Strength
Pittsburgh        11.7381
Atlanta            9.0303
Green Bay          8.2165
Tennessee          6.2860
Chicago            5.4306
NY Jets            4.7141
Philadelphia       4.3588
Miami              1.7378
Baltimore          1.5930
Kansas City        1.7441
Minnesota          1.5864
Indianapolis       1.5970
New Orleans        1.3772
Dallas             1.1149
New England        1.0092
Detroit            0.3473
San Diego         -0.2802
Houston           -0.7897
Cleveland         -1.2919
Cincinnati        -1.5986
Seattle           -1.7814
Tampa Bay         -2.1948
Buffalo           -2.9796
St. Louis         -2.6823
Washington        -3.1884
Arizona           -3.8643
Oakland           -3.8859
Denver            -3.8944
NY Giants         -6.7702
San Francisco     -7.7823
Jacksonville      -8.0116
Carolina         -10.8854

It’s these numbers that we should use to make our picks. I’ll emphasize again that this is a simplified version of the model, so I wouldn’t advise making any bets with it yet (unless strictly for fun).  We’re not even accounting for any injuries yet!

But for the fun of it, here’s what we’d do if we were betting.

The System’s Week 4 NFL Picks

Look at the Baltimore-Pittsburgh game.  The Steelers are laying 2 points.  We have the Ravens at 1.59; the Steelers are the best team in the league at 11.73, so the difference is $$11.73-1.59=10.14.$$ Add to that 2.5 for the Steelers’ home-field advantage, and they should be about a 12.6-point favorite.  (Yeah, I know that’s huge.  That happens early in the year, because the model uses no preconceived notions about team abilities, only what’s in the data.)

So in this case, much as I hate to say it, the model would take the Steelers.  I’ll do one more, then you’re on your own:

• Denver +7 at Tennessee. Denver = -3.89, Tennessee = 6.28.  So the difference is greater than 10, in favor of Tennessee, plus home field for them.  Way more than 7, so we take Tennessee, laying the points.

Get the idea?
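The rule in those two examples can be sketched as a short Python function. The 2.5-point home-field value and the ratings come from the text above; the function and parameter names (and the optional `min_edge` threshold) are mine:

```python
HOME_EDGE = 2.5  # home-field advantage used in the Steelers example

# Reduced-variance strengths from the table above
ratings = {"Pittsburgh": 11.7381, "Baltimore": 1.5930,
           "Tennessee": 6.2860, "Denver": -3.8944}

def model_line(home, away):
    """Our predicted home-team margin of victory."""
    return ratings[home] - ratings[away] + HOME_EDGE

def pick(home, away, vegas_home_line, min_edge=0.0):
    """Compare our predicted margin to the posted spread and pick a side.

    vegas_home_line: points the home team is favored by (negative if a dog).
    min_edge: hypothetical threshold; bet only when we disagree with the
              line by at least this much (the post doesn't pin down a value).
    """
    ours = model_line(home, away)
    if ours - vegas_home_line > min_edge:
        return home   # we think the home team covers
    if vegas_home_line - ours > min_edge:
        return away   # we think the road team covers
    return None       # too close to the line; no bet

# Steelers -2 at home vs. the Ravens: model says ~12.6, so lay the points.
print(pick("Pittsburgh", "Baltimore", 2))  # Pittsburgh
# Denver +7 at Tennessee: model says Tennessee by ~12.7, so take Tennessee.
print(pick("Tennessee", "Denver", 7))      # Tennessee
```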

Continuing in this way, the model would pick:

• Pittsburgh minus 2
• Tennessee minus 7
• Cleveland plus 3
• Atlanta minus 7
• Chicago plus 3.5
• Miami plus 1

The bold are the most favorable bets, according to the model.  The other games listed are still good, and for games not listed, the model’s prediction was too close to the actual line to make a bet.  (For the record, JMS, it likes your Lions getting 14, but just barely.)

In the future, I’d like to find a better way—possibly using pattern-recognition or other machine-learning software—to recognize favorable bets, given the offense/defense ratings, week number, and spread.  That’s something I didn’t get anywhere with when I tried it before, but I’m hoping a few bright minds out there will have some suggestions.

So there you go.  Week 4 NFL picks, from our very (right now) basic model.  I’m planning to introduce a little more each week, along with an additional post every week about another topic, just to lighten things up.  Sound good?

If you’re on board for that, subscribe to Thinking Bettor to make sure you never miss a post.  Come on, it’s free!

2 Responses to Improving the Strength of Schedule Model, and Week 4 NFL Picks
October 4, 2010 | 11:02 am

And now the post that every bettor hates…

… any. given. sunday.

What a bizarre week!

• Matt
October 4, 2010 | 11:37 am

Yeah, not a great start for the model, was it? These picks weren’t from the full version of it and were more to illustrate how it works, but I think that even the full version would have been well under .500, because this basic model is still at the heart of it.

The injuries to 2 QB’s on teams the model picked got me thinking about injuries though. I have a decent method of accounting for them that’s a pain in the ass to actually implement, but what I’m wondering is if picks should be adjusted because of the possibility of injury.

For example, if bad conditions are likely, the spread becomes smaller, because anything can happen. This isn’t quite that, but how about this argument: if each team has an equal chance of sustaining a big injury during a game, does that favor the underdog for the same reason?

Or, another way of thinking about it: a big favorite likely has more stars on the team. If one gets hurt, they lose a lot by having to go to the bench. The big dogs will have a smaller difference between starters and bench players, so an injury probably won’t hurt them as much.

Definitely something to think about more…this is the type of thing that machine-learning tools are good at figuring out and accounting for. Which is why that version of the model is better.