A Basic Model to Account for Strength of Schedule

In this post I’ll introduce the basic framework of the model I’ve used with some success to bet NFL and college basketball games. There’s plenty of room to make it stronger, and my hope is that the readers of this site will contribute with ideas and man-hours to help strengthen it.

Since the model came about as a way of computing strength-of-schedule, it’s particularly good at comparing two teams who have few or no common opponents, where humans have difficulty. For this reason, it has a lot of potential for betting on college football, tennis, and international soccer, where these types of matches arise frequently and make it hard for handicappers to accurately compare the teams playing.

A note on the math: I’m including a few equations here, but nothing fancy. I’ll try to explain in words any equations I write, so if it feels too much like school, you can probably ignore the math and still get the gist. I plan on setting up static pages with more detailed mathematics so that you can refer to them if you decide you want to go deeper with the nerdy stuff.

The Basic Difficulty in Computing Strength-of-Schedule

For the sake of familiarity, let’s assume we’re talking about the NFL. You’re considering a bet on the Houston Texans this weekend, Week 4.

We’d like to quantify just how good the Texans are and to do the same for their opponent, the Raiders, to decide on the likely outcome of the game.

So we look at who the Texans have played and how they did. We’ve seen the Texans beat the Colts and Redskins this year, by 10 points and 3 points, but they lost to the Cowboys, by 14. Does this all make them good or bad? We need to know more.

So what do we do next? Of course, we have to look at how good their previous opponents, the Colts, Redskins, and Cowboys are.

But almost immediately, we run into the problem.

The Problem: If we try to evaluate these three teams the same way, by looking at who they’ve played and their margins of victory or defeat, we’re stuck. They’ve each played the Texans, the team we’re trying to evaluate in the first place! So we have no obvious way of determining how good they are, leaving us with nothing to say about how good the Texans are.

We could look at the other games these teams have played, but the same problem arises in trying to evaluate their opponents. As we get further into the season the web of games played becomes more complicated, and the problem becomes overwhelming.

A Best-Fit Solution

It turns out that with a little creativity and a computer, this problem can be solved. We’ve seen that looking at each team one at a time won’t work, so we have to set the problem up as a system of equations and write a program to solve it. Let’s look at the most basic way to do this.

Recall that for each game a team has played, we’re using only two pieces of information: the strength of the opponent and the margin of victory or defeat. (We’ll need to eventually incorporate more information, but this simplicity is what allows the model to be used for almost any sport.)

So how should we use that information for each game? In the simplest setup, we can simply add the two numbers together. Let $$x_i $$ be the strength of team $$i$$, and $$m_{i,j}$$ be the margin of team $$i$$’s game against team $$j$$. (Right now, we’re assuming nobody has played anyone twice yet.) So for Houston’s win over Indy, the contribution to Houston’s strength rating looks like

$$m_{hou,ind} + x_{ind} = 10 + x_{ind}$$.

Notice that the 10 is positive, since Houston won the game, and notice also that the larger the opponent’s strength rating and margin of victory, the larger the sum will be.

Now let’s say that a team’s strength rating will be the average of these terms for each game it has played. Then in Houston’s case,

$$x_{hou} = \frac{1}{3}\big((m_{hou,ind} + x_{ind}) + (m_{hou,was} + x_{was}) + (m_{hou,dal} + x_{dal})\big)$$ or

$$x_{hou} = \frac{1}{3}\big((10+x_{ind})+(3+x_{was})+(-14+x_{dal})\big)$$.

Set up an equation like this for each team, and we’re left with a system of 32 equations in 32 unknowns. This system won’t always have an exact solution, but we can use an optimization routine to find the set of team strength ratings that best fits the data. (That is, the total error in the system is minimized.)

We need to make one final assumption in order to get values that have any meaning to us. Notice that the units in the margin of victory numbers $$m_{i,j}$$ are simply “points.” To be consistent, we want the units for the strength ratings to be “points” as well. So if we require that the $$x_i$$’s sum to 0, we can interpret them as the number of points per game by which each team is better or worse than average, after accounting for strength of schedule. Make sense?
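To make the setup concrete, here is a minimal Python sketch of one way to solve such a system. It is a sketch under stated assumptions, not necessarily the routine behind the ratings in this post: it uses NumPy’s least-squares solver, multiplies each team’s averaging equation through by its number of games, and appends the sum-to-zero condition as one extra row. The name solve_ratings is hypothetical, and the toy schedule is just Houston’s three results quoted earlier.

```python
import numpy as np

def solve_ratings(teams, games):
    """Best-fit strength ratings from a list of games.

    games holds (team_i, team_j, margin) tuples, where margin is team_i's
    points minus team_j's points. Hypothetical helper, for illustration only.
    """
    idx = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for team_i, team_j, margin in games:
        i, j = idx[team_i], idx[team_j]
        # Team i's averaging equation times its game count:
        #   (games played) * x_i - (sum of opponent ratings) = (sum of margins)
        A[i, i] += 1.0
        A[i, j] -= 1.0
        b[i] += margin
        # The same game seen from team j's side has the opposite margin.
        A[j, j] += 1.0
        A[j, i] -= 1.0
        b[j] -= margin
    # Extra row: ratings must sum to zero, so they read as points vs. average.
    A = np.vstack([A, np.ones(n)])
    b = np.append(b, 0.0)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(teams, x))

# Toy schedule: only Houston's three games, so the numbers mean very little.
teams = ["HOU", "IND", "WAS", "DAL"]
games = [("HOU", "IND", 10), ("HOU", "WAS", 3), ("HOU", "DAL", -14)]
ratings = solve_ratings(teams, games)
```

On a full schedule, the same function would take all 32 teams and every game played so far.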

The Strength Ratings

Finally, we’re in a position to calculate the team strengths. But first, please note: You should NOT bet based on these numbers! The actual model I use for betting is far more complicated than this (you didn’t think beating the sportsbooks was that easy, did you?). I’ve calculated these numbers using only the simple model I’ve explained above, in addition to a small homefield adjustment to each margin of victory.

Team	Strength (in points)
Arizona	-5.8292
Atlanta	10.7724
Baltimore	2.1843
Buffalo	-6.6633
Carolina	-13.4239
Chicago	5.2064
Cincinnati	-0.9725
Cleveland	-1.4321
Dallas	1.143
Denver	-2.9387
Detroit	-2.1692
Green Bay	9.2993
Houston	-0.3437
Indianapolis	4.7844
Jacksonville	-10.3225
Kansas City	4.4525
Miami	0.7156
Minnesota	0.4497
New England	0.8388
New Orleans	0.8683
NY Giants	-8.1594
NY Jets	5.5405
Oakland	-5.2587
Philadelphia	5.0974
Pittsburgh	14.5176
San Diego	1.9832
San Francisco	-10.1833
Seattle	0.3264
St. Louis	-2.3638
Tampa Bay	-2.9795
Tennessee	8.6363
Washington	-3.7762

At a glance, we see that Pittsburgh is the best (even without Roethlisberger, a scary thought), and that Carolina is the worst. Sounds about right.

Now, if you were to bet based on these numbers, here’s how you’d do it. (Like I said, this wouldn’t help you much. But because the market is relatively efficient, you wouldn’t be any worse off this way than you’d be by betting randomly.)

Let’s say you actually wanted to bet that Houston-Oakland game. Look at the chart and see that Houston is 0.34 points worse than average. Oakland is about 5.25 points worse than average. Subtracting, we find that Houston is just less than 5 points better than Oakland. But since Oakland is at home, take 2.5 points off that difference for home field, and our model predicts that Houston should win by 2-3 points. Since the spread has Houston laying 4 points, we’d take Oakland with the points. Simple, right? (Actually, with the line that close to our projected outcome, this probably wouldn’t be a bet at all.)
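The arithmetic in this example can be written out as a short Python sketch. The ratings, the 2.5-point home edge, and the 4-point spread come from the text; the function name and the simple pick rule are illustrative assumptions.

```python
HOME_EDGE = 2.5  # home-field value in points, as used in the example

def projected_margin(away_rating, home_rating, home_edge=HOME_EDGE):
    """Projected margin from the away team's side (positive: away team wins)."""
    return away_rating - home_rating - home_edge

houston, oakland = -0.3437, -5.2587   # ratings from the table
projection = projected_margin(houston, oakland)  # Houston's projected margin
spread = 4.0                          # Houston is laying 4 points

# Back the favorite only if the model has it winning by more than the spread.
pick = "Houston -4" if projection > spread else "Oakland +4"
```

The gap between the projection and the spread is what decides whether the game is worth a bet at all.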

But again, this is an overly-simplified model. We’re failing to account for a lot of stuff here, including mean reversion, any injuries that make the past data a poor representation of the teams that will be on the field on Sunday, and more.

Here are a few ideas I have for improving the model, some of which I’ve already implemented and others which I haven’t. I’m listing them here in hopes that they’ll get the wheels turning in your head to come up with more ideas about how we might make this better.

Using a “forgetting factor” to weight recent games more strongly than games from several weeks ago
Using thresholding or tapering to minimize the impact of blowouts
Our additive model for a team’s strength can be interpreted as “average margin of victory plus average opponent strength.” A multiplicative model would better capture interaction between teams
Rather than using margin of victory, use yards, turnovers, and other data that is less noisy (luck impacts the final score more than the final yard and turnover totals). Convert this back to points after it is output from the model, using some form of regression, to make bets
Find a way to efficiently account for injuries by determining how many points an injured or returning player is worth
Use a pattern-recognition algorithm to determine what a favorable bet “looks like” in terms of the spread and team strength ratings
Filter data through a neural network or other filter to determine the best way to combine team strengths and current week number to determine expected outcome. Use a model which produces offensive and defensive ratings to capture interactions and use current week number to account for mean reversion
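As a rough illustration of the first two ideas in this list, here is a hedged Python sketch. The exponential weight is one common form of forgetting factor, and the 21-point cap, the sample results, and all three function names are hypothetical choices made for the example.

```python
def game_weight(game_week, current_week, lam=0.95):
    """Forgetting factor: older games get weight lam ** (weeks elapsed)."""
    return lam ** (current_week - game_week)

def taper_margin(margin, cap=21):
    """Thresholding: limit how much a blowout can count, in either direction."""
    return max(-cap, min(cap, margin))

def weighted_margin(results, current_week):
    """Weighted average of capped margins; results holds (week, margin) pairs."""
    weights = [game_weight(week, current_week) for week, _ in results]
    capped = [taper_margin(margin) for _, margin in results]
    return sum(w * m for w, m in zip(weights, capped)) / sum(weights)

# Hypothetical results for one team entering week 4: the 35-point loss
# is capped at 21 and the week-1 win counts slightly less than recent games.
example = weighted_margin([(1, 10), (2, 3), (3, -35)], current_week=4)
```

In the full model, the same weights and caps would be applied to each game’s term before solving for the ratings, rather than to a single team’s average.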

Offense/Defense Model Ratings

The final point above mentions an offensive/defensive model, one which uses points scored and points allowed rather than simply margin of victory. In fact, it’s very easy to expand the basic model in this way, with the nice result that for each team, the offensive and defensive ratings sum to the total ratings from above.

There’s not much we can do with this new information, since the model is still additive (once we add them, we get the same implied bets as in the simpler model). Still, this is the model we want to build upon, so I’ll do my best to run this version of the system and publish the results by this weekend.
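One way the offense/defense extension might be set up is sketched below in Python. The specific form is an assumption for illustration: each score is modeled as a league-average score plus the scoring team’s offense rating minus the opponent’s defense rating, and the fit is done by least squares with NumPy. The three-team round robin is made-up data, and the function name is hypothetical.

```python
import numpy as np

def fit_offense_defense(teams, games):
    """games holds (team_i, points_i, team_j, points_j) tuples."""
    idx = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    rows, rhs = [], []
    for team_i, pts_i, team_j, pts_j in games:
        # Each game gives two observations, one per team's score.
        for scorer, defender, pts in ((team_i, team_j, pts_i),
                                      (team_j, team_i, pts_j)):
            row = np.zeros(1 + 2 * n)
            row[0] = 1.0                        # league-average score
            row[1 + idx[scorer]] = 1.0          # scorer's offense rating
            row[1 + n + idx[defender]] = -1.0   # defender's defense rating
            rows.append(row)
            rhs.append(pts)
    # Offense ratings sum to zero, and so do defense ratings.
    off_sum = np.zeros(1 + 2 * n)
    off_sum[1:1 + n] = 1.0
    def_sum = np.zeros(1 + 2 * n)
    def_sum[1 + n:] = 1.0
    rows += [off_sum, def_sum]
    rhs += [0.0, 0.0]
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs, dtype=float),
                              rcond=None)
    offense = dict(zip(teams, sol[1:1 + n]))
    defense = dict(zip(teams, sol[1 + n:]))
    return sol[0], offense, defense

# Made-up round robin: A beat B 24-17, B beat C 20-10, A beat C 21-14.
od_games = [("A", 24, "B", 17), ("B", 20, "C", 10), ("C", 14, "A", 21)]
average_score, offense, defense = fit_offense_defense(["A", "B", "C"], od_games)
```

Under this setup, a positive defense rating means a team holds opponents below what they would otherwise score. Each team’s offense and defense ratings add up to the rating the margin-only model gives for the same games, which is why the implied bets don’t change.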

Alright, that’s plenty for today. This math-on-a-blog stuff is brand new to me, so I’m still trying to figure out the right balance between what’s interesting and necessary and what will put you to sleep. So let me know what you think. We could certainly go much deeper into the math with some matrix theory, even for the simple model in this post, but I’m doing my best right now to keep my inner nerd at bay.

I’m going to try to start posting more frequently here, maybe two or three times a week. So check back soon and subscribe to get post updates automatically.

Chad Kettner

September 29, 2010 | 11:53 am

For lazy betting, what are your thoughts on ESPN’s Accuscore Simulator?

Matt
September 30, 2010 | 2:56 pm

Chad, I honestly don’t know much about it. Do they publish its record against the spread or any other data about how it performs?

My biggest problem with simulators is the sheer number of parameters and assumptions they require. In a game like baseball, where players act largely independently and the possible states of the game are relatively few, I think you can make it work. For something like football, where you need not just to model individual players but their interactions in an infinite number of game states, things get out of hand rather quickly.

A model is only as good as its assumptions, and simulators require a lot of them. And with so many parameters, the Curse of Dimensionality makes it very hard to find the optimal values for them and can also lead to overfitting of the model to past data.

But like I said, I’d love to know their record. If they’re winning against the spread consistently, then everything I just wrote doesn’t matter!

JMS

September 29, 2010 | 9:34 pm

What about adding other statistics, particularly some of the important ones, such as turnovers (takeaways and giveaways) or 3rd down conversion percentage for defense and offense. Perhaps you could do a rolling 3 game average to take into account teams on the upswing or downswing. I’m certainly not well versed in the math at all, but I found it interesting.

Great stuff!

Matt
September 30, 2010 | 3:03 pm

JMS, good ideas. The second (the rolling average) is something I have already implemented in the current version of the model (but not in the numbers generated in this post). Basically it weights the data from Week x by a factor lambda^(w-x), where lambda is between 0 and 1, and w is the current week of the season. So you can see that when x is 1 or 2 or something small and the current week number is much larger, then the exponent is large, making the total weight given to that data very small. From the testing I’ve done on it, it seems lambda = 0.95 or so is the best value to use.

Your first idea (to use important stats instead of simply score) is one I haven’t tried with this model. But it could be done, and I think replacing score entirely with a few stats might improve the model. We would essentially be determining teams’ abilities to produce those stats, rather than points, which I suspect would be a better way to do it with less noise. Definitely something I’ll explore.

Thanks!

JMS
October 1, 2010 | 12:48 pm

How many stats would you include? Or what would be the optimal number of stats to include? I know coaches and media talk a lot about different stats, but many insiders really talk about turnovers and 3rd down efficiency as big ones. How would you go about identifying stats to include? I’d certainly be willing to help any way that I can…

Matt
October 1, 2010 | 7:43 pm

You definitely want to keep the number as low as possible for most purposes, especially in the machine-learning approaches that I know the most about (I’ll write more about these in the future). But there are all kinds of basic statistical tests for significance of variables, so the usual way is to start with a big group of potential variables to include, then test on a dataset to see which have predictive power. I have a pretty good dataset of scores and Vegas lines, but it doesn’t include stats. But those shouldn’t be too hard to compile, especially if several people work on it or if someone knows how to pull data automatically from a website and parse it.

The one thing to keep in mind is that if you build a basic model using the same stats and tools (and combining them in the same way) as everyone else, you won’t get anywhere. The odds and lines probably already reflect that basic level of sophistication, except probably for rare situations where the public goes crazy and moves the line away from where it should be. So in my opinion, the trick is to still use the most meaningful stats, but to do so in a way that’s unique. Or to use stats or combinations of stats and other information that others wouldn’t think to try. The reason I have any confidence at all in the stuff I’ve done is that a lot of it is based on the machine-learning tools I mentioned before, which not too many people understand well.

Brett

March 11, 2011 | 5:29 am

Hi Matt,
What is the meaning of “\big” in the strength of schedule formula you’ve listed?

I’m trying to recreate this formula myself, and am having problems deciphering what this means.

I’m also assuming “frac” is the fraction of 1/3. The reason being round 4 strength ratings only account for the first three games. Hence the “3”. Is this correct?

Regards,
Brett

Matt
March 22, 2011 | 11:23 am

Hi Brett,

The “\big” and “\frac” are LaTeX code; they’re supposed to show up on the site as math symbols (parentheses and a fraction, respectively), so something isn’t working right. 🙁 But you can probably interpret the formulas even so.

So yes, you’re right about the fraction. And that the 3 is there because it’s an average of 3 games.

Let me know what other trouble you’re having. I’m really interested in getting back to this site and posting more content soon!