Exploring Wins with nflfastR

Looking at what metrics are important for predicting wins. Creating expected season win totals and comparing to reality.

Austin Ryan https://twitter.com/packeRanalytics

What can two decades' worth of play-by-play data and some math tell us about what wins games in the NFL? Let’s look at some simple linear regressions using metrics we can easily compute with nflfastR data.

Please note that code chunks have been intentionally hidden in this post for readability. See the Rmd file at https://github.com/mrcaseb/open-source-football/ if you would like to see the underlying code.

Simple Linear Regression

We can see passing efficiency metrics have the strongest relationships with wins. Furthermore, offensive passing efficiency metrics have stronger relationships than defensive passing metrics do.

A team’s expected points added (EPA) per dropback explains nearly half of the variation in its season win total, whereas defensive EPA per dropback explains about 32%. Offensive and defensive rushing efficiency metrics explain only about 18% and 9% of the variation in wins, respectively.
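For readers who want to see what "explains X% of the variation" means in practice, here is a minimal sketch of the R-squared calculation for a one-predictor regression. It is written in plain Python rather than the R used for this post, and the numbers are made up for illustration; they are not nflfastR output. The function name `r_squared` is a placeholder of our own.

```python
def r_squared(x, y):
    """Share of the variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: what the fitted line fails to explain
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / syy


# Invented team-seasons (illustration only, not real data)
epa_per_dropback = [-0.10, -0.02, 0.05, 0.12, 0.20, 0.28]
wins = [4, 7, 6, 9, 10, 13]
print(f"R-squared: {r_squared(epa_per_dropback, wins):.2f}")
```

Running the same function once per metric (offensive and defensive EPA per dropback, offensive and defensive EPA per rush) would give the kind of comparison described above.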

Random Forest Variable Importance

We can also build a random forest model and let the model tell us what features yield the most information gain. Again, passing efficiency is the largest driver of wins and it is not particularly close.

Multiple Linear Regression

We know offensive and defensive EPA per dropback metrics are useful for explaining season win totals. Just for fun, let’s build a linear regression model that uses EPA per dropback and EPA per rush for both sides of the ball. This regression explains 78% of the variation in season wins.

We can use the regression formula to develop expected wins based on EPA per play metrics. Actual wins minus expected wins is approximately normally distributed with a mean of 0 and a standard deviation of 1.4 wins.
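To make the expected wins idea concrete, here is a rough sketch of fitting a four-predictor linear regression and turning the coefficients into expected wins. It uses plain Python and the normal equations rather than R's `lm`, so it is not the code behind this post. The helper names (`solve`, `fit_ols`, `expected_wins`) and the team-season values are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def expected_wins(coefs, features):
    """Apply the fitted formula to one team-season."""
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))


# Invented team-seasons (illustration only). Column order:
# offensive EPA/dropback, defensive EPA/dropback, offensive EPA/rush, defensive EPA/rush
team_seasons = [
    (0.20, 0.05, -0.05, -0.10),
    (0.10, -0.05, 0.00, 0.02),
    (-0.05, 0.10, -0.10, -0.05),
    (0.00, 0.00, 0.05, 0.05),
    (0.15, 0.12, -0.02, 0.03),
    (-0.10, -0.08, 0.02, -0.12),
]
actual_wins = [12, 11, 4, 8, 9, 7]

coefs = fit_ols(team_seasons, actual_wins)
# Actual minus expected wins: positive means the team overperformed
residuals = [w - expected_wins(coefs, f) for w, f in zip(actual_wins, team_seasons)]
```

With only six invented rows the fit is nearly exact, so the residuals here are not meaningful. On the full set of team-seasons, the residuals are the "actual minus expected wins" values discussed in the rest of the post.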

This means about 68% of the season win totals from 1999-2019 are within 1.4 wins of what our expected wins formula predicts. Furthermore, about 95% of the season win totals are within 3 games of what we would predict. Put another way, it is rare for a team to overperform or underperform its expected wins by more than 3 games.
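The 68% and 95% figures come from the usual rule of thumb for a normal distribution: roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two. Two standard deviations here is 2 × 1.4 = 2.8 wins, which rounds to the 3 games quoted above. A small sketch, assuming the residuals are exactly normal with mean 0 (the function name `share_within` is our own):

```python
from math import erf, sqrt

RESIDUAL_SD = 1.4  # standard deviation of actual minus expected wins, from the post


def share_within(threshold, sd=RESIDUAL_SD):
    """P(|X| < threshold) for X ~ Normal(0, sd)."""
    return erf(threshold / (sd * sqrt(2)))


for games in (1.4, 2.8, 3.0):
    print(f"within {games} wins: {share_within(games):.1%}")
```

Real residuals are only approximately normal, so the observed shares in the data can differ a little from these theoretical values.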

How did expected and actual wins look in 2019?

Based on our expected wins formula, the NFC North champs were predicted to have 10 wins while they actually won 13. Additionally, the team they beat to get to the NFC Championship looked more like an 8-win team than an 11-win team. On the other end of the spectrum, the Cowboys produced EPA per play metrics that predicted an 11-win team, but they ended up 3 wins short.

What does this mean for the 2020 season?

Looking at the 25 teams in the right tail (those who overperformed by more than 2.5 wins) from 1999 to 2018, we find that on average their wins dropped by 2.3 games in the next season. Not great news for Packers or Seahawks fans in 2020.

The 29 teams in the left tail (those who underperformed by more than 2.5 wins) increased their wins by 2.7 games on average the next season. The 2019 Cowboys, Chargers, and Buccaneers also fall into this tail.

If we look at teams who overperformed by more than 2 games (56 from 1999 to 2018), we see their wins drop on average by 2.6 games the next season. Conversely, teams who underperformed by more than 2 games (50 from 1999 to 2018) increased their wins the next season by 2.6 games on average.
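The tail comparisons above boil down to a simple lookup: find each team-season whose actual minus expected wins exceeds a threshold, then see how that team's win total changed the following year. Here is one way that could be sketched in plain Python. The record layout (`team`, `season`, `wins`, `expected_wins`), the function name, and the sample rows are all hypothetical and not taken from the post's code or data.

```python
def next_season_change(records, threshold, over=True):
    """Average change in wins the following season for teams in one tail.

    over=True selects teams with actual - expected > threshold,
    over=False selects teams with actual - expected < -threshold.
    Returns None if no team in the tail has a following season in the data.
    """
    wins_by_key = {(r["team"], r["season"]): r["wins"] for r in records}
    changes = []
    for r in records:
        diff = r["wins"] - r["expected_wins"]
        in_tail = diff > threshold if over else diff < -threshold
        next_wins = wins_by_key.get((r["team"], r["season"] + 1))
        if in_tail and next_wins is not None:
            changes.append(next_wins - r["wins"])
    return sum(changes) / len(changes) if changes else None


# Invented rows (illustration only)
sample = [
    {"team": "AAA", "season": 2017, "wins": 13, "expected_wins": 10.0},
    {"team": "AAA", "season": 2018, "wins": 9, "expected_wins": 9.5},
    {"team": "BBB", "season": 2017, "wins": 5, "expected_wins": 8.0},
    {"team": "BBB", "season": 2018, "wins": 8, "expected_wins": 8.0},
]
print(next_season_change(sample, 2.5, over=True))
print(next_season_change(sample, 2.5, over=False))
```

Seasons with no following year in the data (2019 in this post) drop out automatically, which is why the tail counts above run from 1999 to 2018.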

Other Findings

The difference between actual and expected wins is largely a function of how a team performs in one-score games and on special teams. Record in one-score games isn’t very stable year over year for the most part; however, a few teams did consistently overperform or underperform their expected wins.

Of the 669 season-long performances in the data, only 38 underperformed by more than 2.35 wins. The Chargers account for over a fifth of those seasons.

The Browns have not overperformed since 2009, when they won 5 games but this model saw them as more of a 2-win team.

On the other end of the spectrum, the Patriots have underperformed by more than half a game only two times.


If you see mistakes or want to suggest changes, please create an issue on the source repository.


Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/mrcaseb/open-source-football, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

Ryan (2020, Aug. 23). Open Source Football: Exploring Wins with nflfastR. Retrieved from https://mrcaseb.github.io/open-source-football/posts/2020-08-23-exploring-wins-with-nflfastr/

BibTeX citation

  author = {Ryan, Austin},
  title = {Open Source Football: Exploring Wins with nflfastR},
  url = {https://mrcaseb.github.io/open-source-football/posts/2020-08-23-exploring-wins-with-nflfastr/},
  year = {2020}