Getting into sports analytics

Collection of short answers to common questions.

Ben Baldwin https://twitter.com/benbbaldwin
08-24-2020

Introduction

I get a lot of emails asking about how to start learning things. Rather than being useful to one person, I’m going to start collecting the questions (anonymized) and answers here.

Learning R

Q: How can I get started learning R?

Q: Should I learn R or python?

Short answer: it doesn’t matter, just pick one and get good at it.

Longer answer: More people in the public football analytics community use R, so there are more resources for getting up and running quickly. In addition, if there is one thing that R is better than python at, it is cleaning and manipulating data, so if all you care about is working with data, R might be a better choice to start. At the same time, python is also great, and in the long run if you end up doing data analysis for a career, you’re probably going to end up learning both at some point anyway. And if you start doing machine learning (ML) stuff (see next question), python generally has more packages and tools available, and most ML courses are taught using python.

Big Data Bowl / Machine Learning

Q: I know about the NFL Big Data Bowl and that the papers of many finalists are available on the internet. But I believe that I lack the skills needed to understand those papers and use them to answer my questions. What statistics and machine learning resources do you recommend for learning the machine learning that can be applied to football, when appropriate, to answer my questions?

Unfortunately, economics doesn’t give much training in machine learning. I too couldn’t really understand the Big Data Bowl stuff until I took this course (all the videos and homeworks are free and posted online), which was pretty challenging but gives a great foundation for what ML means and how to think about it.

This course is very similar to the famous Stanford 231n course – the instructor used to teach 231n – but the homeworks are in Colab so there’s much less setup involved with getting python up and running. I did the first four assignments and finally could understand the Big Data Bowl winning solution.

I also highly recommend An Introduction to Statistical Learning, and I have been recommended this book, Bayesian Data Analysis.

What degree should I get?

Q: Do I need a PhD to get into football analytics?

No, definitely not. Getting a PhD is not required and certainly not even expected for doing this kind of stuff, although there are certainly benefits to having one if you enjoy research (note: maximizing earnings is not one of those benefits). The big question is what you want to do. If it’s work for a team, you’d want to beef up your technical skills, perhaps through a Master’s program, and do stuff like compete in the Big Data Bowls, conduct research and get it out to the public, etc. If you want to go to grad school in econ, you’d probably want to do something like gaining research experience working under a professor and, if needed, taking the math/stats classes needed to be a good candidate for grad school. Ultimately this comes down to what you value so there’s no right answer imo, but earning a PhD is way, way overkill if what you want to do is work for a team. And finally, getting into sports (especially with a team) is hard, so thinking about what you’d want to do if you don’t get in is also useful, i.e., ideally one would pick a field that is employable and inherently interesting to them.

Q: What field should I choose?

My background was in econ and that’s not the best preparation for getting into sports analytics (something like statistics or other fields with more exposure to data science / machine learning tools gives better training), with the caveat that I was in school a long time ago so maybe what is taught has changed since then. With that said, here’s an example program: this is what Sean Clement did prior to getting hired by the Ravens (see in particular the Data Science track). Derrick Yam (Ravens) has a Master’s in biostatistics, Sarah Bailey (Rams) a Master’s in statistics, Sarah Mallepalle (Ravens) a B.S. in Statistics and Machine Learning, etc. These programs are a lot more technical than what you’d get in an MBA (which doesn’t make them better or worse, just more aligned with what the people getting these jobs are doing). Finally, I’ve heard good things about Coursera but haven’t personally used it.

I’m in college, what should I do?

There’s no one path, but I got some good answers when I posed this question on Twitter:

Advice from people in the industry

See Namita Nandakumar’s excellent thread here. To highlight two tweets:

Matthew Barlowe:

Caio Brighenti:

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/mrcaseb/open-source-football, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Baldwin (2020, Aug. 24). Open Source Football: Getting into sports analytics. Retrieved from https://www.opensourcefootball.com/posts/2020-08-24-getting-into-sports-analytics/

BibTeX citation

@misc{baldwin2020getting,
  author = {Baldwin, Ben},
  title = {Open Source Football: Getting into sports analytics},
  url = {https://www.opensourcefootball.com/posts/2020-08-24-getting-into-sports-analytics/},
  year = {2020}
}