Getting Started with analyzing NFL play-by-play data using nflscrapR

Hello again!

We’re going to talk about analyzing NFL play by play data using the R package nflscrapR.

It was not long ago that NFL play-by-play data was hard to find, and if you did find it, it took a lot of clean up before you could even analyze it. Now, thanks to
Maksim Horowitz, Ron Yurko and Sam Ventura, analyzing NFL data is easier than ever.

Getting started

If you don’t have R, you need to first download and install it. I prefer the Microsoft Open version of R mainly because of the multi-threaded math libraries.

Once you download that you need an IDE (Integrated development environment). I prefer RStudio.

Once you have those downloaded and installed, open up RStudio.

To install nflscrapR, you need to install devtools before you are able to install the nflscrapR library.

install.packages("devtools")
devtools::install_github(repo = "maksimhorowitz/nflscrapR")

Once that’s installed, you can start pulling down some data. Let’s look at Super Bowl 53. The scrape_season_play_by_play will take a while. You might want to walk away for a while (or do something else) while it loads.

library(nflscrapR)

library(sqldf)

pbp_2018 <- scrape_season_play_by_play(2018, type = “post”)

sb_53 <- sqldf(“select * from pbp_2018 where game_id = ‘2019020300’”)

#Get rid of null or missing win probability rows

sb_53 <- sqldf(“select * from sb_53 where home_wp is not null and away_wp is not null”)

At this point, you can either visualize it in R using base graphics (ugh) or using ggplot2. But I’m going to export it and throw it into Tableau.

To export out the sb_53, you can either use the write.csv command, or since Tableau supports statistical file formats, we can use R’s RData format.

save(sb_53, file = “sb_53.RData”)

This will save it to your default directory (mine is Documents), but you can specify a directory by adding the path in front of your file name.

Visualizing in Tableau

Open up Tableau and click on Statistical file and choose your sb_53.RData file. Click on Sheet 1 to get started.

Let’s look at some win probability data. We need to clean it up a little.

First, make Game Seconds Remaining a Dimension.

1.) Drag Game Seconds Remaining to Columns and make it Continuous.

2.) Drag Home Wp to Rows. Right click -> Measure -> Maximum.

3.) If it’s not already a line under marks, make it a line.

4.) Right click on the Game Seconds Remaining axis and click Reversed.

You should see something like this:

nflscrapR_01

5.) Drag Away Wp to Rows. Right click -> Measure -> Maximum.

6.) Right click on Away Wp in your rows field, click Dual Axis. Then Right click on the Away Wp and synchronize the axes.

nflscrapR_02

The rest is just formatting.

nflscrapR_03

Here’s a link to the workbook on Tableau Public if you want to see how I formatted it.

This is just scratching the surface with what’s possible with nflscrapR, but this will get you started.

Reach out with any questions!

-Paul

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s