Exploratory Data Analysis Using Tidyverse

This post assumes beginner knowledge of R.

Welcome to the second article in our series on basic data cleaning and data manipulation! In this article, we’ll use play-by-play data from two NHL games to answer two questions:

  • which power play unit generated the best shot rate in each game?
  • which defenseman played the most 5v5 minutes in each game?

Along the way, we’ll cover several basic data manipulation topics in the tidyverse, including using functions, creating joins, grouping and summarizing data, and working with string data.

Combining Manually-Tracked Data with Play-by-Play Data

This post assumes beginner knowledge of R.

If you’ve ever analyzed hockey data, you’re probably familiar with the NHL’s Real Time Scoring System, which produces what’s more commonly known as play-by-play data. These data are publicly available and let us see every event the NHL recorded in a given game. Shown below are selected details about the first 10 events from two games on February 18, 2019: Tampa Bay at Columbus and Vegas at Colorado.
