Data Analysis of USAToday’s NFL Arrest database: 15 Surprising Insights

Wrap up

This data set touches on many aspects of American life beyond sports: education, location, our legal system, and much more. Had I had more time, I would have loved to explore some other dimensions of analysis (in the following order):

  • Analysis of player salary against Team Response to arrest (or what was it about those 45 players that made them get released faster than others)
  • Analysis of arrest frequency and criminal charge by college (and college location)
  • Analysis of the same by player position (scaled against the number of players in each position)

Methodology

I used the following languages and tools in my analysis:

  • Python
  • RStudio
  • Microsoft Excel
  • Tableau Public
  • Google Fusion Tables
  • Google Charts

I did not embed the Tableau Public or Google Chart images due to conflicts with WordPress style sheets.

I applied these against the following data sources:

  • Arrests: USA Today NFL Arrest Database and HuffingtonPost.com
  • Age, Size, and Weight: NFL.com (player attributes). In some cases when data was not available (e.g., players dismissed at the start of the first year of their career), I found player attributes at ESPN.com and CBSSports.com. Comparable height and weight for US children and adults came from the CDC
  • Speed and Strength: NFL Combine Results (apologies for potentially crashing your site with some data crawling)
  • Income: NFL Salary info at Business Insider and Chron.com. US household income at Census.gov
  • Education: US data on education at The Atlantic, NFL data from NFL.com and Wikipedia.com

In all cases, I let the data speak for itself, first exploring it along particular dimensions to see potential patterns, then testing those patterns for significance.
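To make the "explore, then test for significance" step concrete, here is a minimal Python sketch of one common way to check whether a pattern in a 2x2 table of counts could plausibly be chance: a Pearson chi-square test of independence. This is an illustration only, not the code used for this post; the function name `chi_square_2x2` and the counts in `example` are hypothetical placeholders, not figures from the arrest database.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Placeholder counts for illustration only (NOT from the arrest database).
# Rows: two hypothetical groups; columns: (outcome observed, outcome not observed)
example = [[30, 70], [15, 85]]
stat, p = chi_square_2x2(example)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference between the two groups is unlikely to be due to chance alone. For larger tables or small counts, a library routine such as SciPy's `chi2_contingency` or Fisher's exact test would be a more appropriate choice.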