BS in Statistics
This paper analyzes National Football League (NFL) data from the 2013-2014 regular season. The main goal is to find hidden trends in game data and determine which factors are statistically significant predictors of a team’s ultimate objective, a win.
The main response variable is a team’s total wins during the regular season. An alternative dependent variable is spread, the difference between a team’s points scored and points against. Spread is analyzed because it provides a quantitative response variable that can be either positive or negative.
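As a small illustration of the spread definition, the sketch below computes a per-game and a season-long spread. The function names and the scores used in the usage note are hypothetical and are not taken from the data set described here.

```python
def game_spread(points_scored, points_against):
    """Spread for one game: points scored minus points against."""
    return points_scored - points_against


def season_spread(games):
    """Sum the per-game spreads over (points_scored, points_against) pairs."""
    return sum(game_spread(scored, against) for scored, against in games)
```

For example, a hypothetical team that wins 24-17 and then loses 10-27 has game spreads of 7 and -17, so the response can take either sign, unlike a win count.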
Game data were gathered from ESPN.com box scores through a user-defined SAS® 9.3 program that involved manual data entry. The user entered a minimal set of statistics from each game's box score, and the program calculated percentages, averages, grouped statistics, and other derived measures, for a total of 46,592 individual statistics for the whole season.
These data were read into R® x64 2.14.1 for linear regression analysis. All game data were combined into a season-wide data set of 5,824 statistics covering all 32 teams. A total of 1,220 linear regression models with one, two, or three explanatory variables were fit, and a p-value, a significance test at the 0.05 level, and the adjusted R² were extracted from each model. The models were then sorted by adjusted R² to produce a ranked table of the variables that explain the most variability in a team’s total wins across the season. These most predictive statistics are the ones NFL teams should focus on to increase their probability of winning any regular-season game.
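The fit-and-rank procedure can be sketched as follows. This is a minimal illustration and not the R program used in the analysis: it is written in Python with only the standard library, the function names (`solve`, `adjusted_r2`, `rank_models`) are hypothetical, and it assumes every combination of up to three predictors is tried, which may differ from how the 1,220 models were actually chosen. It computes adjusted R² only; the p-value is omitted because it needs an F distribution that the standard library does not provide. In R, the comparable quantity is available from `summary(fit)$adj.r.squared` on an `lm` fit.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(columns, y):
    """Fit y on an intercept plus the given predictor columns; return adjusted R^2."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R^2 for the number of predictors p
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def rank_models(predictors, y, max_terms=3):
    """Fit every model with 1..max_terms predictors; sort by adjusted R^2, best first."""
    results = []
    for size in range(1, max_terms + 1):
        for names in combinations(sorted(predictors), size):
            score = adjusted_r2([predictors[name] for name in names], y)
            results.append((names, score))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Here `predictors` would map each statistic's name to a list with one value per team, and `y` would hold the matching total wins (or spread). The sketch assumes the predictors in a model are not perfectly collinear and that there are more teams than fitted coefficients, both of which hold comfortably with 32 teams and at most three predictors.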