BS in Statistics
A statistical study was performed to explore the relationships among the offensive statistics of every player in the 2008 Major League Baseball season. The purpose of the study was to explore various multivariate statistical methods using this data set.
The offensive variables in the study are games, at-bats, runs, hits, singles, doubles, triples, home runs, extra-base hits, runs batted in, total bases, walks, strikeouts, stolen bases, times caught stealing, on-base percentage, slugging percentage, on-base plus slugging percentage, and batting average. All variables are season totals except on-base percentage, slugging percentage, on-base plus slugging percentage, and batting average, which are averages across the entire season. Regression with ‘team winning percentage’ as the response variable was used to assess what effect the offensive variables had on team success. Discriminant analysis was used to investigate the differences, if any, between the American and National Leagues with respect to the offensive variables. Principal component analysis was used to find the linear combinations that explain the most variability in the data, to see further how the variables are related to each other. Lastly, using data from both the 2007 and 2008 seasons, regression models were investigated for predicting future performance on a specific variable, in this case runs batted in. These analyses did not yield results significant enough to support any useful conclusions.
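Several of the offensive variables listed above are exact functions of the others under the standard baseball definitions, which is one reason the variables are strongly related to each other. The sketch below is illustrative only and is not the study's code: the `BattingLine` class, the `ops` helper, and the sample player's numbers are hypothetical. On-base percentage is taken as an input because its usual formula needs hit-by-pitch and sacrifice-fly counts, which are not among the variables listed.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting totals for one hypothetical player."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int

    @property
    def hits(self) -> int:
        # Hits are the sum of the four hit types.
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def extra_base_hits(self) -> int:
        # Extra-base hits are every hit that is not a single.
        return self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        # Each hit type is weighted by the number of bases it earns.
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.home_runs)

    @property
    def batting_average(self) -> float:
        # Hits per at-bat; defined as 0.0 when there are no at-bats.
        return self.hits / self.at_bats if self.at_bats else 0.0

    @property
    def slugging(self) -> float:
        # Total bases per at-bat; defined as 0.0 when there are no at-bats.
        return self.total_bases / self.at_bats if self.at_bats else 0.0


def ops(on_base_pct: float, slugging_pct: float) -> float:
    """On-base plus slugging is the sum of the two rate statistics."""
    return on_base_pct + slugging_pct


# Hypothetical season line, chosen only to make the arithmetic easy to follow.
player = BattingLine(at_bats=500, singles=100, doubles=30, triples=5, home_runs=15)
print(player.hits, player.extra_base_hits, player.total_bases)
print(round(player.batting_average, 3), round(player.slugging, 3))
print(round(ops(0.360, player.slugging), 3))
```

Because hits, extra-base hits, and total bases are linear combinations of the four hit types, including all of them as regression predictors makes the design matrix exactly collinear, which is worth bearing in mind when interpreting regression or principal component results on such data.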