Degree Name

BS in Statistics


Statistics Department


Soma Roy, Karen McGaughey


Cal Poly currently has one of the largest ongoing university health studies in the United States. Launched in Fall 2009, the Cal Poly FLASH study, led by the Kinesiology department and STRIDE, is a longitudinal study that tracks the classes of 2013 and 2014 through online surveys and physical assessments. The data collected covers various areas such as perceived health, lifestyle choices, and actual physical health.

My project analyzed the FLASH data to investigate the relationship between perceived lifestyle variables and actual health measures for Cal Poly freshmen. The motivation for this analysis was an interest in diet and exercise and their impact on an individual’s overall health. In particular, I was interested in how a person’s perceived diet and exercise regimen relates to overall health. To assess overall health, I examined both the Body Mass Index (BMI) and the blood pressure of students. BMI was computed using the standard formula involving height and weight. Blood pressure was classified using both systolic and diastolic readings.
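As a minimal sketch of the two health measures, the standard BMI formula is weight in kilograms divided by the square of height in meters. The category cutoffs below are assumptions for illustration only: the BMI bands are the commonly used WHO adult categories, and the blood pressure bands follow the JNC 7 guideline, collapsed to three levels. The abstract does not state which cutoffs the project used, and the function names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Standard BMI: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Assumed WHO adult bands; not confirmed as the project's cutoffs."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


def bp_category(systolic: float, diastolic: float) -> str:
    """Assumed JNC 7 bands, collapsed to three levels.

    The higher of the two readings decides the category, so both
    systolic and diastolic pressure contribute to the classification.
    """
    if systolic >= 140 or diastolic >= 90:
        return "hypertension"
    if systolic >= 120 or diastolic >= 80:
        return "prehypertension"
    return "normal"


if __name__ == "__main__":
    # Hypothetical student, not taken from the FLASH data.
    value = bmi(weight_kg=70.0, height_m=1.75)
    print(round(value, 1), bmi_category(value), bp_category(118, 76))
```

Turning both measures into categories is what makes them usable as categorical responses in a logit model.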

Conventional wisdom states that proper diet and exercise lead to better overall health. I was interested in the following research question: “Can we simultaneously model college students’ BMI and blood pressure using various lifestyle variables?” The response variables were BMI and blood pressure, and the explanatory variables were lifestyle variables such as diet preference, activity level, marijuana use, cigarette use, and alcohol use. Before modeling BMI and blood pressure simultaneously, I fit several models with univariate responses.

To investigate this question, I used the FLASH data from each student’s first physical assessment, together with the survey from the corresponding quarter.

In my investigation, I used Discrete Multivariate Analysis to compute two separate generalized logit functions for each response, BMI and blood pressure, and Cluster Analysis to group lifestyle variables by their similarity to one another. Discrete Multivariate Analysis allowed me to account for the relationship between BMI and blood pressure. In each model, sex was a significant explanatory variable. Cluster Analysis showed that while certain variables can be grouped together, many lifestyle variables are dissimilar from one another.
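For readers unfamiliar with the term, a generalized (baseline-category) logit function compares each category of a categorical response with a reference category. The notation below is an assumption for illustration and is not taken from the project: $\pi_j(\mathbf{x})$ is the probability of response category $j$ given explanatory variables $\mathbf{x}$, and category $J$ is the reference.

```latex
\log\frac{\pi_j(\mathbf{x})}{\pi_J(\mathbf{x})}
  = \alpha_j + \boldsymbol{\beta}_j^{\top}\mathbf{x},
  \qquad j = 1, \dots, J-1
```

A response with $J$ categories therefore has $J-1$ such functions; for example, a three-category response yields two.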

While Discrete Multivariate Analysis is a very useful method, its sample size limitations make it difficult to create models with multiple explanatory variables. Future analysis could examine the association of the overall health measures, BMI and blood pressure, with other lifestyle variables. Additionally, with further data cleaning, more lifestyle variables could be added to the cluster analysis to see whether more clusters form.