DOI: https://doi.org/10.15368/theses.2018.145
Available at: https://digitalcommons.calpoly.edu/theses/1993
Date of Award
12-2018
Degree Name
MS in Industrial Engineering
Department/Program
Industrial and Manufacturing Engineering
Advisor
Reza Pouraghabagher
Abstract
The increasing suicide rate in the United States has amplified the need to ensure that regions with high suicide risk receive adequate funding for prevention programs and related resources. The way in which organizations dedicated to preventing suicides distribute funding could be improved with the development of predictive models for suicide rates. In this study, a national-level multiple linear regression model was developed to identify relevant factors associated with suicide. The national-level model was developed in two phases: the first used response variable data and explanatory variable data from the same time period, and the second shifted the response variable data by one time period to create a more accurate model for prediction. The models had k-fold R-squared values of 0.676 and 0.675. The national model identified four variables to include in a predictive state-level model: Foreclosure Rates, Violent Crime Rates, Gini ratio, and Consumption Volume. In the second part of this study, the use of Twitter data in a state-level model was evaluated. Tweets containing terms relating to suicide were identified in fifteen states over a thirty-one-day period and used to calculate three variables: Tweet rate, Favorite rate, and Retweet rate. Each of these three variables for the terms “suicide” and “suicidal” underwent an Analysis of Variance (ANOVA) test to check for differences between states. Each ANOVA test resulted in a p-value less than 0.0001, providing strong evidence that Tweet rate, Favorite rate, and Retweet rate differed among the states for the two search terms analyzed. Next, a Pearson Product-Moment correlation coefficient and Pearson Rho correlation coefficient were evaluated for each Twitter variable and the states’ historical suicide rates. All computed correlation coefficients were between -0.15 and 0.3, suggesting that there is, at best, a weak correlation between the Twitter variables and a state’s historical suicide rate. The results from the Twitter data analysis suggest that it is too early to accurately incorporate such data into a state-level multiple linear regression model. The results of this study could help in the further development of a state-level model that allows organizations dedicated to reducing suicides to allocate related resources more efficiently.
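The two statistical checks applied to the Twitter variables can be illustrated with a minimal sketch. It is not the thesis's code: the state names, tweet rates, and historical rates below are made-up placeholders, and the helper names `one_way_anova_f` and `pearson_r` are hypothetical. The sketch computes a one-way ANOVA F statistic across states and a Pearson product-moment correlation between each state's mean tweet rate and its historical rate, using only the Python standard library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (states)
    n = len(all_values)   # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Ratio of the between-group mean square to the within-group mean square.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson_r(xs, ys):
    """Return the Pearson product-moment correlation of two sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Made-up numbers for illustration only; NOT data from the study.
daily_tweet_rates = {
    "State A": [1.0, 2.0, 3.0],
    "State B": [2.0, 3.0, 4.0],
    "State C": [6.0, 7.0, 8.0],
}
historical_rates = {"State A": 10.0, "State B": 14.0, "State C": 12.0}

# Do tweet rates differ between states?
f_stat = one_way_anova_f(list(daily_tweet_rates.values()))

# Does a state's mean tweet rate track its historical rate?
mean_rates = [mean(v) for v in daily_tweet_rates.values()]
r = pearson_r(mean_rates, [historical_rates[s] for s in daily_tweet_rates])

print(f"F = {f_stat:.2f}, r = {r:.3f}")
```

A large F statistic alongside a small correlation coefficient would mirror the pattern the abstract reports: the states differ from one another in Twitter activity, yet that activity is only weakly related to historical suicide rates. Converting F to a p-value requires the F distribution, which a statistics library would normally supply.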