Degree Name

BS in Statistics


Statistics Department


Soma Roy


Missing data often cannot be prevented during data collection. Data can be missing for many reasons, for example when a respondent refuses to answer a sensitive question or fears embarrassment. Researchers often assume their data are “missing completely at random” or “missing at random”. Unfortunately, we cannot test whether the assumed missingness mechanism holds, because the missing values themselves are never observed. For my senior project, I will run simulation studies in SAS to observe the behavior of missing data under different assumptions: missing completely at random, missing at random, and ignorability. I will also compare the effects of different imputation methods when a set of variables of interest is set to missing. The objective of this simulation study is to see how substituted values in a dataset affect subsequent analyses. This will let readers decide which imputation method(s) would be best suited to a dataset with missing data.
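
To make the distinction between the mechanisms concrete, here is a minimal sketch of the kind of simulation described above. It is written in Python rather than SAS and is not the project's code. The data-generating model, the missingness rates, and every function name (`simulate`, `make_mcar`, `make_mar`, `mean_impute`) are assumptions chosen for illustration only. Under "missing completely at random" the chance that `y` is missing is the same for every row. Under "missing at random" it depends only on the observed covariate `x`.

```python
import random
import statistics


def simulate(n=5000, seed=0):
    """Generate a complete dataset: y depends linearly on x (assumed model)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2 + xi + rng.gauss(0, 1) for xi in x]
    return rng, x, y


def make_mcar(rng, y, rate=0.3):
    """MCAR: every y has the same probability of being set to missing."""
    return [None if rng.random() < rate else yi for yi in y]


def make_mar(rng, x, y, low=0.1, high=0.5):
    """MAR: y is more likely to be missing when the observed x is positive."""
    return [
        None if rng.random() < (high if xi > 0 else low) else yi
        for xi, yi in zip(x, y)
    ]


def observed(values):
    """Keep only the non-missing values."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    m = statistics.fmean(observed(values))
    return [m if v is None else v for v in values]


rng, x, y = simulate()
y_mcar = make_mcar(rng, y)
y_mar = make_mar(rng, x, y)

print("full-data mean of y:    ", round(statistics.fmean(y), 3))
print("observed mean under MCAR:", round(statistics.fmean(observed(y_mcar)), 3))
print("observed mean under MAR: ", round(statistics.fmean(observed(y_mar)), 3))
print("variance before imputing:", round(statistics.pvariance(observed(y_mar)), 3))
print("variance after imputing: ", round(statistics.pvariance(mean_impute(y_mar)), 3))
```

In this setup the observed mean under MCAR stays close to the full-data mean, while under MAR it is pulled downward because rows with large `x` (and therefore large `y`) are dropped more often. Mean imputation leaves the observed mean unchanged but shrinks the variance, which is one way a substituted value can distort later analyses.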

senior (54 kB)
SAS Code

survey.csv (15 kB)
CSV file

MathAchieve.csv (244 kB)
CSV file