Date of Award


Degree Name

MS in Computer Science


Computer Science


College of Engineering


Paul Anderson

Advisor Department

Computer Science

Advisor College

College of Engineering


In biomedical technology, both accuracy and consistency are crucial to the development and deployment of predictive tools. While accuracy is easy to measure, consistency is not, especially in biomedicine, where consistent predictions can be difficult to achieve. Biomedical datasets typically contain far more features than samples, contrary to ordinary data mining practice. As a result, predictive models trained on such datasets may fail to find valid pathways for prediction. This phenomenon is known as underspecification.

Underspecification has gained acceptance as a concept in recent years: a handful of recent works explore it in different applications, and a handful of earlier works encountered it before it was formally named. However, underspecification remains under-addressed, to the point where some academics might even claim that it is not a significant problem.

With this in mind, this thesis aims to identify and minimize underspecification in deep learning cancer subtype predictors. To that end, this work details the development of the Predicting Underspecification Monitoring Pipeline (PUMP), a software tool that provides a methodology for data analysis, stress testing, and model evaluation. The hope is that PUMP can be applied to deep learning training so that any user can ensure that their models generalize to new data as well as possible.