Master's Theses

A Case Study of Leveraging Auxiliary Data to Improve Treatment Effect Estimation Precision in an HIV Clinical Trial

Lana Mai Huynh, California Polytechnic State University, San Luis Obispo

Available at: https://digitalcommons.calpoly.edu/theses/3014

Date of Award

6-2025

Degree Name

MS in Statistics

Department/Program

Statistics

College

College of Science and Mathematics

Advisor

Charlotte Mann

Advisor Department

Statistics

Advisor College

College of Science and Mathematics

Abstract

Randomized controlled trials (RCTs) are typically the gold standard for evaluating treatment efficacy in the medical field, yet they can have small sample sizes due to logistical, ethical, and financial constraints. This limitation can result in imprecise treatment effect estimates. Recent methods have sought to enhance the precision of RCT estimates by incorporating information from large, observational, “auxiliary” datasets. An auxiliary dataset includes units that were not randomized in the trial itself, but may be similar to the RCT sample. By leveraging predictive models trained on these auxiliary data, researchers can adjust for potentially more powerful covariates, thereby reducing variance in treatment effect estimation without compromising the integrity of the randomization. Previous applications of this approach have shown its efficacy using educational experiments, where the auxiliary data originate from the same data source as the RCT. To extend and validate the robustness of this approach across domains, this thesis applies the estimation approach to a medical RCT, using publicly available data from an entirely different source.

We analyzed the CHOICES (CTN-0055) RCT, a 51-participant trial that investigated the feasibility and acceptability of extended-release naltrexone (XR-NTX) as a treatment for HIV-infected individuals with opioid or alcohol use disorders. We supplemented this analysis with data from the National Health and Nutrition Examination Survey (NHANES), a nationally representative survey that collects extensive health and nutrition data from thousands of adults and children across the United States, making it an ideal large-scale auxiliary dataset for our analysis. We used the NHANES data to develop an auxiliary model that predicts recent alcohol use. We then compared estimators that integrate experimental and auxiliary data through these model predictions with more standard estimators of the effect of XR-NTX on alcohol use. Our findings did not demonstrate improved precision from incorporating auxiliary model predictions, highlighting potential challenges when the auxiliary data come from an external source. This case study provides insights into the practical limitations and considerations of using auxiliary data for precision enhancement in small-sample medical RCTs.
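The general idea of adjusting an RCT estimate with auxiliary model predictions can be illustrated with a small simulation. The sketch below is not the thesis's estimator or data: it uses purely synthetic outcomes, a hypothetical "auxiliary prediction" generated as a noisy copy of the outcome signal, and one simple adjustment (a difference in means computed on outcomes minus predictions). All function names, the constant treatment effect, and the noise levels are assumptions chosen for illustration.

```python
import random
import statistics


def difference_in_means(outcomes, assignment):
    """Unadjusted estimator: mean of treated minus mean of control."""
    treated = [y for y, z in zip(outcomes, assignment) if z == 1]
    control = [y for y, z in zip(outcomes, assignment) if z == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def prediction_adjusted(outcomes, assignment, predictions):
    """Difference in means on residuals (outcome minus auxiliary prediction).

    The predictions come from a model fit outside the trial, so they do not
    depend on treatment assignment and the randomization is left intact.
    """
    residuals = [y - p for y, p in zip(outcomes, predictions)]
    return difference_in_means(residuals, assignment)


def simulate(n=51, n_treated=25, effect=1.0, reps=2000, seed=0):
    """Re-randomize a fixed synthetic sample and compare estimator variances."""
    rng = random.Random(seed)
    # Synthetic sample (assumption): outcome driven by one covariate plus noise.
    covariate = [rng.gauss(0, 1) for _ in range(n)]
    control_outcome = [2 * x + rng.gauss(0, 1) for x in covariate]
    # Hypothetical auxiliary predictions: informative but imperfect.
    predictions = [2 * x + rng.gauss(0, 0.5) for x in covariate]

    base_assignment = [1] * n_treated + [0] * (n - n_treated)
    unadjusted, adjusted = [], []
    for _ in range(reps):
        z = base_assignment[:]
        rng.shuffle(z)  # complete randomization with a fixed number treated
        y = [y0 + effect * zi for y0, zi in zip(control_outcome, z)]
        unadjusted.append(difference_in_means(y, z))
        adjusted.append(prediction_adjusted(y, z, predictions))

    return {
        "mean_unadjusted": statistics.fmean(unadjusted),
        "mean_adjusted": statistics.fmean(adjusted),
        "var_unadjusted": statistics.variance(unadjusted),
        "var_adjusted": statistics.variance(adjusted),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

In this synthetic setting the predictions track the outcome closely, so the residuals vary less than the raw outcomes and the adjusted estimator is less variable across randomizations. If the predictions track the outcome poorly, the residual variance can exceed the outcome variance and this simple adjustment offers no gain in precision.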
