Date of Award

6-2025

Degree Name

MS in Statistics

Department/Program

Statistics

College

College of Science and Mathematics

Advisor

Kelly Bodwin

Advisor Department

Statistics

Advisor College

College of Science and Mathematics

Abstract

This thesis introduces a new implementation of the BIRCH clustering algorithm within the tidyclust framework in R. Traditional hierarchical clustering methods scale poorly to large datasets because of their computational complexity. BIRCH offers a scalable alternative by summarizing data into microclusters using a CF-tree. This work integrates the phases of the BIRCH algorithm into tidyclust, enabling a streamlined workflow for model specification, evaluation, and prediction. This implementation brings scalable hierarchical clustering to R users within the tidyclust interface, enhancing their ability to analyze large datasets.
