Available at: https://digitalcommons.calpoly.edu/theses/3023
Date of Award
6-2025
Degree Name
MS in Statistics
Department/Program
Statistics
College
College of Science and Mathematics
Advisor
Kelly Bodwin
Advisor Department
Statistics
Advisor College
College of Science and Mathematics
Abstract
Clustering is a fundamental technique in unsupervised learning that can be used to find hidden patterns and structures within unlabeled data. The tidyclust package in R provides a unified interface for applying various clustering techniques to data. This paper outlines the addition to the tidyclust package of density-based clustering with DBSCAN and model-based clustering with Gaussian mixture models (GMMs). DBSCAN can be performed with the db_clust() function, which uses the dbscan package implementation as its engine. GMMs can be fit with the gm_clust() function, which uses the mclust package implementation. This paper highlights the changes made to these underlying implementations in the process of bringing these methods into tidyclust, including changes to the model argument names, to how the model is fit to data, and to how the model is used to predict on new data.
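To illustrate what density-based clustering does, here is a minimal pure-Python sketch of the standard DBSCAN algorithm. It is not the dbscan package engine that db_clust() uses, and it does not reflect tidyclust's interface. The names `eps`, `min_pts`, `region_query`, and `predict` are hypothetical and chosen for illustration. The `predict` helper assumes one common convention for out-of-sample assignment: a new point joins the cluster of its nearest core point within `eps` and is otherwise labeled noise.

```python
import math
from collections import deque

NOISE = 0  # assumed convention for this sketch: label 0 marks noise points


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (1, 2, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point; keep expanding
    return labels


def predict(points, labels, new_point, eps, min_pts):
    """Assign new_point to the cluster of its nearest core point within eps, else NOISE."""
    best, best_d = NOISE, float("inf")
    for i, p in enumerate(points):
        d = math.dist(p, new_point)
        if d <= eps and d < best_d and len(region_query(points, i, eps)) >= min_pts:
            best, best_d = labels[i], d
    return best
```

With two tight groups of four points and one distant outlier, `dbscan(points, 1.5, 3)` should place each group in its own cluster and mark the outlier as noise. Unlike k-means, the number of clusters is not specified in advance; it follows from the density parameters.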