Date of Award

6-2025

Degree Name

MS in Statistics

Department/Program

Statistics

College

College of Science and Mathematics

Advisor

Kelly Bodwin

Advisor Department

Statistics

Advisor College

College of Science and Mathematics

Abstract

Unsupervised learning is closely associated with clustering, but other methods, such as data mining, also fall under this umbrella. In R, the tidyclust package provides a unified interface for clustering models yet lacks support for data mining. This thesis addresses this gap by introducing the Apriori and ECLAT algorithms into tidyclust, with a focus on frequent itemset mining. Unlike traditional clustering models, frequent itemset mining produces groupings of column variables rather than cluster labels or partitions of observations. To address this, a novel clustering approach is proposed: items (columns) are grouped based on their "dominant" frequent itemset. A key contribution is a new prediction method, modeled as a recommender system, to predict missing items. This implementation extends tidyclust to support column-based clustering, with applications in market basket analysis and recommender systems.
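
The idea of grouping items by a frequent itemset can be illustrated with a minimal sketch. It is not the thesis's tidyclust implementation, which is written in R; it is a small Python example of a level-wise, Apriori-style search. The abstract does not define "dominant", so the rule used here (the largest frequent itemset containing the item, with ties broken by support) is an assumption, and the names `frequent_itemsets`, `dominant_itemset`, and `baskets` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search. Returns {frozenset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    while current:
        # Support = fraction of transactions that contain the candidate.
        level = {}
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and keep only
        # those whose k-subsets are all frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in level for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        current = list(candidates)
    return result


def dominant_itemset(item, itemsets):
    """ASSUMED rule, not taken from the thesis: the largest frequent itemset
    containing the item, with ties broken by support."""
    containing = [
        (len(s), support, sorted(s))
        for s, support in itemsets.items()
        if item in s
    ]
    if not containing:
        return None  # the item is not frequent on its own
    return frozenset(max(containing)[2])


# Toy market-basket data (made up for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"eggs", "tea"},
    {"tea"},
]

itemsets = frequent_itemsets(baskets, min_support=0.4)
clusters = {
    item: dominant_itemset(item, itemsets)
    for item in ("bread", "milk", "eggs", "tea")
}
```

Under this assumed rule, items that share a dominant itemset fall into the same column cluster, while items with no larger frequent itemset form singleton clusters.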

Included in

Data Science Commons
