Available at: https://digitalcommons.calpoly.edu/theses/2906
Date of Award: 3-2023
Degree Name: MS in Computer Science
Department/Program: Computer Science
College: College of Engineering
Advisor: John Clements
Advisor Department: Computer Science
Advisor College: College of Engineering
Abstract
The DataFrame is a powerful table-like data structure used frequently in Data Science, the in-demand field focused on extracting valuable insights from data. Datasets are rarely clean when collected and must be prepared before they are useful for statistical analysis. A DataFrame API supports optimized methods such as selecting, aggregating, and filtering rows, columns, and cells, as well as renaming row and column labels. It also supports normalizing and merging data, adding new columns, and labelling missing data, among many other features. An API for working with tabular data would be useful in any general-purpose language, so DataFrames have been incorporated into libraries such as Pandas for Python, are built into R, and are available in Scala through Apache Spark. Because of their wide-ranging use, implementations also exist in many other languages, such as Java and Julia \cite{BigData}.
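To make the operations listed above concrete, here is a minimal sketch in Python (chosen only because the abstract benchmarks against Pandas) of a column-oriented table supporting selection, row filtering, column renaming, missing-value filling, and grouped aggregation. The class `MiniFrame` and all of its method names are hypothetical and purely illustrative. They are not the RacketFrames API or the Pandas API, and `None` is assumed to stand in for a missing value.

```python
from statistics import mean


class MiniFrame:
    """Hypothetical column-oriented table: each column name maps to a list.

    Illustrative only; this is not the RacketFrames or Pandas API.
    None is used as the stand-in for a missing value.
    """

    def __init__(self, columns):
        # Every column must have the same number of rows.
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(values) for name, values in columns.items()}

    def nrows(self):
        return len(next(iter(self.columns.values()), []))

    def row(self, i):
        # Materialize row i as a {column_name: value} dict.
        return {name: values[i] for name, values in self.columns.items()}

    def select(self, *names):
        # Keep only the requested columns, in the requested order.
        return MiniFrame({name: self.columns[name] for name in names})

    def filter(self, predicate):
        # Keep the rows for which predicate(row_dict) is true.
        keep = [i for i in range(self.nrows()) if predicate(self.row(i))]
        return MiniFrame(
            {name: [values[i] for i in keep] for name, values in self.columns.items()}
        )

    def rename(self, mapping):
        # Relabel columns; names absent from `mapping` are unchanged.
        return MiniFrame(
            {mapping.get(name, name): values for name, values in self.columns.items()}
        )

    def fill_missing(self, column, value):
        # Replace missing entries (None) in one column with `value`.
        filled = dict(self.columns)
        filled[column] = [value if v is None else v for v in filled[column]]
        return MiniFrame(filled)

    def group_mean(self, key, column):
        # Aggregate: mean of `column` for each distinct `key`, skipping None.
        groups = {}
        for k, v in zip(self.columns[key], self.columns[column]):
            if v is not None:
                groups.setdefault(k, []).append(v)
        return {k: mean(vs) for k, vs in groups.items()}


df = MiniFrame({"city": ["SLO", "SLO", "LA", "LA"],
                "temp": [18.0, None, 25.0, 27.0]})
print(df.group_mean("city", "temp"))  # {'SLO': 18.0, 'LA': 26.0}
warm = df.filter(lambda r: r["temp"] is not None and r["temp"] > 20)
print(warm.columns)  # {'city': ['LA', 'LA'], 'temp': [25.0, 27.0]}
print(df.fill_missing("temp", 0.0).rename({"temp": "temp_c"}).columns)
```

Storing each column as its own list, rather than storing rows, makes column selection and per-column operations cheap. This column-oriented layout is a common DataFrame design choice, though the sketch makes no claim about how RacketFrames actually stores its data.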
In this work, we introduce RacketFrames, a DataFrame implementation for Racket v8.0 and later. We show the benefits such an implementation can bring to existing and future Racket projects. To quantify the performance of major DataFrame operations, we compare their speed against Python Pandas, and we compare functional and object-oriented paradigms. We hope this work furthers the development of Data Science tools for Racket and other programming languages.