Date of Award

3-2023

Degree Name

MS in Computer Science

Department/Program

Computer Science

College

College of Engineering

Advisor

John Clements

Advisor Department

Computer Science

Advisor College

College of Engineering

Abstract

The DataFrame is a powerful table-like data structure used frequently in Data Science, the in-demand and innovative field focused on extracting valuable insights from data. Datasets are rarely perfect upon collection and must be prepared before they are useful for statistical analysis. A DataFrame API supports optimized methods for selecting, aggregating, and filtering rows, columns, and cells, as well as for renaming row and column labels. It also supports normalizing data, merging data, adding new columns, and labelling missing data, among numerous other features. An API for working with tabular data is useful in any general-purpose language, so DataFrames have been incorporated into libraries like Pandas for Python and provided as native libraries in the languages R and Scala. Due to their wide-ranging use, it is not uncommon to find implementations in many other languages, such as Java and Julia \cite{BigData}.

In this work, we introduce RacketFrames, a DataFrame implementation for Racket v8.0+. We show the benefits such an implementation can have for existing and future Racket projects. To quantify the performance of major DataFrame operations, we measure their speed against Python Pandas and compare the functional and object-oriented paradigms. We hope this work continues the trend of Data Science tool development for Racket and other programming languages.
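
As a rough illustration of the kinds of operations the abstract lists (selecting columns, filtering rows, renaming labels, and aggregating while skipping missing data), the sketch below models a table as a plain Python dictionary of equal-length columns. It is not the RacketFrames or Pandas API; every function name and the sample data are hypothetical and chosen only for this example.

```python
# Illustrative only: a column-oriented table as a dict of equal-length lists.
# This is neither RacketFrames nor Pandas; all names here are hypothetical.

def select(frame, columns):
    """Keep only the named columns."""
    return {name: frame[name] for name in columns}

def filter_rows(frame, predicate):
    """Keep the rows for which predicate(row_as_dict) is true."""
    names = list(frame)
    n_rows = len(frame[names[0]]) if names else 0
    keep = [
        i for i in range(n_rows)
        if predicate({name: frame[name][i] for name in names})
    ]
    return {name: [frame[name][i] for i in keep] for name in names}

def rename(frame, mapping):
    """Rename column labels; labels not in the mapping are left unchanged."""
    return {mapping.get(name, name): values for name, values in frame.items()}

def mean(frame, column):
    """Average of a column, skipping missing values (represented as None)."""
    present = [v for v in frame[column] if v is not None]
    return sum(present) / len(present) if present else None

# Made-up sample data with one missing cell.
frame = {
    "city": ["Oslo", "Lima", "Pune"],
    "temp_c": [4.0, None, 30.0],
}

warm = filter_rows(
    frame, lambda row: row["temp_c"] is not None and row["temp_c"] > 10
)
renamed = rename(select(frame, ["temp_c"]), {"temp_c": "temperature"})
average = mean(frame, "temp_c")
```

A real DataFrame library would typically store typed, contiguous columns and an index rather than Python lists, which is where the "optimized" part of such an API comes from; the sketch only shows the shape of the operations.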
