Available at: https://digitalcommons.calpoly.edu/theses/2965
Date of Award
6-2023
Degree Name
MS in Electrical Engineering
Department/Program
Electrical Engineering
College
College of Engineering
Advisor
Joseph Callenes-Sloan
Advisor Department
Electrical Engineering
Advisor College
College of Engineering
Abstract
Recent developments in machine learning and artificial intelligence have sparked an influx of workloads that require specialized computer hardware for cloud services. The hardware running machine learning models predominantly consists of graphics processing units (GPUs) and tensor processing units (TPUs). However, these components are expensive for cloud services to purchase, costly for customers to rent, prone to price spikes, and energy-intensive. In this research, we show that both cloud services and customers would benefit from utilizing field programmable gate arrays (FPGAs) to alleviate the aforementioned challenges. An FPGA can be configured as a machine learning accelerator, operating similarly to a GPU or TPU. We propose an FPGA-based architecture that utilizes compressed neural networks. This provides an alternative hardware option for running machine learning inference, reducing server construction costs and power consumption, improving workload distribution, and offering customers a more affordable option. We evaluate the system using architectural simulations and cloud-based deployment, demonstrating average cost savings of 76.2% and energy savings of 26.5% from using the Neural Compression Inference Accelerator (NCIA).
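To give a rough sense of the kind of neural network compression an accelerator like this might exploit, the sketch below shows post-training weight quantization from 32-bit floats to 8-bit integers. This is a generic, minimal illustration of one common compression technique, not the thesis's actual NCIA pipeline; the layer shape and quantization scheme are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical example: symmetric linear quantization of a dense layer's
# float32 weights to int8, one common form of neural network compression.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

# Map [-max|w|, +max|w|] onto the int8 range [-127, 127].
scale = np.abs(weights).max() / 127.0
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to estimate the accuracy cost of the 4x storage reduction.
dequantized = q_weights.astype(np.float32) * scale
print(f"storage: {weights.nbytes} B -> {q_weights.nbytes} B")
print(f"mean absolute error: {np.abs(weights - dequantized).mean():.5f}")
```

Quantization like this shrinks model storage and lets inference run on narrow integer arithmetic units, which map efficiently onto FPGA fabric.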