Date of Award

6-2023

Degree Name

MS in Electrical Engineering

Department/Program

Electrical Engineering

College

College of Engineering

Advisor

Joseph Callenes-Sloan

Advisor Department

Electrical Engineering

Advisor College

College of Engineering

Abstract

Recent developments in machine learning and artificial intelligence have sparked an influx of workloads that require specialized computer hardware for cloud services. The hardware running machine learning models predominantly consists of graphics processing units (GPUs) and tensor processing units (TPUs). However, these components are expensive for cloud services to purchase, costly for customers to rent, prone to price spikes, and energy-intensive. In this research, we show that both cloud services and customers would benefit from utilizing field programmable gate arrays (FPGAs) to alleviate these challenges. An FPGA can be configured as a machine learning accelerator, operating similarly to a GPU or TPU. We propose an FPGA-based architecture that utilizes compressed neural networks. This provides an alternative hardware option for those running machine learning inference, reducing server construction costs and power consumption, improving workload distribution, and offering a more affordable option for customers. We evaluate the system using architectural simulations and cloud-based deployment, demonstrating average savings of 76.2% in cost and 26.5% in energy from using the Neural Compression Inference Accelerator (NCIA).
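The abstract does not specify the compression scheme NCIA uses; as a rough, hedged illustration of the kind of neural compression an FPGA inference accelerator can exploit, the sketch below applies symmetric per-tensor 8-bit post-training quantization to a dense layer's weights. The function names and the quantization scheme are illustrative assumptions, not the thesis's actual method.

```python
# Illustrative sketch only -- NOT the thesis's NCIA design. Shows how
# 8-bit post-training weight quantization shrinks model storage, which
# is one form of neural compression an FPGA accelerator can exploit.
import numpy as np

def quantize(w: np.ndarray, num_bits: int = 8):
    """Map float weights to signed integers with one per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    scale = np.abs(w).max() / qmax            # symmetric scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(256, 128)).astype(np.float32)

q, scale = quantize(w)
w_hat = q.astype(np.float32) * scale          # dequantized reconstruction

print(f"storage: {w.nbytes} B float32 -> {q.nbytes} B int8 (4x smaller)")
print(f"max reconstruction error: {np.abs(w - w_hat).max():.5f}")
```

On hardware, the int8 weights also allow narrow integer multiply-accumulate units in place of floating-point ones, which is where an FPGA implementation would recover energy and area in addition to the 4x memory reduction shown here.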
