Date of Award

6-2024

Degree Name

MS in Electrical Engineering

Department/Program

Electrical Engineering

College

College of Engineering

Advisor

Andrew Danowitz

Advisor Department

Electrical Engineering

Advisor College

College of Engineering

Abstract

With machine learning workloads now operating at very large scales, models are distributed across large compute systems. On these distributed systems, model performance is limited by the bandwidth of chip-to-chip communication. To relieve this bottleneck, spiking neural networks (SNNs) can reduce inter-chip communication traffic by exploiting their inherent network sparsity. However, compared to traditional artificial neural networks (ANNs), SNNs can suffer significant performance degradation as network scale and complexity increase.
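To make the bandwidth argument concrete, the following is a minimal sketch (not from the thesis; the neuron count, sparsity level, and 16-bit encodings are illustrative assumptions) comparing the bytes a dense ANN activation vector would send across a chip-to-chip link against a sparse address-event spike representation:

```python
# Illustrative sketch: dense ANN activations ship every value across the
# chip boundary, while spike-based traffic ships only sparse events.
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 4096          # assumed width of the layer crossing the boundary
sparsity = 0.05           # assumed fraction of neurons spiking per step

# Dense ANN activations: one 16-bit value per neuron per transfer.
dense_bytes = n_neurons * 2

# SNN spike events: one 16-bit neuron address per spike; only active
# neurons are transmitted (address-event representation).
spikes = rng.random(n_neurons) < sparsity
spike_bytes = int(spikes.sum()) * 2

print(f"dense transfer: {dense_bytes} bytes")
print(f"spike transfer: {spike_bytes} bytes "
      f"({dense_bytes / spike_bytes:.1f}x less traffic)")
```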

This research proposes a hybrid neural network accelerator that combines the strengths of spiking and non-spiking layers by allocating the majority of resources to non-spiking layers on the interior of the chip, while bandwidth-limited regions (e.g., I/O pads or chip separation boundaries) employ spike-based data traffic. By limiting the overall use of spiking layers within the network, we realize the energy savings of SNNs without the degradation in accuracy that comes with large spike-based models.
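The hybrid layering scheme can be sketched as follows (my own illustration under stated assumptions, not the thesis implementation; layer sizes, the thresholding scheme, and the two-chiplet split are hypothetical):

```python
# Illustrative sketch: dense (non-spiking) layers on the chip interior,
# with a spiking conversion only at the bandwidth-limited chip boundary.
import numpy as np

def dense_layer(x, w):
    """Conventional ANN layer: full-precision matrix multiply + ReLU."""
    return np.maximum(x @ w, 0.0)

def spiking_boundary(x, threshold=1.0):
    """Threshold activations into binary spikes before crossing the
    chip-to-chip link; only spike addresses would be transmitted."""
    return (x >= threshold).astype(np.float32)

rng = np.random.default_rng(0)
x = rng.random((1, 256)).astype(np.float32)
w1 = rng.normal(0, 0.1, (256, 256)).astype(np.float32)  # chiplet A interior
w2 = rng.normal(0, 0.1, (256, 10)).astype(np.float32)   # chiplet B interior

h = dense_layer(x, w1)          # dense compute stays on-chip
spikes = spiking_boundary(h)    # sparse binary traffic crosses the link
y = dense_layer(spikes, w2)     # receiving chiplet resumes dense compute
```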

We present a scalable chiplet architecture and show how hybrid data is managed across both spiking and non-spiking communication. We also demonstrate how the asynchronous spike-based model is integrated efficiently with synchronous ANN-based deep learning workloads. Our hybrid architecture offers significant improvements in performance, accuracy, and energy consumption compared to pure SNNs and ANNs, with up to a 1.34× increase in energy efficiency and a 1.56× decrease in single-inference latency. The architecture's versatility is validated across multiple datasets spanning both language processing and computer vision tasks.
