Date of Award

7-2025

Degree Name

MS in Electrical Engineering

Department/Program

Electrical Engineering

College

College of Engineering

Advisor

Jane Zhang

Advisor Department

Electrical Engineering

Advisor College

College of Engineering

Abstract

Vehicular collisions represent a significant public health concern, necessitating research into advanced emergency notification systems. While deep learning has shown promise in accident detection, a research gap persists in applying state-of-the-art transformer architectures to anticipatory, real-time crash prediction from video. This thesis addresses that gap by developing and evaluating a Video Vision Transformer (ViViT) for the binary classification of imminent vehicular collisions. Using a curated dataset of 1,493 unique collision sequences, this study systematically investigates the impact of temporal context by comparing the ViViT against a single-frame Vision Transformer (ViT) baseline and by conducting comprehensive experiments on temporal hyperparameters such as tubelet depth and frame stride. The results show that leveraging temporal context yields substantial performance gains: the optimal ViViT model achieves 98.72% accuracy and, most critically, a recall of 98.75%, a 7.63 percentage-point improvement over the baseline. These findings validate the efficacy of pure attention-based models for this safety-critical application and establish a strong methodological foundation for intelligent transportation systems capable of reducing emergency response times and saving lives.
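The abstract names tubelet depth and frame stride as the temporal hyperparameters under study. The minimal sketch below illustrates how the two settings shape the token sequence a ViViT attends over. It assumes the non-overlapping "tubelet embedding" tokenization from the original ViViT design and uses hypothetical values (a 60-frame clip, 224x224 RGB frames, 16 sampled frames, 16x16 spatial patches) that are not taken from this thesis. The function names `sample_frames` and `tubelet_tokens` are illustrative only, and the learned linear projection applied to each tubelet is omitted.

```python
import numpy as np

def sample_frames(video, num_frames, stride):
    """Hypothetical helper: take `num_frames` frames spaced `stride` apart,
    ending at the last frame (the most recent context before a possible crash)."""
    total = video.shape[0]
    span = (num_frames - 1) * stride + 1  # frames covered by the sampled window
    if span > total:
        raise ValueError("clip too short for this num_frames/stride")
    start = total - span
    return video[start:start + span:stride]

def tubelet_tokens(clip, t, p):
    """Hypothetical helper: split a (T, H, W, C) clip into non-overlapping
    t x p x p tubelets and flatten each into one vector (pre-projection)."""
    T, H, W, C = clip.shape
    assert T % t == 0 and H % p == 0 and W % p == 0
    x = clip.reshape(T // t, t, H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (n_t, n_h, n_w, t, p, p, C)
    return x.reshape(-1, t * p * p * C)   # one row per tubelet token

# Hypothetical clip: 60 frames of 224x224 RGB video (zeros as placeholder pixels).
video = np.zeros((60, 224, 224, 3), dtype=np.uint8)
clip = sample_frames(video, num_frames=16, stride=2)  # 16 frames spanning 31
tokens = tubelet_tokens(clip, t=2, p=16)              # tubelet depth 2
print(clip.shape, tokens.shape)
```

With these assumed values the clip has shape (16, 224, 224, 3) and yields 8 x 14 x 14 = 1,568 tokens of length 1,536. Doubling the tubelet depth halves the token count, while a larger frame stride widens the time window the sampled frames cover without changing the token count. Setting the tubelet depth to 1 on a single frame reduces to ViT-style patch tokens (196 here), which is the kind of single-frame input a ViT baseline sees.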
