Available at: https://digitalcommons.calpoly.edu/theses/2976
Date of Award
3-2025
Degree Name
MS in Electrical Engineering
Department/Program
Electrical Engineering
College
College of Engineering
Advisor
Jane Zhang
Advisor Department
Electrical Engineering
Advisor College
College of Engineering
Abstract
Though incredibly effective at detecting objects in isolated frames, modern object detection models are often not designed to take advantage of information present in previous frames of a video stream, even though that data is readily available. To address this shortcoming, this paper proposes YOLOT, a modification of the widely used YOLOv8 object detection model that exploits this temporal information by adding recurrent structures. In designing YOLOT, a series of recurrent convolutional modules were inserted at the backbone and neck outputs; the final and most effective design inserts a Convolutional Gated Recurrent Unit (ConvGRU) before the detect heads of the model. YOLOT is trained and evaluated on the challenging BDD100k MOTS dataset, a benchmark for automotive object detection across seven classes. On the BDD100k validation set, YOLOT outperforms the baseline YOLOv8 architecture by 8.2 points in mAP50 (mean average precision at an IoU threshold of 0.5), while adding 10 ms of inference time on an Nvidia RTX 3060m GPU. In addition to developing YOLOT, this paper serves as the most comprehensive document to date outlining the architectures of YOLOv8 and YOLOT, including a detailed description of the baseline model and loss function, from basic concepts to high-level design.
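The core idea, a ConvGRU that carries a hidden feature map from frame to frame before the detect heads, can be illustrated with a minimal NumPy sketch of a generic ConvGRU cell. This assumes the standard gate formulation (update gate z, reset gate r, tanh candidate state). The helper names `conv2d_same` and `ConvGRUCell`, the channel counts, the 3x3 kernel, and the random initialization are hypothetical illustration choices, not YOLOT's actual implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv2d_same(x, w, b):
    """'Same'-padded 2D cross-correlation (as in most DL frameworks).
    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,) -> (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad spatial dims only
    height, width = x.shape[1:]
    out = np.empty((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

class ConvGRUCell:
    """Hypothetical ConvGRU cell: gates are convolutions over [input, hidden]."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_ch, in_ch + hid_ch, k, k)
        # Small random weights for illustration only; a real model learns these.
        self.w_z, self.w_r, self.w_h = (rng.normal(0.0, 0.1, shape) for _ in range(3))
        self.b_z = np.zeros(hid_ch)
        self.b_r = np.zeros(hid_ch)
        self.b_h = np.zeros(hid_ch)
        self.hid_ch = hid_ch

    def step(self, x, h=None):
        """Consume one frame's feature map x and return the updated hidden state."""
        if h is None:  # first frame: start from an all-zero memory
            h = np.zeros((self.hid_ch,) + x.shape[1:])
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.w_z, self.b_z))  # update gate
        r = sigmoid(conv2d_same(xh, self.w_r, self.b_r))  # reset gate
        cand = np.tanh(conv2d_same(np.concatenate([x, r * h], axis=0), self.w_h, self.b_h))
        return (1.0 - z) * h + z * cand  # blend old memory with the candidate

# Usage: run three frames of stand-in feature maps through the cell.
cell = ConvGRUCell(in_ch=4, hid_ch=8)
rng = np.random.default_rng(1)
h = None
for t in range(3):
    feat = rng.normal(size=(4, 10, 10))  # placeholder for a neck output at frame t
    h = cell.step(feat, h)
print(h.shape)  # (8, 10, 10)
```

Because each update is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays within (-1, 1) when it starts at zero. In a detector built this way, the hidden state h, rather than the raw feature map, would be what the detect head consumes at each frame.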