"YOLOT: A Recurrent YOLO Model for Robust Video-Based Automotive Objec" by Dylan Jay Baxter

Date of Award

3-2025

Degree Name

MS in Electrical Engineering

Department/Program

Electrical Engineering

College

College of Engineering

Advisor

Jane Zhang

Advisor Department

Electrical Engineering

Advisor College

College of Engineering

Abstract

Though highly effective at detecting objects in isolated frames, modern object detection models are often not designed to take advantage of information present in previous frames of a video stream, despite that data being readily available. To address this shortcoming, this paper proposes YOLOT, a modification of the widely used YOLOv8 object detection model that seeks to exploit this temporal information through the addition of recurrent structures. In the design of YOLOT, a series of recurrent convolutional modules were inserted at the backbone and neck outputs; the final and most effective design was found to be the insertion of a Convolutional Gated Recurrent Unit before the detect heads of the model. Training and evaluation of YOLOT are both performed on the challenging BDD100k MOTS Dataset, a benchmark for automotive object detection across seven classes. When evaluated, YOLOT outperforms the baseline YOLOv8 architecture on the BDD100k validation dataset by 8.2 points in mAP50, while adding 10 ms of inference time on an Nvidia RTX 3060m GPU. In addition to the development of YOLOT, this paper serves as the most comprehensive document outlining the YOLOv8 and YOLOT architectures to date, including a detailed description of the baseline model and loss function, from basic concepts to high-level design.
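The abstract names a Convolutional Gated Recurrent Unit (ConvGRU) as the recurrent module but does not give its equations. As a rough illustration only, the sketch below implements one step of a generic, textbook-style ConvGRU cell in NumPy; it is not the thesis's implementation. The class name `ConvGRUCell`, the helper `conv2d_same`, the 3x3 kernel size, the channel counts, and the random weight initialization are all assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def conv2d_same(x, w, b):
    """Stride-1, zero-padded cross-correlation (the usual deep-learning "conv").

    x: (C_in, H, W); w: (C_out, C_in, k, k) with k odd; b: (C_out,).
    Returns an array of shape (C_out, H, W).
    """
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))            # zero padding on H and W
    win = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", win, w) + b[:, None, None]


class ConvGRUCell:
    """Hypothetical generic ConvGRU cell; weights are random, not trained."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_ch, in_ch + hid_ch, k, k)
        self.hid_ch = hid_ch
        self.w_z = rng.normal(0.0, 0.1, shape)  # update-gate kernel
        self.w_r = rng.normal(0.0, 0.1, shape)  # reset-gate kernel
        self.w_h = rng.normal(0.0, 0.1, shape)  # candidate-state kernel
        self.b_z = np.zeros(hid_ch)
        self.b_r = np.zeros(hid_ch)
        self.b_h = np.zeros(hid_ch)

    def step(self, x, h=None):
        """Combine the current feature map x with the previous hidden state h."""
        if h is None:  # first frame of a clip: start from an all-zero state
            h = np.zeros((self.hid_ch,) + x.shape[1:])
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.w_z, self.b_z))  # how much to update
        r = sigmoid(conv2d_same(xh, self.w_r, self.b_r))  # how much history to use
        xrh = np.concatenate([x, r * h], axis=0)
        h_cand = np.tanh(conv2d_same(xrh, self.w_h, self.b_h))
        # One common convention; some libraries swap the roles of z and (1 - z).
        return (1.0 - z) * h + z * h_cand


if __name__ == "__main__":
    cell = ConvGRUCell(in_ch=4, hid_ch=8)
    frames = np.random.default_rng(1).normal(size=(3, 4, 16, 16))  # 3 fake frames
    h = None
    for frame in frames:
        h = cell.step(frame, h)  # the state carries information across frames
    print(h.shape)  # (8, 16, 16)
```

In a detector of the kind the abstract describes, a cell like this would sit on a feature map ahead of a detection head, with the hidden state carried from one video frame to the next; where exactly it attaches and how it is trained are detailed in the thesis itself, not here.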
