DOI: https://doi.org/10.15368/theses.2020.6
Available at: https://digitalcommons.calpoly.edu/theses/2119
Date of Award
1-2020
Degree Name
MS in Computer Science
Department/Program
Computer Science
College
College of Engineering
Advisor
John Seng
Advisor Department
Computer Science
Advisor College
College of Engineering
Abstract
Automation is changing the world at a rapid pace. Autonomous agents have become more common over the last several years, creating a need for improved software to support them. The most important aspect of this software is path prediction, since robots need to be able to decide where to move in the future. To accomplish this, a robot must know how to avoid humans, which puts frame prediction at the core of many modern solutions. A popular approach to the complex problem of frame prediction is the Auto Encoder LSTM. Though there are many implementations, at its core it is a neural network composed of a series of time-sensitive processing blocks that shrink and then grow the data’s dimensions to make a prediction. The idea of using Auto Encoder style networks for frame prediction has also been adapted by others to create Temporal Encoders. These neural networks work much like traditional Auto Encoders, in which the data is reduced and then expanded back up, and they attempt to tease out a series of frames, including a predictive frame of the future. The problem with many of these networks is that they require an immense amount of computational power and time to reach an acceptable level of performance. This thesis presents possible ways of pre-processing the input frames to these networks in order to improve performance, in the best case seeing a 360x improvement in accuracy compared to the original models. This thesis also extends the work done with Temporal Encoders to create more precise prediction models, which showed consistent improvements of at least 50% for some metrics. All of the generated models were compared using a simulated data set collected from recordings of ground-level viewpoints in Cities: Skylines. The predicted frames were then analyzed using a common perceptual distance metric, Minkowski distance, as well as a custom metric that tracked distinct areas in frames.
All of the work in this thesis was run on a constrained system in order to show how the changes affect systems with limited hardware access.
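As a rough illustration of the Minkowski distance named in the abstract, the sketch below compares a predicted frame with a ground-truth frame, pixel by pixel. It is a minimal example and not the thesis's evaluation code: the function names `flatten` and `minkowski_distance`, the representation of a frame as a nested list of grayscale values, and the default order `p = 2` are all assumptions made for illustration.

```python
from typing import Sequence


def flatten(frame: Sequence[Sequence[float]]) -> list:
    """Flatten a 2-D frame (rows of pixel values) into a single list."""
    return [pixel for row in frame for pixel in row]


def minkowski_distance(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two equal-length sequences.

    p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.
    The order used in the thesis is not stated in the abstract, so p = 2
    here is only an assumed default.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical 2x2 grayscale frames, used only to show the call pattern.
predicted = [[0.0, 0.0], [0.0, 0.0]]
actual = [[3.0, 0.0], [0.0, 4.0]]

distance = minkowski_distance(flatten(predicted), flatten(actual), p=2)
print(distance)
```

A smaller distance means the predicted frame is closer to the actual frame under this metric. Changing `p` changes how strongly large per-pixel errors are weighted relative to many small ones.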