DOI: https://doi.org/10.15368/theses.2020.7
Available at: https://digitalcommons.calpoly.edu/theses/2109

Date of Award

12-2019

Degree Name

MS in Electrical Engineering

Department/Program

Electrical Engineering

College

College of Engineering

Advisor

Jane Zhang

Advisor Department

Electrical Engineering

Advisor College

College of Engineering

Abstract

Mainstream automatic speech recognition (ASR) uses audio data to identify spoken words; however, visual speech recognition (VSR) has recently attracted increased interest from researchers. VSR is used when audio data is corrupted or missing entirely, and also to further enhance the accuracy of audio-based ASR systems. In this research, we present both a framework for building 3D feature cubes of lip data from videos and a 3D convolutional neural network (CNN) architecture for performing classification on a dataset of 100 spoken words, recorded in an uncontrolled environment. Our 3D-CNN architecture achieves a testing accuracy of 64%, comparable with recent works, while using an input data size that is up to 75% smaller. Overall, our research shows that 3D-CNNs can be successful in finding spatial-temporal features using unsupervised feature extraction and are a suitable choice for VSR-based systems.

Included in

Signal Processing Commons

Master's Theses

Visual Speech Recognition Using a 3D Convolutional Neural Network
