DOI: https://doi.org/10.15368/theses.2020.175
Available at: https://digitalcommons.calpoly.edu/theses/2343

Date of Award

12-2020

Degree Name

MS in Electrical Engineering

Department/Program

Electrical Engineering

College

College of Engineering

Advisor

Jane Zhang

Advisor Department

Electrical Engineering

Advisor College

College of Engineering

Abstract

Neural network approaches have become popular in the field of automatic speech recognition (ASR). Most ASR methods use audio data to classify words. Lip reading ASR techniques utilize only video data, which compensates for noisy environments where audio may be compromised. A comprehensive approach, including the vetting of datasets and development of a preprocessing chain, to video-based ASR is developed. This approach will be based on neural networks, namely 3D convolutional neural networks (3D-CNN) and Long short-term memory (LSTM). These types of neural networks are designed to take in temporal data such as videos. Various combinations of different neural network architecture and preprocessing techniques are explored. The best performing neural network architecture, a CNN with bidirectional LSTM, compares favorably against recent works on video-based ASR.

Download

Included in

Electrical and Computer Engineering Commons

COinS

Master's Theses

Video Based Automatic Speech Recognition Using Neural Networks

Date of Award

Degree Name

Department/Program

College

Advisor

Advisor Department

Advisor College

Abstract

Included in

Search

Browse

Author Corner

LINKS

Master's Theses

Video Based Automatic Speech Recognition Using Neural Networks

Author

Date of Award

Degree Name

Department/Program

College

Advisor

Advisor Department

Advisor College

Abstract

Included in

Share

Search

Browse

Author Corner

LINKS