Electrical Engineering

Visual Speech Feature Extraction for Improved Speech Recognition

Xiaozheng Zhang, Georgia Institute of Technology - Main Campus
Russell M. Mersereau, Georgia Institute of Technology - Main Campus
Mark A. Clements, Georgia Institute of Technology - Main Campus
Charles C. Broun, Motorola Human Interface Labs

Recommended Citation

Postprint version. Published in Proceedings from the International Conference on Acoustics, Speech, and Signal Processing: Orlando, FL, May 13, 2002.

NOTE: At the time of publication, the author Xiaozheng Zhang was not yet affiliated with Cal Poly.

The definitive version is available at https://doi.org/10.1109/ICASSP.2002.5745022.

Abstract

Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in real-world conditions when noise is present. Most human listeners, however, both hearing-impaired and normal-hearing, use visual information to improve speech perception in acoustically hostile environments. Motivated by the human ability to lipread, we consider the visual component a source of information that is not always present in the acoustic signal and that enables improved accuracy over purely acoustic systems, especially in noisy environments. In this paper, we investigate the usefulness of visual information in speech recognition. We first present a method for automatically locating and extracting visual speech features from a talking person in color video sequences. We then develop a recognition engine to train and recognize sequences of visual parameters for the purpose of speech recognition. In particular, we explore the impact of various combinations of visual features on recognition accuracy. We conclude that inner lip contour features, together with information about the visibility of the tongue and teeth, significantly improve performance over outer-contour-only features in both speaker-dependent and speaker-independent recognition tasks.

Disciplines

Electrical and Computer Engineering

Copyright

2002 IEEE.

Publisher statement

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Included in

Electrical and Computer Engineering Commons

URL: https://digitalcommons.calpoly.edu/eeng_fac/264
