Electrical Engineering

Audio-Visual Speech Recognition by Speechreading

Xiaozheng Zhang, Georgia Institute of Technology - Main Campus
Russell M. Mersereau, Georgia Institute of Technology - Main Campus
Mark A. Clements, Georgia Institute of Technology - Main Campus

Recommended Citation

Postprint version. Published in Proceedings from the 14th International Conference on Digital Signal Processing, January 1, 2002.

NOTE: At the time of publication, the author Xiaozheng Zhang was not yet affiliated with Cal Poly.

The definitive version is available at https://doi.org/10.1109/ICDSP.2002.1028275.

Abstract

Speechreading increases intelligibility in human speech perception, which suggests that conventional acoustic-based speech processing can benefit from the addition of visual information. This paper exploits speechreading for joint audio-visual speech recognition. We first present a color-based feature extraction algorithm that reliably extracts salient visual speech features from a frontal view of the talker in a video sequence. We then propose a new fusion strategy that uses a coupled hidden Markov model (CHMM) to incorporate the visual modality into the acoustic subsystem. By maintaining temporal coupling across the two modalities at the feature level while allowing asynchrony in their states, a CHMM provides a better model for capturing temporal correlations between the two streams of information. Experimental results demonstrate that the combined audio-visual system outperforms the acoustic-only recognizer over a wide range of noise levels.

Disciplines

Electrical and Computer Engineering

Copyright

2002 IEEE.

Publisher statement

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Included in

Electrical and Computer Engineering Commons

URL: https://digitalcommons.calpoly.edu/eeng_fac/261
