Electrical Engineering

Automatic Speechreading with Applications to Human-Computer Interfaces

Xiaozheng Zhang, Georgia Institute of Technology - Main Campus
Charles C. Broun, Motorola Human Interface Lab
Russell M. Mersereau, Georgia Institute of Technology - Main Campus
Mark A. Clements, Georgia Institute of Technology - Main Campus

Recommended Citation

Published in EURASIP Journal on Applied Signal Processing, Volume 2002, Issue 11, January 1, 2002, pages 1228-1247.

NOTE: At the time of publication, the author Xiaozheng Zhang was not yet affiliated with Cal Poly.

The definitive version is available at https://doi.org/10.1155/S1110865702206137.

Abstract

There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Motivated by the multimodal nature of speech, the visual component is considered to yield information that is not always present in the acoustic signal and to enable improved system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI-related applications. We first introduce a new algorithm for automatically locating the mouth region by using color and motion information and segmenting the lip region by making use of both color and edge information based on Markov random fields. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We present various visual feature performance comparisons to explore their impact on the recognition accuracy, including the lip inner contour and the visibility of the tongue and teeth. By using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. The experimental results based on two databases demonstrate that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.

Disciplines

Electrical and Computer Engineering

Copyright

2002 Hindawi Publishing.

Publisher statement

This work is licensed under a Creative Commons Attribution 2.0 Generic License.

Included in

Electrical and Computer Engineering Commons

URL: https://digitalcommons.calpoly.edu/eeng_fac/263
