Published in 10th IASTED International Conference on Signal and Image Processing Proceedings: Kailua-Kona, HI, August 8, 2008, pages 470-475.
When combined with acoustic speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that the visual modality is a viable tool for identifying speech. However, visual information has yet to be utilized in mainstream ASR systems because of the difficulty of accurately tracking lips in real-world conditions. This paper presents our current progress in addressing this issue. We derive several algorithms based on a modified HSI color space that successfully locate the face, eyes, and lips. These algorithms are then tested on imagery collected in visually challenging environments.
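The abstract does not specify how the HSI color space is modified; as a reference point only, the following is a minimal sketch of the standard RGB-to-HSI conversion on which such modifications are typically based (the function name and normalized-input convention are assumptions, not from the paper):

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB components in [0, 1] to (H in degrees, S, I).

    Standard HSI conversion: intensity is the channel mean, saturation
    measures distance from gray, and hue is an angle on the color circle.
    """
    i = (r + g + b) / 3.0
    mn = min(r, g, b)
    # Saturation is undefined at pure black; treat it as zero there.
    s = 0.0 if i == 0 else 1.0 - mn / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0  # hue is undefined for gray pixels; pick 0 by convention
    else:
        # Clamp guards against floating-point drift outside acos's domain.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = 360.0 - theta if b > g else theta
    return h, s, i
```

For example, pure red maps to a hue of 0 degrees, pure green to 120, and pure blue to 240, with full saturation and intensity 1/3 in each case. Skin and lip detection schemes typically threshold on hue and saturation, which are less sensitive to illumination changes than raw RGB.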
Electrical and Computer Engineering
© 2008 ACTA Press.