Postprint version. Published in 9th International Conference on Visual Information and Information Systems (VISUAL) Proceedings: Shanghai, China, June 28, 2007, pages 185-192.
A speaker's lip movement conveys important visual speech information and can be exploited for Automatic Speech Recognition (ASR). While previous research has demonstrated that the visual modality is a viable tool for identifying speech, visual information has yet to be adopted in mainstream ASR systems. One obstacle is the difficulty of building a robust visual front end that tracks lips accurately under real-world conditions. In this paper we present our current progress in addressing this issue. We examine the use of color information in detecting the lip region and report the results of our statistical analysis and modeling of lip hue, based on hundreds of manually extracted lip images obtained from several databases. In addition to hue, we explore spatial and edge information derived from intensity and saturation images to improve the robustness of lip detection. We demonstrate successful application of the algorithm on imagery collected in visually challenging environments.
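As a rough illustration of what hue-based lip detection can look like, the following minimal Python sketch fits a simple hue model to sample lip pixels and thresholds an image against it. It is not the algorithm of the paper: the function names, the circular mean-and-spread model, and the parameters `k` and `min_std` are all assumptions made for illustration, and the spatial and edge cues mentioned above are omitted.

```python
import colorsys
import math


def hue_of(rgb):
    """Return the hue in [0, 1) of an (r, g, b) pixel with 8-bit channels."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h


def circular_distance(h1, h2):
    """Distance between two hues on the hue circle (hue wraps around at 1.0)."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)


def fit_hue_model(lip_pixels):
    """Estimate a circular mean and spread of hue from sample lip pixels.

    Hypothetical model for illustration only; the paper's statistical
    model of lip hue may differ.
    """
    hues = [hue_of(p) for p in lip_pixels]
    # Average the hues as unit vectors so that values near 0 and near 1
    # (both reddish) are treated as neighbours.
    s = sum(math.sin(2 * math.pi * h) for h in hues)
    c = sum(math.cos(2 * math.pi * h) for h in hues)
    mean = (math.atan2(s, c) / (2 * math.pi)) % 1.0
    var = sum(circular_distance(h, mean) ** 2 for h in hues) / len(hues)
    return mean, math.sqrt(var)


def lip_mask(image, mean, std, k=2.0, min_std=0.01):
    """Mark pixels whose hue lies within k spreads of the model mean.

    `image` is a list of rows of (r, g, b) tuples. `k` and `min_std`
    are assumed tuning parameters, not values from the paper.
    """
    tol = k * max(std, min_std)
    return [
        [circular_distance(hue_of(p), mean) <= tol for p in row]
        for row in image
    ]


# Usage: fit on a few reddish sample pixels, then classify a tiny image.
samples = [(200, 60, 80), (190, 70, 90), (210, 50, 70)]
mean, std = fit_hue_model(samples)
mask = lip_mask([[(200, 60, 80), (60, 200, 80)]], mean, std)
```

The circular distance matters because reddish hues sit on both sides of the wrap-around point of the hue axis, so a plain arithmetic mean or difference would misjudge them. A hue-only threshold like this one is sensitive to skin tone and lighting, which is presumably why additional spatial and edge information is useful.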
Electrical and Computer Engineering