Published in 21st IS&T/SPIE Electronic Imaging Proceedings: San Jose, CA, Volume 7257, January 17, 2009.
The definitive version is available at https://doi.org/10.1117/12.817092.
When combined with acoustical speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that visual modality is a viable tool for identifying speech. However, the visual information has yet to become utilized in mainstream ASR systems due to the difficulty in accurately tracking lips in real-world conditions. This paper presents our current progress in tracking face and lips in visually challenging environments. Findings suggest the mean shift algorithm performs poorly for small regions, in this case the lips, but it achieves near 80% accuracy for facial tracking.
Electrical and Computer Engineering
One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.