Postprint version. Published in Proceedings from the International Conference on Acoustics, Speech, and Signal Processing: Orlando, FL, May 13, 2002.
NOTE: At the time of publication, the author Xiaozheng Zhang was not yet affiliated with Cal Poly.
The definitive version is available at https://doi.org/10.1109/ICASSP.2002.5743810.
Speech not only conveys linguistic information but also characterizes the talker's identity, and can therefore be used in personal authentication. While most speech information is contained in the acoustic channel, lip movement during speech production also provides useful information. In this paper we investigate the effectiveness of visual speech features in a speaker verification task. We first present the visual front-end of the automatic speechreading system. We then develop a recognition engine to train and recognize sequences of visual parameters. The experimental results based on the XM2VTS database demonstrate that visual information is highly effective in reducing both false acceptance and false rejection rates in speaker verification tasks.
Electrical and Computer Engineering
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.