Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading
This work addresses lip-reading for Korean language processing, but it appears incremental as it applies existing HMM methods to a new language-specific dataset.
The paper tackles Korean lip-reading by defining 10 vowel-based visemes and extracting a 20-dimensional visual feature vector that combines static and dynamic features. It then runs a word recognition experiment with a 3-viseme HMM and evaluates the efficiency of the approach.
In this paper, we define the viseme (visual speech element) and describe a method for extracting a visual feature vector. We define 10 vowel-based visemes by analyzing Korean utterances and propose a method for extracting a 20-dimensional visual feature vector that combines static and dynamic features. Finally, we conduct a word recognition experiment based on a 3-viseme HMM and evaluate its efficiency.
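
As an illustration of how static and dynamic features might be combined into a 20-dimensional vector, the following is a minimal sketch and not the authors' implementation. It assumes 10 static lip-shape measurements per video frame (the abstract gives only the total dimension, not the split) and assumes the dynamic part is the regression-based delta commonly used in speech processing. The function names `delta_features` and `visual_feature_vectors` and the `window` parameter are hypothetical.

```python
import numpy as np


def delta_features(static, window=2):
    """Regression-based delta (temporal slope) of a (T, D) feature sequence.

    Assumption: the standard speech-processing delta formula; the abstract
    does not say how its dynamic features are computed.
    """
    num_frames = static.shape[0]
    # Repeat the first and last frames so every frame has neighbours.
    padded = np.pad(static, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, window + 1))
    delta = np.zeros_like(static, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + num_frames]
        behind = padded[window - k : window - k + num_frames]
        delta += k * (ahead - behind)
    return delta / denom


def visual_feature_vectors(static):
    """Concatenate static features with their deltas along the feature axis.

    With the assumed 10 static measurements per frame, each output row is a
    20-dimensional vector: 10 static values followed by 10 dynamic values.
    """
    static = np.asarray(static, dtype=float)
    return np.hstack([static, delta_features(static)])


if __name__ == "__main__":
    # Hypothetical input: 30 frames, 10 static lip measurements per frame.
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(30, 10))
    print(visual_feature_vectors(frames).shape)
```

Each row of the result could then serve as one observation vector for a viseme-level HMM. The edge padding is one of several reasonable choices for handling the first and last frames.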