CL CV LG · Nov 15, 2014

Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading

Ha Jong Won, Li Gwang Chol, Kim Hyok Chol, Li Kum Song

arXiv:1411.4114v1

Originality: Synthesis-oriented

AI Analysis

This work addresses lip-reading for Korean language processing, but it appears incremental as it applies existing HMM methods to a new language-specific dataset.

The paper tackles Korean lip-reading by defining 10 vowel-based visemes and extracting a 20-dimensional visual feature vector that combines static and dynamic features. It then runs a word recognition experiment with a 3-viseme HMM and evaluates its efficiency.

In this paper, we define the viseme (visual speech element) and describe a method for extracting a visual feature vector. We define 10 vowel-based visemes by analyzing Korean utterances and propose a method for extracting a 20-dimensional visual feature vector that combines static and dynamic features. Finally, we conduct a word recognition experiment based on a 3-viseme HMM and evaluate its efficiency.
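
The abstract does not say how the static and dynamic features are computed or how the 20 dimensions are divided between them. As a rough illustration only, the sketch below assumes 10 static measurements per video frame and approximates the dynamic part by their frame-to-frame differences, which yields 20 dimensions per frame. The function name `add_dynamic_features`, the 10 + 10 split, and the use of finite differences are all assumptions made for this example, not the authors' method.

```python
import numpy as np


def add_dynamic_features(static: np.ndarray) -> np.ndarray:
    """Append frame-to-frame differences (dynamic features) to static features.

    static: array of shape (T, D), one row of D static measurements per frame.
    Returns an array of shape (T, 2 * D): [static | delta].

    Assumption: dynamic features are modelled as finite differences over time.
    The paper's actual definition may differ.
    """
    static = np.asarray(static, dtype=float)
    if static.ndim != 2 or static.shape[0] == 0:
        raise ValueError("expected a non-empty (frames, features) array")

    if static.shape[0] == 1:
        # A single frame carries no motion information.
        delta = np.zeros_like(static)
    else:
        # Central differences inside the sequence, one-sided at both ends.
        delta = np.gradient(static, axis=0)

    return np.concatenate([static, delta], axis=1)


# Hypothetical input: 5 frames with 10 static lip measurements each.
frames = np.arange(50, dtype=float).reshape(5, 10)
features = add_dynamic_features(frames)  # shape (5, 20)
```

Each row of `features` could then serve as one observation vector for an HMM-based recognizer such as the one the abstract mentions.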
