Face Recognition with Machine Learning in OpenCV: Fusion of the Results with the Localization Data of an Acoustic Camera for Speaker Identification
This work addresses speaker identification for real-time audio-visual systems, but it appears incremental as it fuses existing methods without introducing new algorithms.
The paper tackles speaker identification by combining OpenCV-based face recognition with the localization data of an acoustic camera to determine who is speaking in real time. The result is a precise description of the situation that can feed applications such as multi-channel speech enhancement.
This contribution gives an overview of face recognition algorithms, their implementation, and their practical uses. First, a training set of different persons' faces has to be collected and used to train a face recognizer. The resulting face model can then be used to classify each detected person either as a specific known individual or as unknown. After tracking the recognized face and estimating the position of the acoustic sound source, both can be combined to give detailed information about the possible speakers and whether they are talking. This leads to a precise real-time description of the situation, which can be used for further applications, e.g. for multi-channel speech enhancement by adaptive beamformers.
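The fusion step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the recognizer reports a distance-like score where lower means a closer match, that the acoustic camera's source estimate has already been projected into the pixel coordinates of the video image, and that a face counts as the speaker when the source falls inside its bounding box plus a small margin. The names `TrackedFace`, `label_from_prediction`, `find_speaker`, and `describe_scene`, as well as the threshold and margin values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrackedFace:
    label: str                       # recognized name, or "unknown"
    box: Tuple[int, int, int, int]   # (x, y, width, height) in image pixels


def label_from_prediction(name: str, distance: float, max_distance: float) -> str:
    """Keep the predicted name only if the match is close enough (assumed: lower is better)."""
    return name if distance <= max_distance else "unknown"


def find_speaker(
    faces: List[TrackedFace],
    source_xy: Optional[Tuple[float, float]],
    margin: int = 20,  # hypothetical tolerance in pixels around each face box
) -> Optional[TrackedFace]:
    """Return the face whose enlarged box contains the sound source, if any."""
    if source_xy is None:  # no active sound source detected
        return None
    sx, sy = source_xy
    best, best_dist = None, float("inf")
    for face in faces:
        x, y, w, h = face.box
        inside = (x - margin <= sx <= x + w + margin) and (y - margin <= sy <= y + h + margin)
        if not inside:
            continue
        # Among overlapping candidates, prefer the face whose center is closest to the source.
        cx, cy = x + w / 2, y + h / 2
        dist = (sx - cx) ** 2 + (sy - cy) ** 2
        if dist < best_dist:
            best, best_dist = face, dist
    return best


def describe_scene(
    faces: List[TrackedFace],
    source_xy: Optional[Tuple[float, float]],
) -> List[Tuple[str, bool]]:
    """Pair each tracked face's label with a flag telling whether it is the active speaker."""
    speaker = find_speaker(faces, source_xy)
    return [(face.label, face is speaker) for face in faces]


if __name__ == "__main__":
    faces = [
        TrackedFace(label_from_prediction("alice", 45.0, 60.0), (100, 100, 80, 80)),
        TrackedFace(label_from_prediction("bob", 90.0, 60.0), (400, 120, 80, 80)),
    ]
    print(describe_scene(faces, (430.0, 150.0)))
```

In this sketch the second face is labeled "unknown" because its assumed distance exceeds the threshold, yet it can still be flagged as the active speaker. A real system would additionally need the calibration between acoustic camera and video camera, which is outside the scope of this example.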