AS · AI · CV · LG · SD · Jan 12, 2024

Dynamic Behaviour of Connectionist Speech Recognition with Strong Latency Constraints

arXiv:2401.06588v1 · 14 citations · h-index: 19 · Speech Communication
Originality: Synthesis-oriented
AI Analysis

This work addresses real-time speech-to-lip synchronization for synthetic faces, but it appears incremental, since it focuses on analyzing interactions within existing connectionist and Viterbi decoder frameworks.

The paper tackles phonetic speech recognition under strong latency constraints for real-time lip synchronization in synthetic faces, and finds a strong interaction between neural network topology, language model time dependencies, and decoder latency.

This paper describes the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The constraints are imposed by the task of deriving the lip movements of a synthetic face in real time from the speech signal, by feeding the phonetic string into an articulatory synthesiser. Particular attention has been paid to analysing the interaction between the time evolution model learnt by the multi-layer perceptrons and the transition model imposed by the Viterbi decoder, in different latency conditions. Two experiments were conducted in which the time dependencies in the language model (LM) were controlled by a parameter. The results show a strong interaction between the three factors involved, namely the neural network topology, the length of time dependencies in the LM and the decoder latency.
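The interaction between the decoder's transition model and its latency can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a fixed-lag Viterbi decoder that commits the label for frame t - lag as soon as frame t has been observed, running over per-frame phone scores such as log MLP outputs. The helper `self_loop_transitions` and its `self_loop` parameter are hypothetical: they stand in for "time dependencies controlled by a parameter" by setting the expected segment duration, and they are not the LM parameterization used in the paper.

```python
import numpy as np


def self_loop_transitions(num_states, self_loop):
    """Hypothetical transition model: probability `self_loop` of staying in a state,
    remainder spread uniformly over the other states. Larger values imply longer
    expected segments (1 / (1 - self_loop) frames), i.e. longer time dependencies."""
    off = (1.0 - self_loop) / (num_states - 1)
    trans = np.full((num_states, num_states), off)
    np.fill_diagonal(trans, self_loop)
    return np.log(trans)


def fixed_lag_viterbi(log_obs, log_trans, log_init, lag):
    """Viterbi decoding that irrevocably emits the state for frame t - lag at frame t.

    log_obs:   (T, S) per-frame log scores (assumed, e.g. log MLP posteriors or
               scaled likelihoods)
    log_trans: (S, S) log transition matrix, log_trans[i, j] = log P(j | i)
    log_init:  (S,) log initial state probabilities
    lag:       number of future frames the decoder may look at before committing
    """
    T, S = log_obs.shape
    backptr = np.zeros((T, S), dtype=int)  # backptr[t, j]: best state at t-1 given j at t
    delta = log_init + log_obs[0]
    decided = []
    for t in range(T):
        if t > 0:
            scores = delta[:, None] + log_trans  # scores[i, j]: best path ending i -> j
            backptr[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + log_obs[t]
        emit_t = t - lag
        if emit_t >= 0:
            # Trace back from the currently best state to frame emit_t and commit it.
            state = int(delta.argmax())
            for k in range(t, emit_t, -1):
                state = int(backptr[k, state])
            decided.append(state)
    # End of utterance: the remaining frames get a full traceback.
    n = len(decided)
    if n < T:
        state = int(delta.argmax())
        tail = [state]
        for k in range(T - 1, n, -1):
            state = int(backptr[k, state])
            tail.append(state)
        decided.extend(reversed(tail))
    return decided


if __name__ == "__main__":
    # Toy 2-phone example: frame 0 weakly favours phone 0, later frames favour phone 1.
    obs = np.log(np.array([[0.6, 0.4], [0.3, 0.7], [0.3, 0.7], [0.3, 0.7]]))
    init = np.log(np.array([0.5, 0.5]))
    trans = self_loop_transitions(2, self_loop=0.9)
    for lag in (0, 1, 3):
        print(lag, fixed_lag_viterbi(obs, trans, init, lag))
```

With a strong self-loop, `lag=0` commits phone 0 for the first frame and cannot revise it, yielding `[0, 1, 1, 1]`. With `lag=1` or more, the later evidence propagates back through the transition model and the output becomes `[1, 1, 1, 1]`. The longer the time dependencies in the transition model, the more lookahead the decoder needs before its early decisions agree with the full-utterance Viterbi path. This is one concrete way to see why topology, LM time dependencies, and latency cannot be tuned independently.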
