SD CL MM AS
Sep 29, 2024

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective

Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang

arXiv:2409.19575v1, 2.72 citations, h-index: 19

Originality: Synthesis-oriented

AI Analysis

This work addresses a theoretical gap for researchers in audio-visual speech processing, but it is incremental as it builds on existing tasks without introducing new methods.

The paper tackles the lack of theoretical analysis in audio-visual speech processing by providing a quantitative, information-theoretic analysis. It shows that such an analysis helps explain why these tasks are difficult and what benefits modality integration can bring.

In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention. Key components of this research include tasks such as lip reading, audio-visual speech recognition, and visual-to-speech synthesis. Although significant success has been achieved, theoretical analysis is still insufficient for audio-visual tasks. This paper presents a quantitative analysis based on information theory, focusing on information intersection between different modalities. Our results show that this analysis is valuable for understanding the difficulties of audio-visual processing tasks as well as the benefits that could be obtained by modality integration.
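As a rough illustration of what an "information intersection" between two modalities could look like numerically, the sketch below computes the mutual information between two discrete variables from a joint probability table. It assumes the intersection is measured as mutual information, which the abstract does not state. The function name `mutual_information` and the toy distributions are hypothetical and are not taken from the paper.

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint probability table.

    joint[i][j] is P(X=i, Y=j); entries are assumed to be
    non-negative and to sum to 1.
    """
    # Marginals: sum over rows for P(X), over columns for P(Y).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


# Toy tables with made-up numbers: X could stand for a discretized
# audio feature and Y for a discretized visual feature.
independent = [[0.25, 0.25], [0.25, 0.25]]  # the two share no information
identical = [[0.5, 0.0], [0.0, 0.5]]        # each fully determines the other

print(mutual_information(independent))
print(mutual_information(identical))
```

In this toy setting, independent variables give zero shared information, while two binary variables that always agree share one full bit. Real audio and visual features are continuous and high-dimensional, so estimating such quantities in practice would need a different estimator than this table-based one.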
