CV, AI, CL, LG | Feb 7, 2025

Lightweight Operations for Visual Speech Recognition

arXiv:2502.04834v1 | 1 citation | h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the problem of deploying visual speech recognition (VSR) on resource-constrained devices, but it appears incremental: it optimizes existing methods rather than introducing a new paradigm.

The paper tackles the high computational cost of visual speech recognition (VSR) by developing lightweight architectures, achieving reduced resource requirements with minimal accuracy loss on a large-scale dataset.

Visual speech recognition (VSR), which decodes spoken words from video data, offers significant benefits, particularly when audio is unavailable. However, the high dimensionality of video data leads to prohibitive computational costs that demand powerful hardware, limiting VSR deployment on resource-constrained devices. This work addresses this limitation by developing lightweight VSR architectures. Leveraging efficient operation design paradigms, we create compact yet powerful models with reduced resource requirements and minimal accuracy loss. We train and evaluate our models on a large-scale public dataset for recognition of words from video sequences, demonstrating their effectiveness for practical applications. We also conduct an extensive array of ablative experiments to thoroughly analyze the size and complexity of each model. Code and trained models will be made publicly available.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
