CV · Aug 21, 2024

Interpretable Long-term Action Quality Assessment

arXiv:2408.11687v1 · 8 citations · h-index: 2 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses interpretability challenges in video-based action quality assessment for applications like sports or healthcare, representing an incremental improvement over existing methods.

The paper tackles interpretability in long-term Action Quality Assessment (AQA). It identifies Temporal Skipping in query-based transformer networks as a cause of poor interpretability and proposes an attention loss, a query initialization method, and a weight-score regression module. Together these improve both performance and interpretability, achieving state-of-the-art results on three benchmarks.

Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos. However, the length presents challenges in fine-grained interpretability, with current AQA methods typically producing a single score by averaging clip features, lacking detailed semantic meanings of individual clips. Long-term videos pose additional difficulty due to the complexity and diversity of actions, exacerbating interpretability challenges. While query-based transformer networks offer promising long-term modeling capabilities, their interpretability in AQA remains unsatisfactory due to a phenomenon we term Temporal Skipping, where the model skips self-attention layers to prevent output degradation. To address this, we propose an attention loss function and a query initialization method to enhance performance and interpretability. Additionally, we introduce a weight-score regression module designed to approximate the scoring patterns observed in human judgments and replace conventional single-score regression, improving the rationality of interpretability. Our approach achieves state-of-the-art results on three real-world, long-term AQA benchmarks. Our code is available at: https://github.com/dx199771/Interpretability-AQA
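The weight-score regression idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each clip-level query yields one raw weight logit and one score, that the weights are normalized with a softmax, and that the final score is the weighted sum. The function names `softmax` and `weight_score_regression` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Normalize raw logits into weights that sum to 1 (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weight_score_regression(weight_logits, clip_scores):
    """Combine per-clip scores into one final score (illustrative assumption).

    weight_logits: one raw importance value per clip-level query.
    clip_scores:   one predicted quality score per clip-level query.
    Returns the final score and the normalized per-clip weights.
    """
    if len(weight_logits) != len(clip_scores):
        raise ValueError("weight_logits and clip_scores must have equal length")
    weights = softmax(weight_logits)
    final_score = sum(w * s for w, s in zip(weights, clip_scores))
    return final_score, weights


# Hypothetical values for three clips of one long video.
final_score, weights = weight_score_regression([0.0, 1.0, 2.0], [6.0, 8.0, 9.0])
```

Under these assumptions, the returned weights indicate how much each clip contributes to the final score, which is the kind of per-clip reading that a single score regressed from averaged clip features does not provide.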

Code Implementations: 1 repo
