CV · Aug 18, 2024

Joint Temporal Pooling for Improving Skeleton-based Action Recognition

arXiv:2408.09356v1 · 3 citations · h-index: 38
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in action recognition for computer vision applications, but it appears incremental as it builds on existing temporal pooling techniques.

The paper tackles the problem of preserving motion information in skeleton-based action recognition by proposing a Joint Motion Adaptive Temporal Pooling (JMAP) method, which improves performance on NTU RGB+D 120 and PKU-MMD datasets.

In skeleton-based human action recognition, temporal pooling is a critical step for capturing the spatiotemporal relationships of joint dynamics. Conventional pooling methods treat every frame equally and do not preserve motion information. However, in an action sequence, only a few segments of frames carry discriminative information related to the action. This paper presents a novel Joint Motion Adaptive Temporal Pooling (JMAP) method for improving skeleton-based action recognition. Two variants of JMAP, frame-wise pooling and joint-wise pooling, are introduced. The efficacy of JMAP has been validated through experiments on the popular NTU RGB+D 120 and PKU-MMD datasets.
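The abstract does not spell out how JMAP scores or selects frames, so the following is only an illustrative sketch of what motion-adaptive temporal pooling could look like, not the paper's algorithm. It assumes motion is measured as the Euclidean displacement of each joint between consecutive frames and that pooling keeps the k highest-motion entries in temporal order. The function names (`joint_motion`, `top_k_indices`, `frame_wise_pool`, `joint_wise_pool`) and the top-k selection rule are hypothetical.

```python
import math

def joint_motion(seq):
    """Per-frame, per-joint motion: distance each joint moved since the previous frame.

    seq is a list of frames; each frame is a list of (x, y, z) joint positions.
    ASSUMPTION: displacement magnitude stands in for "motion information".
    """
    motion = [[0.0] * len(seq[0])]  # the first frame has no predecessor
    for prev, cur in zip(seq, seq[1:]):
        motion.append([math.dist(p, c) for p, c in zip(prev, cur)])
    return motion

def top_k_indices(scores, k):
    """Indices of the k largest scores, returned in temporal (ascending) order."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:k])

def frame_wise_pool(seq, k):
    """Keep the k whole frames with the largest total joint motion."""
    frame_scores = [sum(m) for m in joint_motion(seq)]
    return [seq[t] for t in top_k_indices(frame_scores, k)]

def joint_wise_pool(seq, k):
    """For each joint independently, keep the k frames where that joint moved most."""
    motion = joint_motion(seq)
    per_joint = []
    for j in range(len(seq[0])):
        keep = top_k_indices([m[j] for m in motion], k)
        per_joint.append([seq[t][j] for t in keep])
    # Transpose from joints x k back to k pooled frames x joints.
    return [list(frame) for frame in zip(*per_joint)]
```

In this sketch, frame-wise pooling selects the same time steps for every joint, while joint-wise pooling lets each joint keep its own most active time steps, so a pooled "frame" may mix positions taken from different original frames.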
