Hierarchical Action Classification with Network Pruning
This work addresses action classification for computer vision applications, but it is incremental: it builds on existing deep learning methods and adds auxiliary mechanisms to them.
The paper tackles human action classification by proposing a method that combines hierarchical classification, network pruning, and skeleton-based preprocessing to improve model robustness and performance. It achieves comparable or better results on four datasets and sets a new baseline for NTU RGB+D 120.
Research on human action classification has made significant progress in the past few years. Most deep learning methods focus on improving performance by adding more network components. We propose instead to make better use of auxiliary mechanisms, including hierarchical classification, network pruning, and skeleton-based preprocessing, to boost model robustness and performance. We test the effectiveness of our method on four commonly used datasets: NTU RGB+D 60, NTU RGB+D 120, Northwestern-UCLA Multiview Action 3D, and UTD Multimodal Human Action Dataset. Our experiments show that our method achieves comparable or better performance on all four datasets. In particular, our method sets a new baseline for NTU RGB+D 120, the largest of the four datasets. We also analyze our method with extensive comparisons and ablation studies.