CV | Jul 31, 2023

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation

Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli

arXiv:2307.16803

Originality: Synthesis-oriented

AI Analysis

This work addresses egocentric action segmentation for human-object interaction, but it is incremental as it combines existing methods rather than introducing a new paradigm.

The paper tackles 4D action segmentation by converting point cloud videos to depth videos and ensembling depth video and point cloud video methods. The ensemble significantly improves accuracy and took first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.

In this technical report, we present our findings from research conducted on the Human-Object Interaction 4D (HOI4D) dataset for the egocentric action segmentation task. As a relatively novel research area, point cloud video methods might not be good at temporal modeling, especially for long point cloud videos (e.g., 150 frames). In contrast, traditional video understanding methods are well developed, and their effectiveness at temporal modeling has been widely verified on many large-scale video datasets. Therefore, we convert point cloud videos into depth videos and employ traditional video modeling methods to improve 4D action segmentation. By ensembling depth and point cloud video methods, the accuracy is significantly improved. The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.
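
The abstract does not spell out how point clouds are turned into depth frames or how the two experts are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a pinhole camera projection for the conversion and a weighted average of per-frame class probabilities for the ensemble. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `depth_weight` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in camera coordinates; z is depth


def point_cloud_to_depth(points: Sequence[Point], width: int, height: int,
                         fx: float, fy: float, cx: float, cy: float) -> List[List[float]]:
    """Project one point cloud frame onto a depth image (assumed pinhole model).

    Pixels that receive no point stay at 0.0. When several points land on the
    same pixel, the nearest one (smallest z) is kept.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # skip points at or behind the camera plane
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


def ensemble_frame_labels(depth_probs: Sequence[Sequence[float]],
                          point_probs: Sequence[Sequence[float]],
                          depth_weight: float = 0.5) -> List[int]:
    """Mix per-frame class probabilities from two experts and pick the argmax.

    Weighted averaging is an assumption; the report may use another rule.
    """
    labels = []
    for d, p in zip(depth_probs, point_probs):
        mixed = [depth_weight * a + (1.0 - depth_weight) * b for a, b in zip(d, p)]
        labels.append(max(range(len(mixed)), key=mixed.__getitem__))
    return labels
```

In this sketch each expert is expected to output one probability vector per frame, so the two sequences must have the same length and the same class ordering before they are mixed.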
