Flexible graph convolutional network for 3D human pose estimation
This work addresses occlusion and depth ambiguity in 3D human pose estimation, but it is incremental, as it builds on existing graph convolutional methods.
The paper tackles the limited ability of graph convolutional networks to capture high-order dependencies in 3D human pose estimation. It introduces Flex-GCN, which aggregates features from immediate and second-order neighbors and achieves competitive performance on benchmark datasets.
Although graph convolutional networks exhibit promising performance in 3D human pose estimation, their reliance on one-hop neighbors limits their ability to capture high-order dependencies among body joints, which are crucial for mitigating the uncertainty arising from occlusion or depth ambiguity. To tackle this limitation, we introduce Flex-GCN, a flexible graph convolutional network designed to learn graph representations that capture broader global information and dependencies. At its core is the flexible graph convolution, which aggregates features from both the immediate and second-order neighbors of each node while maintaining the same time and memory complexity as the standard convolution. Our network architecture comprises residual blocks of flexible graph convolutional layers, as well as a global response normalization layer for global feature aggregation, normalization, and calibration. Quantitative and qualitative results demonstrate the effectiveness of our model, which achieves competitive performance on benchmark datasets.
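The abstract does not give the exact update rule, so the following is only a minimal NumPy sketch of the general idea, not the Flex-GCN layer itself. It assumes a symmetrically normalized adjacency matrix with self-loops and a fixed mixing weight `s` between one-hop and two-hop features; the function names, `s`, and the choice to apply global response normalization (written here in its ConvNeXt V2 form) over the joint axis are all illustrative assumptions. The two-hop term is computed by applying the adjacency matrix twice to the features rather than forming its square, which is one way the cost can stay in the same order as a standard graph convolution on a sparse skeleton graph.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_nodes)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def two_hop_graph_conv(a_hat, x, w, s=0.5):
    """Hypothetical layer mixing first- and second-order neighbor features.

    a_hat: (J, J) normalized adjacency, x: (J, C_in) joint features,
    w: (C_in, C_out) weights, s: illustrative mixing weight (an assumption).
    """
    one_hop = a_hat @ x        # aggregate immediate neighbors
    two_hop = a_hat @ one_hop  # reuse one_hop instead of forming a_hat @ a_hat
    h = ((1.0 - s) * one_hop + s * two_hop) @ w
    return np.maximum(h, 0.0)  # ReLU


def global_response_norm(x, gamma, beta, eps=1e-6):
    """GRN in the ConvNeXt V2 form; aggregating over joints is an assumption."""
    gx = np.linalg.norm(x, axis=0, keepdims=True)     # per-channel L2 norm over joints
    nx = gx / (gx.mean(axis=1, keepdims=True) + eps)  # divide by the mean over channels
    return gamma * (x * nx) + beta + x                # calibrate, keep residual path
```

On a five-joint chain with a signal placed only on joint 0, setting `s=0` leaves joint 2 at zero, while `s=0.5` lets that signal reach it in a single layer. This is the kind of longer-range dependency the abstract argues helps under occlusion and depth ambiguity.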