BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations
This work addresses pose estimation in challenging crowded environments, which matters for applications such as surveillance and sports analysis, though the contribution appears incremental because it builds on existing bottom-up methods.
The paper tackles multi-person pose estimation in crowded scenes with occlusions. It proposes BAPose, a bottom-up approach that reports state-of-the-art results on the COCO and CrowdPose datasets, with significant accuracy improvements.
We propose BAPose, a novel bottom-up approach that achieves state-of-the-art results for multi-person pose estimation. Our end-to-end trainable framework leverages a disentangled multi-scale waterfall architecture and incorporates adaptive convolutions to infer keypoints more precisely in crowded scenes with occlusions. The multi-scale representations obtained by the disentangled waterfall module in BAPose leverage the efficiency of progressive filtering in the cascade architecture while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework for multi-person pose estimation, achieving significant improvements over state-of-the-art accuracy.
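The claim that a cascade can retain multi-scale fields-of-view can be illustrated with simple receptive-field arithmetic. The sketch below is not the BAPose implementation: it assumes stride-1, 3x3 dilated convolutions, uses hypothetical dilation rates, and the function names are illustrative only. It compares the field-of-view of each branch in a parallel spatial-pyramid layout with the field-of-view at each tap of a cascade in which every branch filters the previous branch's output.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def parallel_fov(dilations, kernel_size=3):
    """Field-of-view of each branch when all branches read the same input
    (spatial-pyramid style)."""
    return [effective_kernel(kernel_size, d) for d in dilations]


def waterfall_fov(dilations, kernel_size=3):
    """Field-of-view at each tap when branches are chained (cascade style).

    With stride 1, each layer grows the field-of-view by
    (effective kernel size - 1).
    """
    fov = 1
    taps = []
    for d in dilations:
        fov += effective_kernel(kernel_size, d) - 1
        taps.append(fov)
    return taps


# Hypothetical dilation rates, chosen only for illustration.
rates = [1, 2, 4]
print("parallel branches:", parallel_fov(rates))   # [3, 5, 9]
print("cascade taps:     ", waterfall_fov(rates))  # [3, 7, 15]
```

Under these assumptions, tapping the cascade after every stage still yields one feature map per scale, as in a pyramid, while each stage reuses the filtering already done by the stage before it.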