CV | GR | Feb 28, 2024

NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images

arXiv:2402.18196v2 | 1 citation | h-index: 11 | ECCV Workshops
AI Analysis

This work addresses a data-scarcity problem for computer vision researchers and practitioners working with top-view fisheye cameras. Its contribution is incremental, however, because it builds on existing datasets and methods.

The paper tackles the lack of datasets for human pose estimation in top-view fisheye images by generating a new dataset using Neural Radiance Fields (NeRF), resulting in a 33.3% AP improvement for 2D pose estimation and a 53.7 mm reduction in PA-MPJPE for 3D pose estimation after finetuning.

Human pose estimation (HPE) in the top-view using fisheye cameras presents a promising and innovative application domain. However, the availability of datasets capturing this viewpoint is extremely limited, especially those with high-quality 2D and 3D keypoint annotations. Addressing this gap, we leverage the Neural Radiance Fields (NeRF) technique to establish a comprehensive pipeline for generating human pose datasets from existing 2D and 3D datasets, specifically tailored for the top-view fisheye perspective. Through this pipeline, we create a novel dataset, NToP570K (NeRF-powered Top-view human Pose dataset for fisheye cameras with over 570 thousand images), and conduct an extensive evaluation of its efficacy in enhancing neural networks for 2D and 3D top-view human pose estimation. A pretrained ViTPose-B model achieves an improvement in AP of 33.3% on our validation set for 2D HPE after finetuning on our training set. A similarly finetuned HybrIK-Transformer model achieves a 53.7 mm reduction in PA-MPJPE for 3D HPE on the validation set.
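The 3D result above is reported in PA-MPJPE, the mean per-joint position error measured after the predicted pose has been rigidly aligned (rotation, uniform scale, translation) to the ground truth. The following is a minimal sketch of that metric using the standard SVD-based Procrustes solution. It is not the authors' evaluation code: the function names `mpjpe` and `pa_mpjpe`, the `(J, 3)` array layout, and the 17-joint toy skeleton are assumptions made for illustration.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error without any alignment."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return np.linalg.norm(pred - gt, axis=1).mean()


def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE for a single pose.

    pred, gt: arrays of shape (J, 3) in the same unit (e.g. mm).
    The prediction is aligned to the ground truth with the similarity
    transform (scale, rotation, translation) that minimises the summed
    squared joint error, then the mean joint distance is returned.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Remove translation by centring both poses on their mean joint.
    mu_gt = gt.mean(axis=0)
    p = pred - pred.mean(axis=0)
    g = gt - mu_gt

    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal uniform scale for the centred prediction.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()

    aligned = scale * p @ R.T + mu_gt
    return np.linalg.norm(aligned - gt, axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(17, 3)) * 100.0      # toy skeleton, hypothetical
    noisy = gt + rng.normal(size=gt.shape) * 5.0
    print("MPJPE:   ", mpjpe(noisy, gt))
    print("PA-MPJPE:", pa_mpjpe(noisy, gt))
```

Because the alignment removes global rotation, scale, and translation, PA-MPJPE isolates errors in the articulated pose itself, which is why a prediction that differs from the ground truth only by a rigid similarity transform scores approximately zero.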

Code Implementations: 1 repo
