HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation
This addresses the need for markerless 3D pose estimation in mixed reality medical education, though it is incremental as it focuses on dataset creation.
The authors tackled the problem of hand-ultrasound probe pose estimation for obstetric ultrasound by creating HUP-3D, a synthetic dataset with over 31k frames, and validated it to achieve the lowest hand-object keypoint errors.
We present HUP-3D, a 3D multi-view multi-modal synthetic dataset for hand-ultrasound (US) probe pose estimation in the context of obstetric ultrasound. Egocentric markerless 3D joint pose estimation has potential applications in mixed reality based medical education. The ability to understand hand and probe movements programmatically opens the door to tailored guidance and mentoring applications. Our dataset consists of over 31k sets of RGB, depth and segmentation mask frames, including pose related ground truth data, with a strong emphasis on image diversity and complexity. Adopting a camera viewpoint-based sphere concept allows us to capture a variety of views and generate multiple hand grasp poses using a pre-trained network. Additionally, our approach includes a software-based image rendering concept, enhancing diversity with various hand and arm textures, lighting conditions, and background images. Furthermore, we validated our proposed dataset with state-of-the-art learning models and we obtained the lowest hand-object keypoint errors. The dataset and other details are provided with the supplementary material. The source code of our grasp generation and rendering pipeline will be made publicly available.