CV · Jul 1, 2024

CLHOP: Combined Audio-Video Learning for Horse 3D Pose and Shape Estimation

Ci Li, Elin Hernlund, Hedvig Kjellström, Silvia Zuffi

arXiv:2407.01244v13.74 citations · h-index: 30

Originality: Incremental advance

AI Analysis

This work addresses 3D animal motion estimation for applications such as veterinary science and animation. It is incremental in that it extends existing audio-visual methods to a new domain (horses).

The paper tackles the problem of predicting the 3D pose and shape of horses from monocular video, which is under-constrained with visual data alone, by incorporating audio to enhance motion recovery. On an indoor treadmill dataset and a new outdoor dataset of diverse horse movements, combining sound with visual data yields more accurate and robust motion regression.

In the monocular setting, predicting 3D pose and shape of animals typically relies solely on visual information, which is highly under-constrained. In this work, we explore using audio to enhance 3D shape and motion recovery of horses from monocular video. We test our approach on two datasets: an indoor treadmill dataset for 3D evaluation and an outdoor dataset capturing diverse horse movements, the latter being a contribution to this study. Our results show that incorporating sound with visual data leads to more accurate and robust motion regression. This study is the first to investigate audio's role in 3D animal motion recovery.
