CV · Apr 10

Adding Another Dimension to Image-based Animal Detection

Vandita Shukla, Fabio Remondino, Benjamin Risse

arXiv:2604.09210 · 8.7 · h-index: 2

AI Analysis

This work addresses a domain-specific problem for researchers in computer vision and animal monitoring, but it is incremental as it builds on existing models and datasets.

The paper addresses the absence of 3D detection methods for RGB animal images, which stems from a shortage of labeled datasets. It presents a pipeline that estimates 3D bounding boxes and projects them into 2D image space, and it reports accurate performance on the Animal3D dataset.

Monocular imaging of animals inherently reduces 3D structures to 2D projections. Detection algorithms produce 2D bounding boxes that lack information about the animal's orientation relative to the camera. Labeled datasets for building 3D detection methods on RGB animal images are lacking, because such labeling requires 3D input streams alongside RGB data. We present a pipeline that utilises Skinned Multi-Animal Linear (SMAL) models to estimate 3D bounding boxes and to project them as robust labels into 2D image space using a dedicated camera pose refinement algorithm. To assess which sides of the animal are captured, cuboid face visibility metrics are computed. These 3D bounding boxes and metrics form a crucial step toward developing and benchmarking future monocular 3D animal detection algorithms. We evaluate our method on the Animal3D dataset, demonstrating accurate performance across species and settings.
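The abstract's three geometric steps (a 3D box around a fitted mesh, projection into the image, and a per-face visibility check) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an axis-aligned box in the model frame, an ideal pinhole camera with known intrinsics `K` and pose `R`, `t`, and a simple back-face test that ignores occlusion by other objects. The function names `cuboid_corners`, `project` and `visible_faces` are hypothetical, and the camera pose refinement step is not shown.

```python
import numpy as np

def cuboid_corners(vertices):
    """8 corners of the axis-aligned box enclosing an (N, 3) vertex array.

    Corner index i = 4*xi + 2*yi + zi, where each bit selects min (0) or max (1).
    Assumption: the box is aligned with the model's own axes.
    """
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    return np.array([[x, y, z]
                     for x in (lo[0], hi[0])
                     for y in (lo[1], hi[1])
                     for z in (lo[2], hi[2])])

def project(points, K, R, t):
    """Pinhole projection of (N, 3) model-frame points to (N, 2) pixel coordinates."""
    cam = points @ R.T + t           # model frame -> camera frame
    uvw = cam @ K.T                  # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide

def visible_faces(corners, R, t):
    """Back-face test for the six cuboid faces, camera at the origin.

    A face counts as visible when its outward normal points toward the camera.
    """
    cam = corners @ R.T + t
    centre = cam.mean(axis=0)
    result = {}
    for axis, name in enumerate("xyz"):
        for bit, sign in ((0, "-"), (1, "+")):
            # The four corners whose bit for this axis equals `bit`.
            idx = [i for i in range(8) if ((i >> (2 - axis)) & 1) == bit]
            face_centre = cam[idx].mean(axis=0)
            normal = face_centre - centre  # outward direction for a box
            result[sign + name] = bool(normal @ face_centre < 0)
    return result
```

Taking the minimum and maximum of the projected corners along each image axis gives the enclosing 2D box, and the dictionary returned by `visible_faces` could serve as the basis for a face visibility metric of the kind the abstract mentions.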
