RO · CV · LG · Jul 1, 2025

Box Pose and Shape Estimation and Domain Adaptation for Large-Scale Warehouse Automation

arXiv:2507.00984v1 · 1 citation · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving perception models for warehouse robots without manual annotations. The advance is incremental because it builds on existing domain adaptation methods.

The paper tackles the problem of estimating box pose and shape in warehouse automation by developing a self-supervised domain adaptation pipeline that uses unlabeled real-world data, resulting in significant performance improvements over simulation-only and zero-shot baselines.

Modern warehouse automation systems rely on fleets of intelligent robots that generate vast amounts of data -- most of which remains unannotated. This paper develops a self-supervised domain adaptation pipeline that leverages real-world, unlabeled data to improve perception models without requiring manual annotations. Our work focuses specifically on estimating the pose and shape of boxes and presents a correct-and-certify pipeline for self-supervised box pose and shape estimation. We extensively evaluate our approach across a range of simulated and real industrial settings, including adaptation to a large-scale real-world dataset of 50,000 images. The self-supervised model significantly outperforms models trained solely in simulation and shows substantial improvements over a zero-shot 3D bounding box estimation baseline.
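The abstract names a correct-and-certify pipeline but does not spell out the certification test. As a rough illustration only, the sketch below shows one way a certify step could select pseudo-labels: a predicted box is accepted when most observed 3D points lie close to its surface. The geometric check, the function names (`box_sdf`, `certify`, `select_pseudo_labels`), the sample dictionary layout, and the thresholds `eps` and `min_inlier_ratio` are all assumptions made for this example, not the paper's method.

```python
import math


def box_sdf(point, center, rotation, dims):
    """Signed distance from a 3D point to an oriented box (negative inside).

    rotation is a 3x3 row-major matrix whose columns are the box axes in the
    world frame; dims holds the full side lengths along those axes.
    """
    # Express the point in the box frame: p_local = R^T (p - c).
    d = [point[i] - center[i] for i in range(3)]
    local = [sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3)]
    # Per-axis offset from the faces; positive means beyond that face.
    q = [abs(local[i]) - dims[i] / 2.0 for i in range(3)]
    outside = math.sqrt(sum(max(v, 0.0) ** 2 for v in q))
    inside = min(max(q), 0.0)
    return outside + inside


def certify(points, center, rotation, dims, eps=0.01, min_inlier_ratio=0.9):
    """Hypothetical certificate: enough points lie within eps of the surface."""
    if not points:
        return False
    inliers = sum(
        1 for p in points if abs(box_sdf(p, center, rotation, dims)) <= eps
    )
    return inliers / len(points) >= min_inlier_ratio


def select_pseudo_labels(samples, **kwargs):
    """Keep only certified estimates for use as self-supervised targets."""
    return [
        s for s in samples
        if certify(s["points"], s["center"], s["rotation"], s["dims"], **kwargs)
    ]
```

In a self-training loop of this kind, only the estimates that pass `certify` would be fed back as training targets, so no manual annotation is needed. A real system would also need the "correct" step, which refines an estimate before certification and is omitted here.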

