CV | RO | Sep 29, 2025

SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics

arXiv:2509.24572v1 | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generalizing object pose estimation to unknown categories in robotics, with incremental improvements in Sim2Real transfer.

The paper tackles category-level object pose estimation for robots in open environments. It introduces SCOPE, a diffusion-based model that uses DINOv2 features as continuous semantic priors in place of discrete category labels. SCOPE achieves a 31.9% relative improvement on the 5°5cm metric over the state of the art among synthetically trained methods, and a grasp success rate of up to 100% on unseen objects from unknown categories.
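For readers unfamiliar with the 5°5cm metric, the sketch below shows the usual idea: a predicted pose counts as correct when its rotation error is within 5 degrees and its translation error is within 5 cm of the ground truth. This is a minimal illustration and not the paper's evaluation code. The function names, the assumption that translations are given in metres, and the strict `<` thresholds are choices made here. Benchmarks typically also treat symmetric objects specially, which this sketch omits.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def translation_error_cm(t_pred, t_gt):
    """Euclidean distance between translations (assumed in metres), in cm."""
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    return float(np.linalg.norm(diff) * 100.0)

def within_5deg_5cm(R_pred, t_pred, R_gt, t_gt, max_deg=5.0, max_cm=5.0):
    """True if both the rotation and translation errors are under threshold."""
    return (rotation_error_deg(R_pred, R_gt) < max_deg
            and translation_error_cm(t_pred, t_gt) < max_cm)

def rot_z(deg):
    """Helper for the example: rotation about the z-axis by `deg` degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Example: 3 degrees off and 2 cm off passes; 10 degrees off fails.
R_gt, t_gt = np.eye(3), [0.0, 0.0, 0.0]
ok = within_5deg_5cm(rot_z(3.0), [0.0, 0.0, 0.02], R_gt, t_gt)
bad = within_5deg_5cm(rot_z(10.0), [0.0, 0.0, 0.02], R_gt, t_gt)
```

The reported score is then the fraction of test predictions for which this check passes; a relative improvement of 31.9% refers to the ratio between two such scores, not a difference in percentage points.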

Object manipulation requires accurate object pose estimation. In open environments, robots encounter unknown objects, which requires semantic understanding in order to generalize both to known categories and beyond. To resolve this challenge, we present SCOPE, a diffusion-based category-level object pose estimation model that eliminates the need for discrete category labels by leveraging DINOv2 features as continuous semantic priors. By combining these DINOv2 features with photorealistic training data and a noise model for point normals, we reduce the Sim2Real gap in category-level object pose estimation. Furthermore, injecting the continuous semantic priors via cross-attention enables SCOPE to learn canonicalized object coordinate systems across object instances beyond the distribution of known categories. SCOPE outperforms the current state of the art in synthetically trained category-level object pose estimation, achieving a relative improvement of 31.9% on the 5°5cm metric. Additional experiments on two instance-level datasets demonstrate generalization beyond known object categories, enabling grasping of unseen objects from unknown categories with a success rate of up to 100%. Code available: https://github.com/hoenigpeter/scope.
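The abstract mentions injecting semantic priors via cross-attention. As a rough illustration of that general mechanism only (not SCOPE's actual architecture, which is in the linked repository), the sketch below lets a set of per-point features attend to a set of semantic patch features using single-head scaled dot-product attention. All names, shapes, and the random weights are assumptions made for this example; 384 matches the embedding width of the DINOv2 ViT-S/14 variant, but which variant SCOPE uses is not stated here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, W_q, W_k, W_v):
    """Single-head cross-attention.

    queries: (N, d_q) features that receive information (here: point features).
    context: (M, d_c) features that provide information (here: semantic features).
    Returns the attended features (N, d) and the attention weights (N, M).
    """
    Q = queries @ W_q                         # (N, d)
    K = context @ W_k                         # (M, d)
    V = context @ W_v                         # (M, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (N, M) scaled dot products
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

# Hypothetical shapes: 128 points with 64-dim features attend to
# 256 image patches with 384-dim semantic features.
rng = np.random.default_rng(0)
point_feats = rng.normal(size=(128, 64))
semantic_feats = rng.normal(size=(256, 384))
W_q = rng.normal(size=(64, 32)) * 0.1
W_k = rng.normal(size=(384, 32)) * 0.1
W_v = rng.normal(size=(384, 32)) * 0.1

fused, attn = cross_attention(point_feats, semantic_feats, W_q, W_k, W_v)
```

Because the conditioning signal is a set of continuous feature vectors rather than a one-hot class label, nothing in this mechanism requires the object to belong to a category seen during training, which is the property the abstract relies on.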

Code Implementations: 1 repo
