CV · Dec 9, 2025

ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors

arXiv:2512.09056v1 · h-index: 30
Originality: Highly original
AI Analysis

This work addresses the need for efficient, adaptable pose estimation in computer vision and robotics, offering a novel, non-incremental zero-shot approach.

The paper tackles object pose estimation without dataset-specific training. It introduces ConceptPose, a training-free and model-free framework that uses vision-language models to build 3D concept maps, and reports state-of-the-art results on zero-shot benchmarks, outperforming existing methods by over 62% in ADD(-S) score.
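ADD(-S) is the pose-error metric behind the 62% figure. As commonly defined, ADD averages the distance between model points transformed by the ground-truth pose and by the estimated pose, while ADD-S (used for symmetric objects) compares each ground-truth-posed point with its closest estimated point. The sketch below is a minimal illustration assuming NumPy and the widely used "error below 10% of the object diameter" accuracy criterion; the exact protocol behind the reported scores (threshold or AUC variant) is not stated here, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the common ADD / ADD-S definitions.

def transform(points, R, t):
    """Apply the rigid transform x -> R x + t to an (N, 3) array of model points."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between the same model point under the GT and estimated poses."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    return float(np.linalg.norm(gt - est, axis=1).mean())

def add_s_metric(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each GT-posed point to its closest estimated point."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    # Dense (N, N) distance matrix: fine for small point sets; use a KD-tree for large models.
    dists = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

def add_or_add_s(points, R_gt, t_gt, R_est, t_est, symmetric):
    """ADD(-S): ADD-S for symmetric objects, ADD otherwise."""
    fn = add_s_metric if symmetric else add_metric
    return fn(points, R_gt, t_gt, R_est, t_est)

def add_s_accuracy(errors, diameter, threshold=0.1):
    """Fraction of poses whose error is below threshold * object diameter (assumed 10% criterion)."""
    errors = np.asarray(errors, dtype=float)
    return float((errors < threshold * diameter).mean())
```

For a square rotated by 90 degrees about its normal, ADD reports an error of sqrt(2) per point while ADD-S reports zero, which is why symmetric objects are scored with ADD-S.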

Object pose estimation is a fundamental task in computer vision and robotics, yet most methods require extensive, dataset-specific training. Concurrently, large-scale vision-language models show remarkable zero-shot capabilities. In this work, we bridge these two worlds by introducing ConceptPose, a framework for object pose estimation that is both training-free and model-free. ConceptPose leverages a vision-language model (VLM) to create open-vocabulary 3D concept maps, where each point is tagged with a concept vector derived from saliency maps. By establishing robust 3D-3D correspondences across concept maps, our approach enables precise estimation of the 6DoF relative pose. Without any object- or dataset-specific training, our approach achieves state-of-the-art results on common zero-shot relative pose estimation benchmarks, significantly outperforming existing methods by over 62% in ADD(-S) score, including those that utilize extensive dataset-specific training.
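The step from concept-map correspondences to a 6DoF relative pose can be sketched as follows. This is not the authors' implementation: it assumes each 3D point already carries a concept vector (for example, saliency scores over a set of concept prompts), matches points across the two maps by mutual nearest neighbour in cosine similarity, and fits the rigid transform with the standard Kabsch (SVD) least-squares solver. The helper names and the matching rule are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch; not the ConceptPose codebase.

def normalize(v, eps=1e-8):
    """L2-normalize each row so dot products become cosine similarities."""
    return v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)

def mutual_nn_matches(feat_a, feat_b):
    """Index pairs (i, j) where a_i and b_j are each other's most similar concept vector."""
    sim = normalize(feat_a) @ normalize(feat_b).T   # (Na, Nb) cosine similarity
    a_to_b = sim.argmax(axis=1)
    b_to_a = sim.argmax(axis=0)
    idx_a = np.arange(len(feat_a))
    keep = b_to_a[a_to_b] == idx_a                  # keep only mutual agreements
    return idx_a[keep], a_to_b[keep]

def kabsch(src, dst):
    """Least-squares R, t with dst ~= src @ R.T + t (Kabsch, no scale)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # flip sign to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

def relative_pose(points_a, feat_a, points_b, feat_b):
    """Match concept vectors, then fit the rigid transform mapping map A onto map B."""
    ia, ib = mutual_nn_matches(feat_a, feat_b)
    if len(ia) < 3:
        raise ValueError("need at least 3 correspondences for a 6DoF fit")
    return kabsch(points_a[ia], points_b[ib])
```

With noisy real-world concept maps, one would typically wrap the Kabsch fit in RANSAC or weight correspondences by similarity; plain mutual nearest neighbours are used here only to keep the example short.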
