CVJul 19, 2022Code
PoserNet: Refining Relative Camera Poses Exploiting Object DetectionsMatteo Taiana, Matteo Toso, Stuart James et al.
The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet) a light-weight Graph Neural Network to refine the approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions - concisely expressed as bounding boxes - across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms improving the median error on the rotation by 62 degrees with respect to the initial estimates obtained based on bounding boxes. Code and data are available at https://github.com/IIT-PAVIS/PoserNet.
CVApr 13, 2023Code
3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and DatasetMatteo Toso, Matteo Taiana, Stuart James et al.
Efficient visual localization is crucial to many applications, such as large-scale deployment of autonomous agents and augmented reality. Traditional visual localization, while achieving remarkable accuracy, relies on extensive 3D models of the scene or large collections of geolocalized images, which are often inefficient to store and to scale to novel environments. In contrast, humans orient themselves using very abstract 2D maps, using the location of clearly identifiable landmarks. Drawing on this and on the success of recent works that explored localization on 2D abstract maps, we propose Flatlandia, a novel visual localization challenge. With Flatlandia, we investigate whether it is possible to localize a visual query by comparing the layout of its common objects detected against the known spatial layout of objects in the map. We formalize the challenge as two tasks at different levels of accuracy to investigate the problem and its possible limitations; for each, we propose initial baseline models and compare them against state-of-the-art 6DoF and 3DoF methods. Code and dataset are publicly available at github.com/IIT-PAVIS/Flatlandia.
ROJan 2, 2023
3DSGrasp: 3D Shape-Completion for Robotic GraspSeyed S. Mohammadi, Nuno F. Duarte, Dimitris Dimou et al.
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point's permutation, which generates PCDs that are geometrically consistent and completed properly. Experiments on a wide range of partial PCD show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
CVMar 13, 2024
PRAGO: Differentiable Multi-View Pose Optimization From Objectness DetectionsMatteo Taiana, Matteo Toso, Stuart James et al.
Robustly estimating camera poses from a set of images is a fundamental task which remains challenging for differentiable methods, especially in the case of small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose, and in turn, the absolute pose, in a differentiable manner benefiting from the optimization of a sequence of geometrical tasks. We show how our objectness pose-refinement module in PRAGO is able to refine the inherent ambiguities in pairwise relative pose estimation without removing edges and avoiding making early decisions on the viability of graph edges. PRAGO then refines the absolute rotations through iterative graph construction, reweighting the graph edges to compute the final rotational pose, which can be converted into absolute poses using translation averaging. We show that PRAGO is able to outperform non-differentiable solvers on small and sparse scenes extracted from 7-Scenes achieving a relative improvement of 21% for rotations while achieving similar translation estimates.