Dániel Baráth

CV
h-index9
4papers
248citations
Novelty63%
AI Score53

4 Papers

CVJul 29, 2024Code
Global Structure-from-Motion Revisited

Linfei Pan, Dániel Baráth, Marc Pollefeys et al.

Recovering 3D structure and camera motion from images has been a long-standing focus of computer vision research and is known as Structure-from-Motion (SfM). Solutions to this problem are categorized into incremental and global approaches. Until now, the most popular systems follow the incremental paradigm due to its superior accuracy and robustness, while global approaches are drastically more scalable and efficient. With this work, we revisit the problem of global SfM and propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on-par or superior to COLMAP, the most widely used incremental SfM, while being orders of magnitude faster. We share our system as an open-source implementation at {https://github.com/colmap/glomap}.

CVJun 3
ZipSplat: Fewer Gaussians, Better Splats

Alexander Veicht, Sunghwan Hong, Dániel Baráth et al.

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference, a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ${\sim}6{\times}$ fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1dB and 1.2dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at ${\href{https://veichta.com/zipsplat}{https://veichta.com/zipsplat}}$.

CVOct 16, 2024Code
Gravity-aligned Rotation Averaging with Circular Regression

Linfei Pan, Marc Pollefeys, Dániel Baráth

Reconstructing a 3D scene from unordered images is pivotal in computer vision and robotics, with applications spanning crowd-sourced mapping and beyond. While global Structure-from-Motion (SfM) techniques are scalable and fast, they often compromise on accuracy. To address this, we introduce a principled approach that integrates gravity direction into the rotation averaging phase of global pipelines, enhancing camera orientation accuracy and reducing the degrees of freedom. This additional information is commonly available in recent consumer devices, such as smartphones, mixed-reality devices and drones, making the proposed method readily accessible. Rooted in circular regression, our algorithm has similar convergence guarantees as linear regression. It also supports scenarios where only a subset of cameras have known gravity. Additionally, we propose a mechanism to refine error-prone gravity. We achieve state-of-the-art accuracy on four large-scale datasets. Particularly, the proposed method improves upon the SfM baseline by 13 AUC@$1^\circ$ points, on average, while running eight times faster. It also outperforms the standard planar pose graph optimization technique by 23 AUC@$1^\circ$ points. The code is at https://github.com/colmap/glomap.

CVNov 17, 2020Code
P1AC: Revisiting Absolute Pose From a Single Affine Correspondence

Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.

Affine correspondences have traditionally been used to improve feature matching over wide baselines. While recent work has successfully used affine correspondences to solve various relative camera pose estimation problems, less attention has been given to their use in absolute pose estimation. We introduce the first general solution to the problem of estimating the pose of a calibrated camera given a single observation of an oriented point and an affine correspondence. The advantage of our approach (P1AC) is that it requires only a single correspondence, in comparison to the traditional point-based approach (P3P), significantly reducing the combinatorics in robust estimation. P1AC provides a general solution that removes restrictive assumptions made in prior work and is applicable to large-scale image-based localization. We propose a minimal solution to the P1AC problem and evaluate our novel solver on synthetic data, showing its numerical stability and performance under various types of noise. On standard image-based localization benchmarks we show that P1AC achieves more accurate results than the widely used P3P algorithm. Code for our method is available at https://github.com/jonathanventura/P1AC/ .