UniQueR: Unified Query-based Feedforward 3D Reconstruction
This work targets efficient and accurate feedforward 3D reconstruction, though it appears incremental, since it builds on existing query-based and feedforward approaches.
The paper tackles the problem of efficient and accurate 3D reconstruction from unposed images by introducing UniQueR, a unified query-based feedforward framework that uses sparse 3D queries to infer scene structure, including occluded regions, in a single forward pass. It achieves state-of-the-art rendering quality and geometric accuracy on benchmarks like Mip-NeRF 360 and VR-NeRF, using an order of magnitude fewer primitives than dense alternatives.
We present UniQueR, a unified query-based feedforward framework for efficient and accurate 3D reconstruction from unposed images. Existing feedforward models such as DUSt3R, VGGT, and AnySplat typically predict per-pixel point maps or pixel-aligned Gaussians, which remain fundamentally 2.5D and limited to visible surfaces. In contrast, UniQueR formulates reconstruction as a sparse 3D query inference problem. Our model learns a compact set of 3D anchor points that act as explicit geometric queries, enabling the network to infer scene structure, including geometry in occluded regions, in a single forward pass. Each query encodes spatial and appearance priors directly in global 3D space (instead of per-frame camera space) and spawns a set of 3D Gaussians for differentiable rendering. By leveraging unified query interactions across multi-view features and a decoupled cross-attention design, UniQueR achieves strong geometric expressiveness while substantially reducing memory and computational cost. Experiments on Mip-NeRF 360 and VR-NeRF demonstrate that UniQueR surpasses state-of-the-art feedforward methods in both rendering quality and geometric accuracy, using an order of magnitude fewer primitives than dense alternatives.
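To make the query-to-Gaussian idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each query is a 3D anchor plus a learned embedding, updates the embeddings with one generic single-head cross-attention over tokens pooled from all views (the abstract does not specify the "decoupled" cross-attention, so this is only a stand-in), and decodes K Gaussian centres per query as bounded offsets around the anchor in world space. All names (`cross_attend`, `spawn_gaussians`, `w_pos`, `w_off`) and sizes (V, H, W, N, K) are hypothetical, and scales, rotations, opacities, and colours are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, feats, w_q, w_k, w_v):
    """Generic single-head cross-attention (hypothetical stand-in).
    queries: (N, D) query embeddings; feats: (M, D) tokens from all views."""
    q, k, v = queries @ w_q, feats @ w_k, feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, M)
    return queries + attn @ v  # residual update of each query

def spawn_gaussians(anchors, emb, w_off, k_per_query, scale=0.1):
    """Decode k Gaussian centres per query as tanh-bounded offsets
    around its 3D anchor; returns (N, K, 3) in global world coordinates."""
    n = anchors.shape[0]
    offsets = np.tanh(emb @ w_off).reshape(n, k_per_query, 3) * scale
    return anchors[:, None, :] + offsets

# Hypothetical sizes: V views with H x W feature maps, N queries, K Gaussians each.
V, H, W, D = 4, 32, 32, 16
N, K = 64, 6

feats = rng.standard_normal((V * H * W, D))       # flattened multi-view tokens
anchors = rng.uniform(-1.0, 1.0, size=(N, 3))     # sparse anchors in world space
w_pos = rng.standard_normal((3, D))               # lifts anchor positions into embeddings
emb = rng.standard_normal((N, D)) + anchors @ w_pos
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
w_off = rng.standard_normal((D, K * 3)) / np.sqrt(D)

emb = cross_attend(emb, feats, w_q, w_k, w_v)
centers = spawn_gaussians(anchors, emb, w_off, K)

pixel_aligned = V * H * W  # one Gaussian per pixel per view
query_based = N * K        # K Gaussians per sparse query
print(centers.shape, pixel_aligned, query_based)  # (64, 6, 3) 4096 384
```

The primitive count in this toy setup depends only on the chosen N and K rather than on image resolution and view count, which is the property the abstract credits for reduced memory; the specific numbers here are illustrative, not results from the paper.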