ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting Using Render-and-Compare
ReCoSplat addresses robust scene reconstruction for applications such as robotics and AR/VR, though it appears incremental because it builds on existing Gaussian Splatting methods.
The paper tackles the challenge of online novel view synthesis from sequential, often unposed, observations by introducing ReCoSplat, an autoregressive feed-forward Gaussian Splatting model that achieves state-of-the-art performance across different input settings on in- and out-of-distribution benchmarks.
Online novel view synthesis remains challenging, requiring robust scene reconstruction from sequential, often unposed, observations. We present ReCoSplat, an autoregressive feed-forward Gaussian Splatting model supporting posed or unposed inputs, with or without camera intrinsics. While assembling local Gaussians using camera poses scales better than canonical-space prediction, it creates a dilemma during training: using ground-truth poses ensures stability but causes a distribution mismatch when predicted poses are used at inference. To address this, we introduce a Render-and-Compare (ReCo) module. ReCo renders the current reconstruction from the predicted viewpoint and compares it with the incoming observation, providing a stable conditioning signal that compensates for pose errors. To support long sequences, we propose a hybrid KV cache compression strategy combining early-layer truncation with chunk-level selective retention, reducing the KV cache size by over 90% for 100+ frames. ReCoSplat achieves state-of-the-art performance across different input settings on both in- and out-of-distribution benchmarks. Code and pretrained models will be released. Our project page is at https://freemancheng.com/ReCoSplat .
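The abstract does not say how the hybrid KV cache compression is parameterized. The Python sketch below shows one plausible reading and counts how much cache it keeps. It rests on assumptions that do not come from the paper. "Early-layer truncation" is read as the first few transformer layers keeping only a short window of recent frames. "Chunk-level selective retention" is read as later layers grouping past frames into fixed-size chunks and keeping the newest chunk plus the highest-scoring older ones. Every name (`CachePolicy`, `select_chunks`, `late_layer_frames`, and so on), every default value, and the per-chunk scores are hypothetical. The reduction it prints is not the paper's reported figure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CachePolicy:
    """Hypothetical compression knobs; values are illustrative, not from the paper."""
    num_layers: int = 24         # transformer layers holding a KV cache
    tokens_per_frame: int = 256  # KV entries each frame adds to every layer
    early_layers: int = 12       # first layers subject to truncation
    early_window: int = 2        # recent frames kept by truncated layers
    chunk_size: int = 10         # frames grouped into one chunk in later layers
    keep_chunks: int = 2         # older chunks retained by score


def early_layer_frames(num_frames: int, p: CachePolicy) -> List[int]:
    """Early-layer truncation (assumed): keep only the last `early_window` frames."""
    return list(range(max(0, num_frames - p.early_window), num_frames))


def select_chunks(scores: Sequence[float], keep: int) -> List[int]:
    """Indices of the `keep` highest-scoring chunks, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def late_layer_frames(num_frames: int, chunk_scores: Sequence[float],
                      p: CachePolicy) -> List[int]:
    """Chunk-level selective retention (assumed): newest chunk plus top older chunks."""
    if num_frames <= 0:
        return []
    chunks = [list(range(s, min(s + p.chunk_size, num_frames)))
              for s in range(0, num_frames, p.chunk_size)]
    older, newest = chunks[:-1], chunks[-1]
    if len(chunk_scores) != len(older):
        raise ValueError("need exactly one score per older chunk")
    kept = [f for i in select_chunks(chunk_scores, p.keep_chunks) for f in older[i]]
    return kept + newest


def cache_tokens(num_frames: int, chunk_scores: Sequence[float],
                 p: CachePolicy) -> int:
    """KV entries stored after compression, summed over all layers."""
    early = len(early_layer_frames(num_frames, p))
    late = len(late_layer_frames(num_frames, chunk_scores, p))
    return p.tokens_per_frame * (p.early_layers * early
                                 + (p.num_layers - p.early_layers) * late)


def full_cache_tokens(num_frames: int, p: CachePolicy) -> int:
    """KV entries stored without any compression."""
    return p.tokens_per_frame * p.num_layers * num_frames


if __name__ == "__main__":
    policy = CachePolicy()
    # Made-up importance scores for the 11 older chunks of a 120-frame sequence.
    # A real system might derive such scores from attention weights (assumption).
    scores = [0.3, 0.9, 0.1, 0.5, 0.2, 0.8, 0.4, 0.6, 0.05, 0.7, 0.15]
    full = full_cache_tokens(120, policy)
    kept = cache_tokens(120, scores, policy)
    print(f"full={full} compressed={kept} reduction={1 - kept / full:.1%}")
```

Under these invented settings, a 120-frame sequence keeps 384 of 2,880 frame-layer slots, a reduction of about 87%. The sketch mainly shows that the two mechanisms compound. Truncation bounds the early layers to `early_window` frames. Selective retention bounds the later layers to roughly `(keep_chunks + 1) * chunk_size` frames, however long the sequence grows.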