Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images
This work addresses the artifacts and memory overhead of feed-forward 3D Gaussian Splatting methods, which combine pixel-aligned Gaussians from multiple views without modeling the relations between them.
The paper tackles the problem of generating efficient and generalizable Gaussian representations from multi-view images for novel view synthesis. It proposes a Gaussian Graph Network that uses fewer Gaussians while achieving better image quality and higher rendering speed than state-of-the-art methods.
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. While conventional methods require per-scene optimization, several recent feed-forward methods use a learnable network to generate pixel-aligned Gaussian representations that generalize to different scenes. However, these methods simply combine pixel-aligned Gaussians from multiple views into a scene representation and do not fully capture the relations among Gaussians from different images, which leads to artifacts and extra memory cost. In this paper, we propose the Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. Specifically, we construct Gaussian Graphs to model the relations of Gaussian groups from different views. To support message passing at the Gaussian level, we reformulate the basic graph operations over Gaussian representations, enabling each Gaussian to benefit from its connected Gaussian groups through Gaussian feature fusion. Furthermore, we design a Gaussian pooling layer to aggregate the Gaussian groups into an efficient representation. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method. Compared to state-of-the-art methods, our model uses fewer Gaussians and achieves better image quality with higher rendering speed.
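To make the idea of a graph over Gaussian groups concrete, the following is a minimal toy sketch, not the formulation used in the paper. It assumes each view contributes one group of Gaussians (a graph node), that connected views exchange features between spatially close Gaussians, and that pooling greedily drops near-duplicate Gaussians. The function names `fuse_features` and `pool_gaussians`, the distance threshold `radius`, and the fixed 0.5/0.5 averaging rule are all illustrative assumptions; the abstract does not specify these details, and the actual network is learned.

```python
import numpy as np


def fuse_features(centers, feats, adjacency, radius=0.1):
    """Toy Gaussian-level message passing between view groups.

    centers:   (V, N, 3) Gaussian centers, one group of N Gaussians per view
    feats:     (V, N, D) per-Gaussian features
    adjacency: (V, V) boolean matrix, True where two views are connected
    radius:    hypothetical distance threshold for "related" Gaussians
    """
    V, N, D = feats.shape
    fused = feats.copy()
    for i in range(V):
        msg_sum = np.zeros((N, D))
        msg_cnt = np.zeros((N, 1))
        for j in range(V):
            if i == j or not adjacency[i, j]:
                continue
            # Pairwise distances between Gaussians of view i and view j: (N, N)
            diff = centers[i][:, None, :] - centers[j][None, :, :]
            dist = np.linalg.norm(diff, axis=-1)
            near = (dist < radius).astype(feats.dtype)
            msg_sum += near @ feats[j]
            msg_cnt += near.sum(axis=1, keepdims=True)
        # Average own feature with the mean message (assumed fusion rule);
        # Gaussians with no nearby neighbor keep their original feature.
        has_msg = msg_cnt[:, 0] > 0
        fused[i, has_msg] = (
            0.5 * feats[i, has_msg] + 0.5 * msg_sum[has_msg] / msg_cnt[has_msg]
        )
    return fused


def pool_gaussians(centers, feats, radius=0.1):
    """Toy pooling: keep a Gaussian only if no kept Gaussian lies within radius."""
    flat_c = centers.reshape(-1, 3)
    flat_f = feats.reshape(-1, feats.shape[-1])
    kept = []
    for idx in range(len(flat_c)):
        if all(np.linalg.norm(flat_c[idx] - flat_c[k]) >= radius for k in kept):
            kept.append(idx)
    return flat_c[kept], flat_f[kept]
```

In this sketch, fusion lets overlapping Gaussians from different views share information before pooling removes the redundant ones, which mirrors at a very coarse level the motivation stated above: fewer Gaussians than a plain concatenation of all per-view Gaussians.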