X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography
This work addresses a domain-specific challenge in medical imaging: improving CT reconstruction from sparse data. The contribution is incremental, building on existing methods with new architectural and representation components.
The paper tackles the problem of reconstructing 3D CT volumes from sparse-view 2D X-ray projections by introducing X-GRM, a large feedforward model with a novel Voxel-based Gaussian Splatting representation, resulting in high-quality reconstructions for both in-domain and out-domain inputs.
Computed Tomography (CT) serves as an indispensable tool in clinical workflows, providing non-invasive visualization of internal anatomical structures. Existing CT reconstruction works are limited by small-capacity model architectures and inflexible volume representations. In this work, we present X-GRM (X-ray Gaussian Reconstruction Model), a large feedforward model for reconstructing 3D CT volumes from sparse-view 2D X-ray projections. X-GRM employs a scalable transformer-based architecture to encode sparse-view X-ray inputs, where tokens from different views are integrated efficiently. These tokens are then decoded into a novel volume representation, named Voxel-based Gaussian Splatting (VoxGS), which enables efficient CT volume extraction and differentiable X-ray rendering. This combination of a high-capacity model and a flexible volume representation empowers our model to produce high-quality reconstructions from various testing inputs, including in-domain and out-domain X-ray projections. Our code is available at: https://github.com/CUHK-AIM-Group/X-GRM.
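The abstract does not spell out how VoxGS is parameterised, so the following is only a toy sketch of the two operations it names: extracting a CT volume from a set of Gaussians and rendering an X-ray from that volume. It assumes isotropic Gaussians in a unit cube and a parallel-beam projection along one grid axis. The function names `voxelize_gaussians` and `render_parallel_xray` and all numerical values are hypothetical and are not taken from the X-GRM code base.

```python
import numpy as np


def voxelize_gaussians(centers, sigmas, densities, grid_size):
    """Sum isotropic 3D Gaussians onto a regular grid over the unit cube.

    Assumption: each Gaussian has a center in [0, 1]^3, a scalar standard
    deviation, and a scalar density weight. The real VoxGS may differ.
    """
    axis = (np.arange(grid_size) + 0.5) / grid_size  # voxel-center coordinates
    zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([zz, yy, xx], axis=-1)  # shape (D, H, W, 3)
    volume = np.zeros((grid_size,) * 3)
    for center, sigma, density in zip(centers, sigmas, densities):
        sq_dist = np.sum((grid - center) ** 2, axis=-1)
        volume += density * np.exp(-0.5 * sq_dist / sigma**2)
    return volume


def render_parallel_xray(volume, axis, voxel_size):
    """Approximate the line integral of density along one grid axis."""
    return volume.sum(axis=axis) * voxel_size


# Toy input: two hypothetical Gaussians.
centers = np.array([[0.5, 0.5, 0.5], [0.25, 0.5, 0.75]])
sigmas = np.array([0.1, 0.05])
densities = np.array([1.0, 2.0])

volume = voxelize_gaussians(centers, sigmas, densities, grid_size=32)
projection = render_parallel_xray(volume, axis=0, voxel_size=1.0 / 32)
```

Both steps consist only of sums and exponentials, so the same computation written in an automatic-differentiation framework would let a loss on the rendered projection propagate gradients back to the Gaussian parameters. That is one plausible reading of "differentiable X-ray rendering" here, not a description of the authors' renderer.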