85.2CVMay 31
Splatshot: 3D Face Avatar Generation from a Single Unconstrained PhotoHao Liang, Zhixuan Ge, Soumendu Majee et al.
Reconstructing a photorealistic 3D face avatar from a single unconstrained photograph is challenging: feed-forward 3D Gaussian Splatting (3DGS) models degrade on out-of-distribution inputs, while pretrained diffusion models produce high-fidelity images but lack multi-view consistency. We observe that these paradigms are fundamentally complementary: explicit 3D representations guarantee geometric consistency, whereas 2D diffusion priors ensure photorealism. Building on this, we propose SplatShot, a training-free framework that couples these representations directly within the denoising process. Given a base 3DGS face model and a single reference image, we jointly denoise all target views using a per-step 3D feedback loop. At each timestep, we predict clean images from the noisy latents, refit the 3DGS to these multi-view predictions, and back-propagate the photometric discrepancy between the 3DGS re-renderings and 2D predictions into the noise estimate. This steers the sampling trajectory toward strictly 3D-coherent, identity-faithful outputs. Experiments on diverse in-the-wild images demonstrate that SplatShot produces 3D avatars with superior identity preservation, photorealism, and multi-view consistency.
59.3IVApr 3
DRIFT: Deep Restoration, ISP Fusion, and Tone-mappingSoumendu Majee, Joshua Peter Ebenezer, Abhinau K. Venkataramanan et al.
Smartphone cameras have gained immense popularity with the adoption of high-resolution and high-dynamic range imaging. As a result, high-performance camera Image Signal Processors (ISPs) are crucial in generating high-quality images for the end user while keeping computational costs low. In this paper, we propose DRIFT (Deep Restoration, ISP Fusion, and Tone-mapping): an efficient AI mobile camera pipeline that generates high quality RGB images from hand-held raw captures. The first stage of DRIFT is a Multi-Frame Processing (MFP) network that is trained using a adversarial perceptual loss to perform multi-frame alignment, denoising, demosaicing, and super-resolution. Then, the output of DRIFT-MFP is processed by a novel deep-learning based tone-mapping (DRIFT-TM) solution that allows for tone tunability, ensures tone-consistency with a reference pipeline, and can be run efficiently for high-resolution images on a mobile device. We show qualitative and quantitative comparisons against state-of-the-art MFP and tone-mapping methods to demonstrate the effectiveness of our approach.
CVAug 25, 2025
FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained PosesHao Liang, Zhixuan Ge, Ashish Tiwari et al.
We present FastAvatar, a pose-invariant, feed-forward framework that can generate a 3D Gaussian Splatting (3DGS) model from a single face image from an arbitrary pose in near-instant time (<10ms). FastAvatar uses a novel encoder-decoder neural network design to achieve both fast fitting and identity preservation regardless of input pose. First, FastAvatar constructs a 3DGS face ``template'' model from a training dataset of faces with multi-view captures. Second, FastAvatar encodes the input face image into an identity-specific and pose-invariant latent embedding, and decodes this embedding to predict residuals to the structural and appearance parameters of each Gaussian in the template 3DGS model. By only inferring residuals in a feed-forward fashion, model inference is fast and robust. FastAvatar significantly outperforms existing feed-forward face 3DGS methods (e.g., GAGAvatar) in reconstruction quality, and runs 1000x faster than per-face optimization methods (e.g., FlashAvatar, GaussianAvatars and GASP). In addition, FastAvatar's novel latent space design supports real-time identity interpolation and attribute editing which is not possible with any existing feed-forward 3DGS face generation framework. FastAvatar's combination of excellent reconstruction quality and speed expands the scope of 3DGS for photorealistic avatar applications in consumer and interactive systems.
IVNov 11, 2021
CodEx: A Modular Framework for Joint Temporal De-blurring and Tomographic ReconstructionSoumendu Majee, Selin Aslan, Doga Gursoy et al.
In many computed tomography (CT) imaging applications, it is important to rapidly collect data from an object that is moving or changing with time. Tomographic acquisition is generally assumed to be step-and-shoot, where the object is rotated to each desired angle, and a view is taken. However, step-and-shoot acquisition is slow and can waste photons, so in practice fly-scanning is done where the object is continuously rotated while collecting data. However, this can result in motion-blurred views and consequently reconstructions with severe motion artifacts. In this paper, we introduce CodEx, a modular framework for joint de-blurring and tomographic reconstruction that can effectively invert the motion blur introduced in sparse view fly-scanning. The method is a synergistic combination of a novel acquisition method with a novel non-convex Bayesian reconstruction algorithm. CodEx works by encoding the acquisition with a known binary code that the reconstruction algorithm then inverts. Using a well chosen binary code to encode the measurements can improve the accuracy of the inversion process. The CodEx reconstruction method uses the alternating direction method of multipliers (ADMM) to split the inverse problem into iterative deblurring and reconstruction sub-problems, making reconstruction practical to implement. We present reconstruction results on both simulated and binned experimental data to demonstrate the effectiveness of our method.
IVAug 1, 2020
Multi-Slice Fusion for Sparse-View and Limited-Angle 4D CT ReconstructionSoumendu Majee, Thilo Balke, Craig A. J. Kemp et al.
Inverse problems spanning four or more dimensions such as space, time and other independent parameters have become increasingly important. State-of-the-art 4D reconstruction methods use model based iterative reconstruction (MBIR), but depend critically on the quality of the prior modeling. Recently, plug-and-play (PnP) methods have been shown to be an effective way to incorporate advanced prior models using state-of-the-art denoising algorithms. However, state-of-the-art denoisers such as BM4D and deep convolutional neural networks (CNNs) are primarily available for 2D or 3D images and extending them to higher dimensions is difficult due to algorithmic complexity and the increased difficulty of effective training. In this paper, we present multi-slice fusion, a novel algorithm for 4D reconstruction, based on the fusion of multiple low-dimensional denoisers. Our approach uses multi-agent consensus equilibrium (MACE), an extension of plug-and-play, as a framework for integrating the multiple lower-dimensional models. We apply our method to 4D cone-beam X-ray CT reconstruction for non destructive evaluation (NDE) of samples that are dynamically moving during acquisition. We implement multi-slice fusion on distributed, heterogeneous clusters in order to reconstruct large 4D volumes in reasonable time and demonstrate the inherent parallelizable nature of the algorithm. We present simulated and real experimental results on sparse-view and limited-angle CT data to demonstrate that multi-slice fusion can substantially improve the quality of reconstructions relative to traditional methods, while also being practical to implement and train.
IVJun 15, 2019
4D X-Ray CT Reconstruction using Multi-Slice FusionSoumendu Majee, Thilo Balke, Craig A. J. Kemp et al.
There is an increasing need to reconstruct objects in four or more dimensions corresponding to space, time and other independent parameters. The best 4D reconstruction algorithms use regularized iterative reconstruction approaches such as model based iterative reconstruction (MBIR), which depends critically on the quality of the prior modeling. Recently, Plug-and-Play methods have been shown to be an effective way to incorporate advanced prior models using state-of-the-art denoising algorithms designed to remove additive white Gaussian noise (AWGN). However, state-of-the-art denoising algorithms such as BM4D and deep convolutional neural networks (CNNs) are primarily available for 2D and sometimes 3D images. In particular, CNNs are difficult and computationally expensive to implement in four or more dimensions, and training may be impossible if there is no associated high-dimensional training data. In this paper, we present Multi-Slice Fusion, a novel algorithm for 4D and higher-dimensional reconstruction, based on the fusion of multiple low-dimensional denoisers. Our approach uses multi-agent consensus equilibrium (MACE), an extension of Plug-and-Play, as a framework for integrating the multiple lower-dimensional prior models. We apply our method to the problem of 4D cone-beam X-ray CT reconstruction for Non Destructive Evaluation (NDE) of moving parts. This is done by solving the MACE equations using lower-dimensional CNN denoisers implemented in parallel on a heterogeneous cluster. Results on experimental CT data demonstrate that Multi-Slice Fusion can substantially improve the quality of reconstructions relative to traditional 4D priors, while also being practical to implement and train.