DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
This work addresses the problem of generating realistic 4D animations for applications in graphics and simulation. It represents an incremental advance that combines dictionary learning with diffusion models.
The paper tackles the challenge of 4D generative modeling for deformable shapes over time by proposing DNF, a dictionary-based neural field representation that disentangles shape and motion, resulting in high-fidelity 4D animations with efficient modeling of details and deformations.
While remarkable success has been achieved through diffusion-based 3D generative models for shapes, 4D generative modeling remains challenging due to the complexity of object deformations over time. We propose DNF, a new 4D representation for unconditional generative modeling that efficiently models deformable shapes with disentangled shape and motion while capturing high-fidelity details in the deforming objects. To achieve this, we propose a dictionary learning approach to disentangle 4D motion from shape as neural fields. Both shape and motion are represented as learned latent spaces, where each deformable shape is represented by its shape and motion global latent codes, shape-specific coefficient vectors, and shared dictionary information. This representation captures both shape-specific detail and global shared information in the learned dictionary. Our dictionary-based representation balances fidelity, contiguity, and compression well. Combined with a transformer-based diffusion model, our method is able to generate effective, high-fidelity 4D animations.
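To make the idea of "global latent code, shape-specific coefficient vector, and shared dictionary" concrete, here is a minimal sketch of a single neural field layer built that way. It is not the paper's implementation: the abstract does not say how the dictionary enters the network, so the rank-one weight atoms, the layer sizes, and the names `U`, `V`, `w_out`, and `field` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IN_DIM, LATENT_DIM, HIDDEN, NUM_ATOMS = 3, 8, 16, 4

# Shared across all shapes (assumed form): a dictionary of rank-one
# weight atoms u_k v_k^T, plus a fixed output layer.
U = rng.standard_normal((HIDDEN, NUM_ATOMS))
V = rng.standard_normal((NUM_ATOMS, IN_DIM + LATENT_DIM))
w_out = rng.standard_normal(HIDDEN)


def field(points, latent, coeffs):
    """Evaluate a toy neural field at `points` (N, 3) for one shape.

    latent: global latent code for the shape, shape (LATENT_DIM,)
    coeffs: shape-specific coefficient vector, shape (NUM_ATOMS,)
    """
    # Layer weight assembled from the shared dictionary, scaled per shape:
    # W = sum_k coeffs[k] * outer(U[:, k], V[k, :])
    W = U @ np.diag(coeffs) @ V                      # (HIDDEN, IN_DIM + LATENT_DIM)
    z = np.broadcast_to(latent, (points.shape[0], LATENT_DIM))
    x = np.concatenate([points, z], axis=1)          # condition on the latent code
    h = np.maximum(x @ W.T, 0.0)                     # ReLU hidden layer
    return h @ w_out                                 # (N,) scalar values, e.g. an SDF


# Two shapes share U, V, w_out and differ only in (latent, coeffs).
pts = rng.uniform(-1.0, 1.0, size=(5, IN_DIM))
shape_a = (rng.standard_normal(LATENT_DIM), rng.standard_normal(NUM_ATOMS))
shape_b = (rng.standard_normal(LATENT_DIM), rng.standard_normal(NUM_ATOMS))
values_a = field(pts, *shape_a)
values_b = field(pts, *shape_b)
```

In this toy setup each shape stores only 12 numbers (8 latent values plus 4 coefficients), whereas a full per-shape weight matrix for the same layer would need 176. This is the compression side of the trade-off the abstract describes. A generative model would then operate on these compact per-shape codes rather than on raw network weights.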