CV | May 22, 2025

PhyMAGIC: Physical Motion-Aware Generative Inference with Confidence-guided LLM

arXiv:2505.16456v2 | Citations: 1 | h-index: 3

Originality: Highly original

AI Analysis

This work addresses the need for scalable, broadly applicable dynamic 3D content generation without fine-tuning, which benefits fields such as animation and simulation.

The paper tackles the problem of generating physically consistent 3D motion from a single image, presenting PhyMAGIC, a training-free framework that integrates diffusion models, LLMs, and physics simulators to outperform state-of-the-art methods in physical property inference and motion-text alignment.

Recent advances in 3D content generation have amplified demand for dynamic models that are both visually realistic and physically consistent. However, state-of-the-art video diffusion models frequently produce implausible results such as momentum violations and object interpenetrations. Existing physics-aware approaches often rely on task-specific fine-tuning or supervised data, which limits their scalability and applicability. To address this challenge, we present PhyMAGIC, a training-free framework that generates physically consistent motion from a single image. PhyMAGIC integrates a pre-trained image-to-video diffusion model, confidence-guided reasoning via LLMs, and a differentiable physics simulator to produce 3D assets ready for downstream physical simulation without fine-tuning or manual supervision. By iteratively refining motion prompts using LLM-derived confidence scores and leveraging simulation feedback, PhyMAGIC steers generation toward physically consistent dynamics. Comprehensive experiments demonstrate that PhyMAGIC outperforms state-of-the-art video generators and physics-aware baselines, enhancing physical property inference and motion-text alignment while maintaining visual fidelity.
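The abstract describes an iterative loop in which motion prompts are refined using LLM-derived confidence scores and simulation feedback. The sketch below illustrates only that control flow and is not the authors' implementation: every name in it (`Estimate`, `refine_prompt`, the three callables, the `threshold` and `max_rounds` parameters) is hypothetical, and the stubs stand in for the diffusion model, the LLM, and the physics simulator, none of which are actually called here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Estimate:
    """A physical-property guess plus a self-reported confidence (hypothetical)."""
    value: float        # stand-in for an inferred physical parameter
    confidence: float   # assumed to lie in [0, 1]


def refine_prompt(
    prompt: str,
    generate_video: Callable[[str], object],
    estimate_property: Callable[[object], Estimate],
    simulate_feedback: Callable[[float], str],
    threshold: float = 0.8,
    max_rounds: int = 5,
) -> Tuple[str, Estimate, int]:
    """Refine `prompt` until confidence reaches `threshold` or rounds run out.

    Returns the chosen prompt, its estimate, and the number of rounds used.
    """
    best_prompt, best_est = prompt, None
    for round_idx in range(1, max_rounds + 1):
        video = generate_video(prompt)        # stand-in for image-to-video diffusion
        est = estimate_property(video)        # stand-in for LLM reasoning
        if best_est is None or est.confidence > best_est.confidence:
            best_prompt, best_est = prompt, est
        if est.confidence >= threshold:
            return prompt, est, round_idx
        # Append a hint derived from simulation feedback to the prompt.
        prompt = f"{prompt}; {simulate_feedback(est.value)}"
    return best_prompt, best_est, max_rounds


# Toy stubs: confidence grows with the number of hints appended to the prompt.
def toy_generate(prompt: str) -> str:
    return prompt


def toy_estimate(video: str) -> Estimate:
    hints = video.count(";")
    return Estimate(value=1.0, confidence=min(1.0, 0.5 + 0.2 * hints))


def toy_feedback(value: float) -> str:
    return "avoid interpenetration"


final_prompt, final_est, rounds = refine_prompt(
    "a ball drops onto a table", toy_generate, toy_estimate, toy_feedback
)
```

With these toy stubs the loop stops once two hints have been appended. In a real system the stopping rule, the form of the feedback, and the way confidence is obtained would depend on details the abstract does not specify.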
