CLSep 29, 2025Code
Ultra-Fast Language Generation via Discrete Diffusion Divergence InstructHaoyang Zheng, Xinyang Liu, Cindy Xiangrui Kong et al.
Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained (masked) discrete diffusion language model (dLLM) and distills a few-step student for fast generation. The resulting DiDi-Instruct model achieves comparable or superior performance to its dLLM teacher and the GPT-2 baseline while enabling up to 64$\times$ acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which yields a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and the reward-guided ancestral sampler that significantly improve training stability, model coverage, and inference quality. On OpenWebText, DiDi-Instruct achieves perplexity from 62.2 (8 NFEs) to 18.4 (128 NFEs), which outperforms prior accelerated dLLMs and GPT-2 baseline. These gains come with a negligible entropy loss (around $1\%$) and reduce additional training wall-clock time by more than $20\times$ compared to competing dLLM distillation methods. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method, enabling language generation in the blink of an eye. We will release both code and models at github.com/haoyangzheng-ai/didi-instruct.
LGFeb 3
Ultra Fast PDE Solving via Physics Guided Few-step DiffusionCindy Xiangrui Kong, Yueqi Wang, Haoyang Zheng et al.
Diffusion-based models have demonstrated impressive accuracy and generalization in solving partial differential equations (PDEs). However, they still face significant limitations, such as high sampling costs and insufficient physical consistency, stemming from their many-step iterative sampling mechanism and lack of explicit physics constraints. To address these issues, we propose Phys-Instruct, a novel physics-guided distillation framework which not only (1) compresses a pre-trained diffusion PDE solver into a few-step generator via matching generator and prior diffusion distributions to enable rapid sampling, but also (2) enhances the physics consistency by explicitly injecting PDE knowledge through a PDE distillation guidance. Physic-Instruct is built upon a solid theoretical foundation, leading to a practical physics-constrained training objective that admits tractable gradients. Across five PDE benchmarks, Phys-Instruct achieves orders-of-magnitude faster inference while reducing PDE error by more than 8 times compared to state-of-the-art diffusion baselines. Moreover, the resulting unconditional student model functions as a compact prior, enabling efficient and physically consistent inference for various downstream conditional tasks. Our results indicate that Phys-Instruct is a novel, effective, and efficient framework for ultra-fast PDE solving powered by deep generative models.
MLMar 4, 2025
LAPD: Langevin-Assisted Bayesian Active Learning for Physical DiscoveryCindy Xiangrui Kong, Haoyang Zheng, Guang Lin
Discovering physical laws from data is a fundamental challenge in scientific research, particularly when high-quality data are scarce or costly to obtain. Traditional methods for identifying dynamical systems often struggle with noise sensitivity, inefficiency in data usage, and the inability to quantify uncertainty effectively. To address these challenges, we propose Langevin-Assisted Active Physical Discovery (LAPD), a Bayesian framework that integrates replica-exchange stochastic gradient Langevin Monte Carlo to simultaneously enable efficient system identification and robust uncertainty quantification (UQ). By balancing gradient-driven exploration in coefficient space and generating an ensemble of candidate models during exploitation, LAPD achieves reliable, uncertainty-aware identification with noisy data. In the face of data scarcity, the probabilistic foundation of LAPD further promotes the integration of active learning (AL) via a hybrid uncertainty-space-filling acquisition function. This strategy sequentially selects informative data to reduce data collection costs while maintaining accuracy. We evaluate LAPD on diverse nonlinear systems such as the Lotka-Volterra, Lorenz, Burgers, and Convection-Diffusion equations, demonstrating its robustness with noisy and limited data as well as superior uncertainty calibration compared to existing methods. The AL extension reduces the required measurements by around 60% for the Lotka-Volterra system and by around 40% for Burgers' equation compared to random data sampling, highlighting its potential for resource-constrained experiments. Our framework establishes a scalable, uncertainty-aware methodology for data-efficient discovery of dynamical systems, with broad applicability to problems where high-fidelity data acquisition is prohibitively expensive.