31.5 LG Feb 10, 2025
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Thomas Zeng, Shuibai Zhang, Shutong Wu et al.
Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data, and their generalizability to non-mathematical domains has not been rigorously studied. In response, this work first shows that current PRMs perform poorly in other domains. To address this limitation, we introduce VersaPRM, a multi-domain PRM trained on synthetic reasoning data generated using our novel data generation and annotation method. VersaPRM achieves consistent performance gains across diverse domains. For instance, in the MMLU-Pro category of Law, VersaPRM with weighted majority voting achieves a 7.9% performance gain over the majority voting baseline, surpassing Qwen2.5-Math-PRM's gain of 1.3%. We further contribute to the community by open-sourcing all data, code, and models for VersaPRM.
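The abstract contrasts plain majority voting with PRM-weighted majority voting but does not give the aggregation details. The sketch below is one common formulation, not necessarily the authors' exact procedure: each sampled solution votes for its final answer with a weight derived from its per-step PRM scores. The function names, the choice of `min` as the step-score aggregator, and the toy scores are all assumptions made for illustration.

```python
from collections import defaultdict


def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    counts = defaultdict(float)
    for answer in answers:
        counts[answer] += 1.0
    return max(counts, key=counts.get)


def weighted_majority_vote(answers, step_scores, aggregate=min):
    """Return the answer with the largest total PRM-derived weight.

    answers      -- final answer of each sampled solution
    step_scores  -- per-step PRM scores for each solution (hypothetical values)
    aggregate    -- collapses step scores into one weight; `min` is an
                    assumption here, other choices (last step, mean) exist
    """
    totals = defaultdict(float)
    for answer, scores in zip(answers, step_scores):
        totals[answer] += aggregate(scores)
    return max(totals, key=totals.get)


# Toy data: five sampled solutions to one multiple-choice question.
answers = ["A", "A", "B", "B", "B"]
step_scores = [
    [0.90, 0.80],
    [0.95, 0.90],
    [0.30, 0.20],
    [0.40, 0.10],
    [0.50, 0.30],
]

plain = majority_vote(answers)
weighted = weighted_majority_vote(answers, step_scores)
```

In this toy case the unweighted vote picks "B" because it appears three times, while the weighted vote picks "A" because the two "A" solutions carry much higher step scores. That is the kind of correction a well-calibrated PRM is meant to provide.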
13.1 CV Dec 7, 2023
DiffusionPhase: Motion Diffusion in Frequency Domain
Weilin Wan, Yiming Huang, Shutong Wu et al.
In this study, we introduce a learning-based method for generating high-quality human motion sequences from text descriptions (e.g., "A person walks forward"). Existing techniques struggle with motion diversity and smooth transitions when generating arbitrary-length motion sequences, due to limited text-to-motion datasets and pose representations that often lack expressiveness or compactness. To address these issues, we propose the first method for text-conditioned human motion generation in the frequency domain of motions. We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space with high-frequency details encoded, capturing the local periodicity of motions in time and space with high accuracy. We also introduce a conditional diffusion model that predicts periodic motion parameters from text descriptions and a start pose, efficiently achieving smooth transitions between motion sequences associated with different text descriptions. Experiments demonstrate that our approach outperforms current methods in generating a broader variety of high-quality motions and in synthesizing long sequences with natural transitions.
22.0 LG Sep 30, 2025
Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models
Shutong Wu, Jiawei Zhang
Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive next-token prediction. Thanks to their bidirectional attention mechanism, DLLMs are better at capturing contextual connections, and thus show unique advantages on challenges like the well-known "reversal curse" or learning in data-constrained scenarios. In addition, owing to their inherent modeling foundations, DLLMs have great potential for efficient inference with parallel decoding algorithms, which enable multi-token prediction per step. However, high generation quality often requires as many decoding steps as the sequence length, which amounts to one-token-per-step decoding, and existing parallel decoding algorithms, which yield suboptimal decoding paths, speed up inference at the cost of non-negligible performance degradation. To overcome this challenge, we introduce Free Draft-and-Verification (FreeDave), a novel fast decoding algorithm tailored for DLLMs that achieves lossless parallel decoding without any model modification or extra modules. Specifically, we propose an algorithm of parallel-decoded candidate generation and verification, which is theoretically guaranteed to use the fewest model forward calls to reproduce the same sequence generated by static decoding when enough computation and memory budget is provided. In extensive evaluations on math reasoning and code generation benchmarks across different DLLMs, FreeDave is shown to boost inference throughput by up to $3.78\times$ without performance degradation.