Weihua Su

h-index: 17
2 papers

2 Papers

RO, Oct 1, 2025
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Hengtao Li, Pengxiang Ding, Runze Suo et al.

Vision-Language-Action (VLA) models enable embodied decision-making but rely heavily on imitation learning, leading to compounding errors and poor robustness under distribution shift. Reinforcement learning (RL) can mitigate these issues yet typically demands costly real-world interactions or suffers from sim-to-real gaps. We introduce VLA-RFT, a reinforcement fine-tuning framework that leverages a data-driven world model as a controllable simulator. Trained from real interaction data, the simulator predicts future visual observations conditioned on actions, allowing policy rollouts with dense, trajectory-level rewards derived from goal-achieving references. This design delivers an efficient and action-aligned learning signal, drastically lowering sample requirements. With fewer than 400 fine-tuning steps, VLA-RFT surpasses strong supervised baselines and achieves greater efficiency than simulator-based RL. Moreover, it exhibits strong robustness under perturbed conditions, sustaining stable task execution. Our results establish world-model-based RFT as a practical post-training paradigm to enhance the generalization and robustness of VLA models. For more details, please refer to https://vla-rft.github.io/.
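The loop described in this abstract (roll out a policy inside a learned simulator, score each rollout against a goal-achieving reference, update the policy) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: `world_model` is a hypothetical one-dimensional stand-in for the learned simulator, the reward is a simple negative squared distance to a reference trajectory, and the update is a plain REINFORCE step with a mean-reward baseline.

```python
import random

def world_model(obs, action):
    # Hypothetical stand-in for a learned simulator: the 1-D state shifts by the action.
    return obs + action

def rollout(policy_mean, obs0, horizon, noise, rng):
    # Sample Gaussian actions around the policy mean and step the simulator.
    obs, actions, traj = obs0, [], []
    for _ in range(horizon):
        a = policy_mean + rng.gauss(0.0, noise)
        obs = world_model(obs, a)
        actions.append(a)
        traj.append(obs)
    return actions, traj

def trajectory_reward(traj, reference):
    # Assumed reward: negative mean squared distance to the reference trajectory.
    return -sum((p - r) ** 2 for p, r in zip(traj, reference)) / len(traj)

def fine_tune(policy_mean, reference, obs0=0.0, steps=200, samples=8,
              noise=0.2, lr=0.05, seed=0):
    rng = random.Random(seed)
    horizon = len(reference)
    for _ in range(steps):
        batch = []
        for _ in range(samples):
            actions, traj = rollout(policy_mean, obs0, horizon, noise, rng)
            batch.append((actions, trajectory_reward(traj, reference)))
        baseline = sum(r for _, r in batch) / samples
        # REINFORCE: advantage times the Gaussian score function of the actions.
        grad = 0.0
        for actions, r in batch:
            score = sum(a - policy_mean for a in actions) / noise ** 2
            grad += (r - baseline) * score
        policy_mean += lr * grad / (samples * horizon)
    return policy_mean

# The reference advances by 1.0 per step, so the best constant action is 1.0.
reference = [1.0, 2.0, 3.0, 4.0]
print(fine_tune(0.0, reference))
```

No real environment is queried in this sketch: all rollouts run through `world_model`, which is the property the abstract attributes to fine-tuning inside a world simulator.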

AI, Sep 11, 2013
Approximate Counting CSP Solutions Using Partition Function

Junping Zhou, Weihua Su, Minghao Yin

We propose a new approximate method for counting the solutions of a constraint satisfaction problem (CSP). The method is derived from the partition function: it introduces the free energy and captures the relationship between the probabilities of variables and constraints, which requires the marginal probabilities. It first obtains the marginal probabilities using belief propagation and then computes the number of solutions from the partition function. This allows us to plug the marginal probabilities directly into the partition function and count the solutions of a CSP efficiently. Experimental results show that the method solves both random and structured problems efficiently.
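The two-stage idea in this abstract (marginals from belief propagation, then a count from the partition function) can be sketched with the standard Bethe approximation. This is an illustrative sketch under stated assumptions, not the paper's exact formulation: for 0/1 constraint factors the Bethe energy term vanishes, so the estimated count is the exponential of the Bethe entropy of the BP beliefs. The function name `bp_count` and the input format are hypothetical, and each constraint scope is assumed to list distinct variables.

```python
import itertools
import math

def normalise(m):
    s = sum(m.values())
    return {k: v / s for k, v in m.items()} if s > 0 else dict(m)

def bp_count(domains, constraints, iters=50):
    """Estimate the number of CSP solutions as exp(Bethe entropy) of BP beliefs.

    domains: dict mapping variable -> list of values
    constraints: list of (scope_tuple, predicate) pairs
    """
    # Satisfying tuples of each constraint (the support of its 0/1 factor).
    tables = [(scope, [t for t in itertools.product(*(domains[v] for v in scope))
                       if pred(*t)])
              for scope, pred in constraints]
    edges = [(a, v) for a, (scope, _) in enumerate(tables) for v in scope]
    f2v = {(a, v): {x: 1.0 for x in domains[v]} for a, v in edges}
    v2f = {(v, a): {x: 1.0 / len(domains[v]) for x in domains[v]} for a, v in edges}

    for _ in range(iters):
        # Factor-to-variable messages: sum over satisfying tuples.
        for a, (scope, sat) in enumerate(tables):
            for i, v in enumerate(scope):
                out = {x: 0.0 for x in domains[v]}
                for t in sat:
                    out[t[i]] += math.prod(v2f[(u, a)][t[j]]
                                           for j, u in enumerate(scope) if j != i)
                f2v[(a, v)] = normalise(out)
        # Variable-to-factor messages: product of the other incoming messages.
        for a, v in edges:
            out = {x: math.prod(f2v[(b, u)][x] for b, u in edges if u == v and b != a)
                   for x in domains[v]}
            v2f[(v, a)] = normalise(out)

    entropy = 0.0
    for a, (scope, sat) in enumerate(tables):
        belief = normalise({t: math.prod(v2f[(u, a)][t[j]] for j, u in enumerate(scope))
                            for t in sat})
        if sum(belief.values()) == 0:
            return 0.0  # some constraint has no supported tuple
        entropy -= sum(p * math.log(p) for p in belief.values() if p > 0)
    for v in domains:
        degree = sum(1 for _, u in edges if u == v)
        belief = normalise({x: math.prod(f2v[(a, u)][x] for a, u in edges if u == v)
                            for x in domains[v]})
        entropy += (degree - 1) * sum(p * math.log(p) for p in belief.values() if p > 0)
    return math.exp(entropy)

# Chain x != y != z over three colours: 3 * 2 * 2 = 12 solutions.
doms = {"x": [0, 1, 2], "y": [0, 1, 2], "z": [0, 1, 2]}
cons = [(("x", "y"), lambda a, b: a != b), (("y", "z"), lambda a, b: a != b)]
print(bp_count(doms, cons))
```

On tree-structured instances such as this chain, belief propagation is exact and the estimate matches the true count of 12 up to floating-point error. On instances with cycles the result is only an approximation, and message passing may fail to converge.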