LGAug 4, 2025Code
PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement LearningDongchi Huang, Jiaqi Wang, Yang Li et al.
Partial observability presents a significant challenge for Safe Reinforcement Learning (Safe RL), as it impedes the identification of potential risks and rewards. Leveraging specific types of privileged information during training to mitigate the effects of partial observability has yielded notable empirical successes. In this paper, we propose Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs) to theoretically examine the advantages of incorporating privileged information in Safe RL. Building upon ACPOMDPs, we propose the Privileged Information Guided Dreamer (PIGDreamer), a model-based RL approach that leverages privileged information to enhance the agent's safety and performance through privileged representation alignment and an asymmetric actor-critic structure. Our empirical results demonstrate that PIGDreamer significantly outperforms existing Safe RL methods. Furthermore, compared to alternative privileged RL methods, our approach exhibits enhanced performance, robustness, and efficiency. Codes are available at: https://github.com/hggforget/PIGDreamer.
DBJul 9, 2024
FuncEvalGMN: Evaluating Functional Correctness of SQL via Graph Matching NetworkYi Zhan, Yang Sun, Han Weng et al.
In this paper, we propose a novel graph-based methodology to evaluate the functional correctness of SQL generation. Conventional metrics for assessing SQL code generation, such as matching-based and execution-based methods (e.g., exact set match and execution accuracy), are subject to two primary limitations. Firstly, the former fails to effectively assess functional correctness, as different SQL queries may possess identical functionalities. Secondly, the latter is susceptible to producing false positive samples in evaluations. Our proposed evaluation method, \texttt{FuncEvalGMN}, does not depend on the sufficient preparation of the test data, and it enables precise testing of the functional correctness of the code. Firstly, we parse SQL using a relational operator tree (ROT) called \textit{Relnode}, which contains rich semantic information from the perspective of logical execution.Then, we introduce a GNN-based approach for predicting the functional correctness of generated SQL. This approach incorporates global positional embeddings to address the limitations with the loss of topological information in conventional graph matching frameworks. As an auxiliary contribution, we propose a rule-based matching algorithm, Relnode Partial Matching (\texttt{RelPM}) as a baseline. Finally, we contribute a dataset, \texttt{Pair-Aug-Spider} with a training set and two testing sets, each comprising pairs of SQL codes to simulate various SQL code evaluation scenarios. The training set and one testing dataset focus on code generation using large language models (LLMs), while the other emphasizes SQL equivalence rewriting.
ROAug 4, 2025
CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement LearningDongchi Huang, Zhirui Fang, Tianle Zhang et al.
Vision-Language-Action (VLA) models demonstrate significant potential for developing generalized policies in real-world robotic control. This progress inspires researchers to explore fine-tuning these models with Reinforcement Learning (RL). However, fine-tuning VLA models with RL still faces challenges related to sample efficiency, compatibility with action chunking, and training stability. To address these challenges, we explore the fine-tuning of VLA models through offline reinforcement learning incorporating action chunking. In this work, we propose Chunked RL, a novel reinforcement learning framework specifically designed for VLA models. Within this framework, we extend temporal difference (TD) learning to incorporate action chunking, a prominent characteristic of VLA models. Building upon this framework, we propose CO-RFT, an algorithm aimed at fine-tuning VLA models using a limited set of demonstrations (30 to 60 samples). Specifically, we first conduct imitation learning (IL) with full parameter fine-tuning to initialize both the backbone and the policy. Subsequently, we implement offline RL with action chunking to optimize the pretrained policy. Our empirical results in real-world environments demonstrate that CO-RFT outperforms previous supervised methods, achieving a 57% improvement in success rate and a 22.3% reduction in cycle time. Moreover, our method exhibits robust positional generalization capabilities, attaining a success rate of 44.3% in previously unseen positions.
ROMay 21, 2025
Object-Focus Actor for Data-efficient Robot Generalization Dexterous ManipulationYihang Li, Tianle Zhang, Xuelong Wei et al.
Robot manipulation learning from human demonstrations offers a rapid means to acquire skills but often lacks generalization across diverse scenes and object placements. This limitation hinders real-world applications, particularly in complex tasks requiring dexterous manipulation. Vision-Language-Action (VLA) paradigm leverages large-scale data to enhance generalization. However, due to data scarcity, VLA's performance remains limited. In this work, we introduce Object-Focus Actor (OFA), a novel, data-efficient approach for generalized dexterous manipulation. OFA exploits the consistent end trajectories observed in dexterous manipulation tasks, allowing for efficient policy training. Our method employs a hierarchical pipeline: object perception and pose estimation, pre-manipulation pose arrival and OFA policy execution. This process ensures that the manipulation is focused and efficient, even in varied backgrounds and positional layout. Comprehensive real-world experiments across seven tasks demonstrate that OFA significantly outperforms baseline methods in both positional and background generalization tests. Notably, OFA achieves robust performance with only 10 demonstrations, highlighting its data efficiency.