AIJul 10, 2023Code
RLTF: Reinforcement Learning from Unit Test FeedbackJiate Liu, Yiqin Zhu, Kaiwen Xiao et al.
The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, current representative works either rely solely on offline frameworks, limiting the exploration of new sample spaces, or fall short in the utilization of unit test signals, not accounting for specific error locations within the code. To address these issues, we propose RLTF, i.e., Reinforcement Learning from Unit Test Feedback, a novel online RL framework with unit test feedback of multi-granularity for refining code LLMs. Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and the MBPP benchmarks. Our code is available at: https://github.com/Zyq-scut/RLTF.
LGDec 21, 2025
PIPCFR: Pseudo-outcome Imputation with Post-treatment Variables for Individual Treatment Effect EstimationZichuan Lin, Xiaokai Huang, Jiate Liu et al.
The estimation of individual treatment effects (ITE) focuses on predicting the outcome changes that result from a change in treatment. A fundamental challenge in observational data is that while we need to infer outcome differences under alternative treatments, we can only observe each individual's outcome under a single treatment. Existing approaches address this limitation either by training with inferred pseudo-outcomes or by creating matched instance pairs. However, recent work has largely overlooked the potential impact of post-treatment variables on the outcome. This oversight prevents existing methods from fully capturing outcome variability, resulting in increased variance in counterfactual predictions. This paper introduces Pseudo-outcome Imputation with Post-treatment Variables for Counterfactual Regression (PIPCFR), a novel approach that incorporates post-treatment variables to improve pseudo-outcome imputation. We analyze the challenges inherent in utilizing post-treatment variables and establish a novel theoretical bound for ITE risk that explicitly connects post-treatment variables to ITE estimation accuracy. Unlike existing methods that ignore these variables or impose restrictive assumptions, PIPCFR learns effective representations that preserve informative components while mitigating bias. Empirical evaluations on both real-world and simulated datasets demonstrate that PIPCFR achieves significantly lower ITE errors compared to existing methods.
IRJan 29
A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable ReasoningJiate Liu, Zebin Chen, Shaobo Qiao et al.
Graph Retrieval-Augmented Generation (Graph-RAG) enhances multihop question answering by organizing corpora into knowledge graphs and routing evidence through relational structure. However, practical deployments face two persistent bottlenecks: (i) mixed-difficulty workloads where one-size-fits-all retrieval either wastes cost on easy queries or fails on hard multihop cases, and (ii) extraction loss, where graph abstraction omits fine-grained qualifiers that remain only in source text. We present A2RAG, an adaptive-and-agentic GraphRAG framework for cost-aware and reliable reasoning. A2RAG couples an adaptive controller that verifies evidence sufficiency and triggers targeted refinement only when necessary, with an agentic retriever that progressively escalates retrieval effort and maps graph signals back to provenance text to remain robust under extraction loss and incomplete graphs. Experiments on HotpotQA and 2WikiMultiHopQA demonstrate that A2RAG achieves +9.9/+11.8 absolute gains in Recall@2, while cutting token consumption and end-to-end latency by about 50% relative to iterative multihop baselines.