Bing Xie

h-index: 15

Papers: 3

Citations: 29

Novelty: 58%

AI Score: 39

Ranked #78,706 of 194,257 authors (top 41%); #1,892 in CR (top 28%)

3 Papers

15.9 | SE | Aug 5, 2025

Tool-integrated Reinforcement Learning for Repo Deep Search

Zexiong Ma, Chao Peng, Qunhong Zeng et al.

Issue localization, the process of identifying code locations that need modification to resolve software issues, is a critical yet challenging task in software development. The semantic gap between natural language issue descriptions and faulty code requires complex multi-hop reasoning through code dependencies. Existing LLM-based agents attempt to address this by integrating repository retrieval tools. However, this transforms issue localization into a demanding task we call Repo Deep Search, which requires the LLM to effectively utilize various repository retrieval tools throughout a multi-step reasoning and navigation process. To tackle this challenge, we present ToolTrain, a two-stage tool-integrated training framework combining rejection-sampled supervised fine-tuning and tool-integrated reinforcement learning to enhance LLMs' ability to use retrieval tools for issue localization. Experimental results show that ToolTrain-trained models achieve state-of-the-art performance, with our 32B model even surpassing Claude-3.7 on function-level localization. The results also show that improved localization performance translates to better end-to-end issue resolution performance. This further demonstrates that training for issue localization is a viable and effective strategy for improving automated software development.

11.7 | DC | Oct 20, 2019 | Code

RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning

Di Zhang, Dong Dai, Youbiao He et al.

Today high-performance computing (HPC) platforms are still dominated by batch jobs. Accordingly, effective batch job scheduling is crucial to obtain high system efficiency. Existing HPC batch job schedulers typically leverage heuristic priority functions to prioritize and schedule jobs. But, once configured and deployed by the experts, such priority functions can hardly adapt to the changes of job loads, optimization goals, or system settings, potentially leading to degraded system efficiency when changes occur. To address this fundamental issue, we present RLScheduler, an automated HPC batch job scheduler built on reinforcement learning. RLScheduler relies on minimal manual interventions or expert knowledge, but can learn high-quality scheduling policies via its own continuous 'trial and error'. We introduce a new kernel-based neural network structure and trajectory filtering mechanism in RLScheduler to improve and stabilize the learning process. Through extensive evaluations, we confirm that RLScheduler can learn high-quality scheduling policies towards various workloads and various optimization goals with relatively low computation cost. Moreover, we show that the learned models perform stably even when applied to unseen workloads, making them practical for production use.

4.5 | CR | Jan 22, 2017

Certificate Linking and Caching for Logical Trust

Qiang Cao, Vamsi Thummala, Jeffrey S. Chase et al.

SAFE is a data-centric platform for building multi-domain networked systems, i.e., systems whose participants are controlled by different principals. Participants make trust decisions by issuing local queries over logic content exchanged in certificates. The contribution of SAFE is to address a key barrier to practical use of logical trust: the problem of identifying, gathering, and assembling the certificates that are relevant to each trust decision. SAFE uses a simple linking abstraction to organize and share certificates according to scripted primitives that implement the application's trust kernel and isolate it from logic concerns. We show that trust scripting with logical data exchange yields compact trust cores for example applications: federated naming, nested groups and roles, secure IP prefix delegation and routing, attestation-based access control, and a federated infrastructure-as-a-service system. Linking allows granular control over dynamic logic content based on dependency relationships, enabling a logic server to make secure inferences at high throughput.