LGSep 3, 2025Code
Loong: Synthesize Long Chain-of-Thoughts at Scale through VerifiersXingyue Huang, Rishabh, Gregor Franke et al.
Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due to the scarcity of high-quality, verifiable datasets and the high cost of human supervision. In this work, we introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification across a diverse range of reasoning-intensive domains. The framework consists of two key components: (1) LoongBench, a curated seed dataset containing 8,729 human-vetted examples across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired with executable code and rich metadata; and (2) LoongEnv, a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples. Together, these components form an agent-environment loop that enables reinforcement learning, where an LLM-based agent is rewarded for generating Chain-of-Thought (CoT) solutions that align with code-executed answers. Empirically, we benchmark LoongBench on a broad suite of both open-source and proprietary LLMs to evaluate domain coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive analysis of synthetic data generated by LoongEnv, examining correctness, difficulty, and diversity. Code and documentation are available at https://github.com/camel-ai/loong.
LGJun 23, 2025Code
Distilling Tool Knowledge into Language Models via Back-Translated TracesXingyue Huang, Xianglong Hu, Zifeng Ding et al.
Large language models (LLMs) often struggle with mathematical problems that require exact computation or multi-step algebraic reasoning. Tool-integrated reasoning (TIR) offers a promising solution by leveraging external tools such as code interpreters to ensure correctness, but it introduces inference-time dependencies that hinder scalability and deployment. In this work, we propose a new paradigm for distilling tool knowledge into LLMs purely through natural language. We first construct a Solver Agent that solves math problems by interleaving planning, symbolic tool calls, and reflective reasoning. Then, using a back-translation pipeline powered by multiple LLM-based agents, we convert interleaved TIR traces into natural language reasoning traces. A Translator Agent generates explanations for individual tool calls, while a Rephrase Agent merges them into a fluent and globally coherent narrative. Empirically, we show that fine-tuning a small open-source model on these synthesized traces enables it to internalize both tool knowledge and structured reasoning patterns, yielding gains on competition-level math benchmarks without requiring tool access at inference.
ROJun 10, 2024Code
Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and NavigationShenghao Li, Luchao Pang, Xianglong Hu
This paper presents a novel approach to visual simultaneous localization and mapping (SLAM) using multiple RGB-D cameras. The proposed method, Multicam-SLAM, significantly enhances the robustness and accuracy of SLAM systems by capturing more comprehensive spatial information from various perspectives. This method enables the accurate determination of pose relationships among multiple cameras without the need for overlapping fields of view. The proposed Muticam-SLAM includes a unique multi-camera model, a multi-keyframes structure, and several parallel SLAM threads. The multi-camera model allows for the integration of data from multiple cameras, while the multi-keyframes and parallel SLAM threads ensure efficient and accurate pose estimation and mapping. Extensive experiments in various environments demonstrate the superior accuracy and robustness of the proposed method compared to conventional single-camera SLAM systems. The results highlight the potential of the proposed Multicam-SLAM for more complex and challenging applications. Code is available at \url{https://github.com/AlterPang/Multi_ORB_SLAM}.
AIAug 31, 2025
Self-Exploring Language Models for Explainable Link Forecasting on Temporal Graphs via Reinforcement LearningZifeng Ding, Shenyang Huang, Zeyu Cao et al. · cambridge
Forecasting future links is a central task in temporal graph (TG) reasoning, requiring models to leverage historical interactions to predict upcoming ones. Traditional neural approaches, such as temporal graph neural networks, achieve strong performance but lack explainability and cannot be applied to unseen graphs without retraining. Recent studies have begun to explore using large language models (LLMs) for graph reasoning, but most of them are constrained to static graphs or small synthetic TGs and lack the evaluation of the quality of reasoning traces generated by LLMs. In this work, we present Reasoning-Enhanced Learning for Temporal Graphs (ReaL-TG), a reinforcement learning framework that fine-tunes LLMs to perform explainable link forecasting on real-world TGs. ReaL-TG uses outcome-based reward to encourage models to self-explore reasoning strategies from graph structure and to produce explanations that directly justify their predictions. To enable evaluation on LLM-generated reasoning traces, we propose a new evaluation protocol combining ranking metrics with an LLM-as-a-Judge system that assesses both the quality of reasoning and the impact of hallucinations. Experiments with ReaL-TG-4B, obtained by fine-tuning Qwen3-4B under our framework, show that it outperforms much larger frontier LLMs, including GPT-5 mini, on ranking metrics, while producing high-quality explanations confirmed by both the LLM judge and human evaluation.