AI · Feb 12 · Code
Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision
Xiaohan He, Shiyang Feng, Songtao Huang et al.
Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and math. However, in scientific reasoning tasks, these models remain fragile due to unreliable solution evaluation and limited diversity in verification strategies. In this work, we propose Sci-CoE, a two-stage scientific co-evolving framework that enables models to self-evolve as both solver and verifier through a transition from sparse supervision to unsupervised learning. In the first stage, the model uses a small set of annotated data to establish fundamental correctness judgment anchors for the Verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration on unlabeled data. Experiments on several general scientific benchmarks demonstrate that Sci-CoE enhances complex reasoning capabilities and exhibits strong scalability, facilitating the construction of more robust and diverse evaluation systems. Code is available at https://github.com/InternScience/Sci-CoE.
AI · Sep 2, 2025 · Code
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Guibin Zhang, Hejia Geng, Xiaohang Yu et al.
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM-RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov decision processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.
CL · Apr 8
Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents
Heng Zhou, Zelin Tan, Zhemeng Zhang et al.
When an LLM-based agent improves on a task, is the gain from the model itself or from the reasoning paradigm wrapped around it? We study this question by comparing six inference-time paradigms, namely Direct, CoT, ReAct, Plan-Execute, Reflection, and ReCode, across four frontier LLMs and ten benchmarks, yielding roughly 18,000 runs. We find that reasoning structure helps dramatically on some tasks but hurts on others: ReAct improves over Direct by 44pp on GAIA, while CoT degrades performance by 15pp on HumanEval. No single paradigm dominates, and oracle per-task selection beats the best fixed paradigm by 17.1pp on average. Motivated by this complementarity, we propose a select-then-solve approach: before answering each task, a lightweight embedding-based router selects the most suitable paradigm. Across four models, the router improves average accuracy from 47.6% to 53.1%, outperforming the best fixed paradigm at 50.3% by 2.8pp and recovering up to 37% of the oracle gap. In contrast, zero-shot self-routing only works for GPT-5 at 67.1% and fails for weaker models, all trailing the learned router. Our results argue that reasoning paradigm selection should be a per-task decision made by a learned router, not a fixed architectural choice.
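The select-then-solve idea above can be illustrated with a minimal sketch. It assumes a nearest-neighbour router over task embeddings; the abstract only says the router is lightweight and embedding-based, so the voting rule, the function names (`cosine`, `route`), and the toy 3-dimensional embeddings in `EXAMPLES` are all hypothetical and not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route(task_embedding, labeled_examples, k=3):
    """Return the paradigm that wins a similarity-weighted vote among the k nearest examples.

    labeled_examples is a list of (embedding, paradigm) pairs, where the paradigm
    label is whichever one performed best on that past task (assumed to be known).
    """
    ranked = sorted(
        labeled_examples,
        key=lambda ex: cosine(task_embedding, ex[0]),
        reverse=True,
    )
    votes = {}
    for emb, paradigm in ranked[:k]:
        votes[paradigm] = votes.get(paradigm, 0.0) + cosine(task_embedding, emb)
    return max(votes, key=votes.get)


# Hypothetical toy data: real embeddings would come from a text encoder.
EXAMPLES = [
    ([1.0, 0.0, 0.1], "ReAct"),
    ([0.9, 0.1, 0.0], "ReAct"),
    ([0.0, 1.0, 0.1], "Direct"),
    ([0.1, 0.9, 0.0], "Direct"),
    ([0.0, 0.1, 1.0], "CoT"),
]

if __name__ == "__main__":
    print(route([0.95, 0.05, 0.0], EXAMPLES))
    print(route([0.0, 0.95, 0.05], EXAMPLES))
```

In this sketch the router runs once per task, before any paradigm is invoked, so its cost is one embedding lookup plus a similarity search rather than an extra LLM call.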
LG · Feb 10, 2025 · Code
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
Songtao Huang, Zhen Zhao, Can Li et al.
Real-world time series often have multiple frequency components that are intertwined with each other, making accurate time series forecasting challenging. Decomposing the mixed frequency components into multiple single frequency components is a natural choice. However, the information density of patterns varies across different frequencies, and employing a uniform modeling approach for different frequency components can lead to inaccurate characterization. To address these challenges, inspired by the flexibility of the recent Kolmogorov-Arnold Network (KAN), we propose a KAN-based Frequency Decomposition Learning architecture (TimeKAN) to address the complex forecasting challenges caused by multiple frequency mixtures. Specifically, TimeKAN mainly consists of three components: Cascaded Frequency Decomposition (CFD) blocks, Multi-order KAN Representation Learning (M-KAN) blocks and Frequency Mixing blocks. CFD blocks adopt a bottom-up cascading approach to obtain series representations for each frequency band. Benefiting from the high flexibility of KAN, we design a novel M-KAN block to learn and represent specific temporal patterns within each frequency band. Finally, Frequency Mixing blocks are used to recombine the frequency bands into the original format. Extensive experimental results across multiple real-world time series datasets demonstrate that TimeKAN achieves state-of-the-art performance as an extremely lightweight architecture. Code is available at https://github.com/huangst21/TimeKAN.
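The decompose-then-recombine pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses cascaded moving averages as a stand-in for the CFD blocks, omits the per-band M-KAN modeling entirely, and the function names and window sizes are hypothetical rather than taken from the TimeKAN code.

```python
def moving_average(series, window):
    """Centered moving average with the window clipped at the edges; output has the input's length."""
    n = len(series)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def decompose(series, windows=(3, 9)):
    """Split a series into bands: one detail band per window, then the smooth remainder.

    Each stage smooths the previous stage's output and keeps what the smoothing
    removed, so the bands run from highest to lowest frequency.
    """
    bands = []
    residual = list(series)
    for w in windows:
        smooth = moving_average(residual, w)
        bands.append([r - s for r, s in zip(residual, smooth)])
        residual = smooth
    bands.append(residual)  # low-frequency remainder
    return bands


def recombine(bands):
    """Sum the bands pointwise to return to the original series."""
    return [sum(values) for values in zip(*bands)]


if __name__ == "__main__":
    series = [float((i * 7) % 5) + 0.1 * i for i in range(20)]
    bands = decompose(series)
    restored = recombine(bands)
    print(len(bands), max(abs(a - b) for a, b in zip(series, restored)))
```

Because each detail band is defined as a difference between consecutive smoothing stages, the bands sum back to the input up to floating-point error, which is the property a mixing step relies on.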
AI · Feb 9
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
Shiyang Feng, Runmin Ma, Xiangchao Yan et al.
We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. These subsystems are supported by foundational capabilities for deep research, solution optimization, and long-horizon memory. The architecture allows InternAgent-1.5 to operate continuously across extended discovery cycles while maintaining coherent and improving behavior. It also enables the system to coordinate computational modeling and laboratory experimentation within a single unified system. We evaluate InternAgent-1.5 on scientific reasoning benchmarks such as GAIA, HLE, GPQA, and FrontierScience, and the system achieves leading performance that demonstrates strong foundational capabilities. Beyond these benchmarks, we further assess two categories of discovery tasks. In algorithm discovery tasks, InternAgent-1.5 autonomously designs competitive methods for core machine learning problems. In empirical discovery tasks, it executes complete computational or wet-lab experiments and produces scientific findings in earth, life, biological, and physical domains. Overall, these results show that InternAgent-1.5 provides a general and scalable framework for autonomous scientific discovery.
RO · Apr 7
CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment
Li Kang, Yutao Fan, Rui Li et al.
Multi-agent embodied systems hold promise for complex collaborative manipulation, yet face critical challenges in spatial coordination, temporal reasoning, and shared workspace awareness. Inspired by human collaboration where cognitive planning occurs separately from physical execution, we introduce the concept of compositional environment -- a synergistic integration of real-world and simulation components that enables multiple robotic agents to perceive intentions and operate within a unified decision-making space. Building on this concept, we present CoEnv, a framework that leverages simulation for safe strategy exploration while ensuring reliable real-world deployment. CoEnv operates through three stages: real-to-sim scene reconstruction that digitizes physical workspaces, VLM-driven action synthesis supporting both real-time planning with high-level interfaces and iterative planning with code-based trajectory generation, and validated sim-to-real transfer with collision detection for safe deployment. Extensive experiments on challenging multi-arm manipulation benchmarks demonstrate CoEnv's effectiveness in achieving high task success rates and execution efficiency, establishing a new paradigm for multi-agent embodied AI.
AI · May 22, 2025
InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
InternAgent Team, Bo Zhang, Shiyang Feng et al.
Artificial Intelligence (AI) is accelerating the transformation of scientific research paradigms, not only enhancing research efficiency but also driving innovation. We introduce InternAgent, a unified closed-loop multi-agent framework to conduct Autonomous Scientific Research (ASR) across various scientific research fields, enabling researchers to tackle complicated problems in these fields with unprecedented speed and precision. InternAgent highlights three key advantages: 1) Scalability: InternAgent has demonstrated its versatility across 12 scientific research tasks, capable of generating innovative ideas to enhance the performance of baseline code. 2) Interactivity: InternAgent provides an interface for human expert feedback and multi-agent interaction in automated end-to-end processes, allowing for the seamless integration of domain expert knowledge. 3) Efficiency: InternAgent has achieved promising performance gains in several scientific fields at significantly lower time cost than human effort. For instance, in reaction yield prediction, performance increased from 27.6% to 35.4% in just 12 hours; in enhancer activity prediction, accuracy rose from 0.65 to 0.79 with only 4 hours of processing; and in 2D semantic segmentation, precision advanced from 78.8% to 81.0% in a mere 30 hours.
AI · Apr 18, 2024
DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting
Songtao Huang, Hongjin Song, Tianqi Jiang et al.
Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have achieved great success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address these challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel in-depth feature representation, called Dynamic Spatio-Temporal (Dyn-ST) features, is introduced, which encapsulates spatial characteristics across varying times. Moreover, a Dynamic Spatio-Temporal Graph Transformer Network (DST-GTN) is proposed to capture Dyn-ST features and other dynamic adjacency relations between intersections. The DST-GTN can model dynamic ST relationships between nodes accurately and refine the representation of global and local ST characteristics by adopting adaptive weights in low-pass and all-pass filters, enabling the extraction of Dyn-ST features from traffic time-series data. Through numerical experiments on public datasets, the DST-GTN achieves state-of-the-art performance for a range of traffic forecasting tasks and demonstrates enhanced stability.
LG · Jan 28, 2025
Applying Ensemble Models based on Graph Neural Network and Reinforcement Learning for Wind Power Forecasting
Hongjin Song, Qianrun Chen, Tianqi Jiang et al.
Accurately predicting the wind power output of a wind farm across various time scales, known as Wind Power Forecasting (WPF), is a critical issue in wind power trading and utilization. The WPF problem remains unresolved due to numerous influencing variables, such as wind speed, temperature, latitude, and longitude. Furthermore, achieving high prediction accuracy is crucial for maintaining electric grid stability and ensuring supply security. In this paper, we model all wind turbines within a wind farm as nodes in a graph built from their geographical locations. Accordingly, we propose an ensemble model based on graph neural networks and reinforcement learning (EMGRL) for WPF. Our approach includes: (1) applying graph neural networks to capture the time-series data from neighboring wind farms relevant to the target wind farm; (2) establishing a general state embedding that integrates the target wind farm's data with the historical performance of base models on the target wind farm; (3) ensembling and leveraging the advantages of all base models through an actor-critic reinforcement learning framework for WPF.
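Step (3) above combines base-model forecasts using their historical performance. The sketch below shows that general idea with a fixed softmax weighting over recent errors. This is a deliberately simpler stand-in and not the actor-critic policy the abstract describes; the function names, the `temperature` parameter, and the sample numbers are hypothetical.

```python
import math


def softmax_weights(recent_errors, temperature=1.0):
    """Turn each base model's recent error into a weight; lower error gives a higher weight."""
    scores = [-e / temperature for e in recent_errors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def ensemble_forecast(predictions, recent_errors, temperature=1.0):
    """Weighted average of base-model predictions for a single time step."""
    weights = softmax_weights(recent_errors, temperature)
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    # Hypothetical power forecasts from three base models and their recent errors.
    predictions = [41.0, 44.5, 52.0]
    recent_errors = [0.1, 0.5, 2.0]
    print(softmax_weights(recent_errors))
    print(ensemble_forecast(predictions, recent_errors))
```

A learned policy would replace `softmax_weights` with a function of the state embedding, so the weights could adapt to conditions rather than depend on recent error alone.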