CL Jul 11, 2024 Code
GTA: A Benchmark for General Tool Agents
Jize Wang, Zerun Ma, Yining Li et al.
Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool use, requiring the LLM to reason about which tools are suitable and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to align closely with real-world scenarios. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.
SY Mar 9, 2016
Distributed Control for Charging Multiple Electric Vehicles with Overload Limitation
Bo Yang, Jingwei Li, Qiaoni Han et al.
Severe pollution caused by traditional fossil fuels has drawn great attention to the use of plug-in electric vehicles (PEVs) and renewable energy. However, large-scale penetration of PEVs, combined with other kinds of appliances, tends to place an excessive or even disastrous burden on the power grid, especially during peak hours. This paper focuses on scheduling the PEV charging process across different charging stations, each of which can be supplied by both renewable energy generators and a distribution network. The distribution network also powers some uncontrollable loads. To minimize the on-grid energy cost with local renewable energy and non-ideal storage while avoiding the overload risk of the distribution network, an online algorithm that schedules PEV charging and manages the energy of the charging stations is developed based on Lyapunov optimization and Lagrange dual decomposition techniques. The algorithm can satisfy the random charging requests from PEVs with provable performance. Simulation results with real data demonstrate that the proposed algorithm can decrease the time-average cost of stations while avoiding overload in the distribution network in the presence of random uncontrollable loads.
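The Lyapunov-optimization step mentioned in this abstract can be illustrated with a generic drift-plus-penalty rule. The sketch below is not the paper's algorithm: it assumes a single backlog queue of requested charging energy, a per-slot grid price, and a hypothetical trade-off weight `V`; every name in it is illustrative.

```python
def drift_plus_penalty_step(queue, arrivals, price, max_rate, V):
    """One slot of a generic drift-plus-penalty rule (illustrative only).

    queue:    backlog of unserved charging demand at the start of the slot
    arrivals: new charging demand arriving during the slot
    price:    grid energy price in this slot
    max_rate: largest amount of energy that can be delivered in one slot
    V:        hypothetical weight trading energy cost against backlog
    """
    # Minimise V * price * x - queue * x over x in [0, max_rate].
    # The objective is linear in x, so the minimiser sits at an endpoint.
    rate = max_rate if queue > V * price else 0.0
    # Standard queue update: serve first, then add the new arrivals.
    new_queue = max(queue - rate, 0.0) + arrivals
    return rate, new_queue


def simulate(prices, arrivals, max_rate, V):
    """Run the rule over a price/arrival trace; return total cost and final backlog."""
    queue, cost = 0.0, 0.0
    for p, a in zip(prices, arrivals):
        rate, new_queue = drift_plus_penalty_step(queue, a, p, max_rate, V)
        cost += p * min(rate, queue)  # pay only for energy actually delivered
        queue = new_queue
    return cost, queue
```

In this toy setting a larger `V` makes the rule wait for cheaper slots at the price of a longer backlog, which is the cost-versus-queue trade-off that Lyapunov-based analyses typically quantify.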
82.3 CL Apr 17 Code
GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
Jize Wang, Xuanxuan Liu, Yining Li et al.
The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements, relying on AI-generated queries, dummy tools, and limited system-level coordination. To address this, we propose GTA-2, a hierarchical benchmark for General Tool Agents (GTA) spanning atomic tool use and open-ended workflows. Built on real-world authenticity, it leverages real user queries, deployed tools, and multimodal contexts. (i) GTA-Atomic, inherited from our prior GTA benchmark, evaluates short-horizon, closed-ended tool-use precision. (ii) GTA-Workflow introduces long-horizon, open-ended tasks for realistic end-to-end completion. To evaluate open-ended deliverables, we propose a recursive checkpoint-based evaluation mechanism that decomposes objectives into verifiable sub-goals, enabling unified evaluation of both model capabilities and agent execution frameworks (i.e., execution harnesses). Experiments reveal a pronounced capability cliff: while frontier models already struggle on atomic tasks (below 50%), they largely fail on workflows, with top models achieving only 14.39% success. Further analysis shows that checkpoint-guided feedback improves performance, while advanced frameworks such as Manus and OpenClaw substantially enhance workflow completion, highlighting the importance of execution harness design beyond the underlying model capacity. These findings provide guidance for developing reliable personal and professional assistants. Dataset and code will be available at https://github.com/open-compass/GTA.
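The recursive checkpoint-based evaluation described in this abstract can be pictured as scoring a tree of sub-goals. The abstract does not give the scoring rule, so the sketch below makes two assumptions: leaf checkpoints are simply pass/fail, and a parent's score is the unweighted mean of its children. The `Checkpoint` class and `score` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Checkpoint:
    """A node in a hypothetical checkpoint tree."""
    name: str
    passed: bool = False  # only meaningful on leaves
    children: List["Checkpoint"] = field(default_factory=list)


def score(node: Checkpoint) -> float:
    """Recursively score a checkpoint: leaves are 0 or 1, parents average their children."""
    if not node.children:
        return 1.0 if node.passed else 0.0
    return sum(score(child) for child in node.children) / len(node.children)


# Example: a deliverable with one verified sub-goal and one half-completed sub-goal.
report = Checkpoint("report", children=[
    Checkpoint("data collected", passed=True),
    Checkpoint("analysis", children=[
        Checkpoint("chart rendered", passed=True),
        Checkpoint("summary written", passed=False),
    ]),
])
```

Here `score(report)` averages 1.0 for the first sub-goal with 0.5 for the second; a weighted mean or an all-children-must-pass rule would be equally plausible readings of the abstract.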
SY Mar 20, 2017
Energy Trading between microgrids Individual Cost Minimization and Social Welfare Maximization
Zhenyu Qiao, Bo Yang, Qimin Xu et al.
High penetration of renewable energy sources makes microgrids (MGs) environmentally friendly. However, the stochastic input from renewable energy resources makes it difficult to balance energy supply and demand. Purchasing extra energy from the macrogrid to cover energy shortages increases the MG energy cost. To mitigate the intermittent nature of renewable energy, energy trading and energy storage, which can exploit the diversity of renewable energy generation across space and time, are efficient and cost-effective methods. However, the current energy storage control action affects future control actions, which poses a challenge for energy management. In addition, because MGs participate in energy trading as prosumers, an efficient trading mechanism is needed. Therefore, this paper focuses on the problem of MG energy management and trading. The energy trading problem is formulated as a stochastic optimization problem with both individual profit and social welfare maximization. First, a Lyapunov optimization based algorithm is developed to solve the stochastic problem. Second, a double-auction based mechanism is provided to encourage truthful MG bidding for buying and selling energy. Through theoretical analysis, we demonstrate that an individual MG can achieve a time-average energy cost close to the offline optimum, with a tradeoff between storage capacity and energy trading cost. Meanwhile, social welfare is also asymptotically maximized under the double auction. Simulation results based on real-world data show the effectiveness of our algorithm.
SY Sep 19, 2017
Hybrid Optimization Method for Reconfiguration of AC/DC Microgrids in All-Electric Ships
Qimin Xu, Bo Yang, Zhizhang Pan et al.
Because limited power capacity, finite inertia, and dynamic loads make the shipboard power system (SPS) vulnerable, automatic reconfiguration for failure recovery in the SPS is an extremely significant but still challenging problem. Reconfiguration must not only operate accurately and optimally but also satisfy operating constraints. In this paper, we consider the reconfiguration optimization for hybrid AC/DC microgrids in all-electric ships. Firstly, the multi-zone medium voltage DC (MVDC) SPS model is presented. In this model, the DC power flow for reconfiguration and a generalized AC/DC converter are modeled for accurate reconfiguration. Secondly, since this is a mixed-integer nonlinear programming (MINLP) problem, a hybrid method based on Newton-Raphson and Biogeography-based Optimization (NRBBO) is designed according to the characteristics of the system, loads, and faults. This method helps maximize the weighted load restoration while satisfying operating constraints. Finally, the simulation results demonstrate that this method has advantages in terms of power restoration and convergence speed.
32.9 SY May 26
Sample Complexity of Policy Gradient for Log-Growth Control
Qiuhua Pan, Yukai Shen, Liwei Zhang et al.
We study the sample complexity of policy gradient for log-growth control -- the problem of learning, from observed state transitions, a feedback gain that optimally stabilizes a scalar linear system driven through a multiplicative-noise actuation channel. The objective $J(K) = \mathbb{E}[\log|1+BK|]$ is the top Lyapunov exponent of the closed loop. This problem carries a structural difficulty we call the cusp obstruction: the optimal gain $K^*$ always places the noise singularity $b_{\rm sing}(K) = -1/K$ in the interior of the support. At this singular optimum the policy gradient exists only as a Cauchy principal value, not as a Lebesgue integral, and the natural single-sample gradient estimator has infinite variance. Standard first-order stochastic-optimization analysis is thus inapplicable at the optimum, and merely smoothing the objective does not resolve the difficulty. The obstruction, however, has an exploitable symmetry: the Cauchy kernel is an odd function of the displacement from the moving pole, so pairing each observation with its reflection through the pole cancels the divergent part. This one cancellation simultaneously controls the population curvature, the gradient-estimator variance, and the bias incurred when the noise density is estimated. Combining these bounds with a closed-form single-transition gradient oracle, we prove that projected mini-batch policy gradient, initialized in any compact subset of the stabilizing region, attains total sample complexity $\tilde{O}(1/\eta)$ when the noise density is known and $\tilde{O}(\eta^{-(2s+1)/(2s)})$ when it must be estimated, for $C^s$ noise densities with $s \geq 2$.
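The odd-kernel cancellation can be made concrete with a short formal calculation. The lines below are a sketch under stated assumptions, not the paper's derivation: $p$ denotes the density of $B$ (assumed continuously differentiable near the pole), $K \neq 0$, and differentiation under the expectation is taken formally.

```latex
% Formal derivative of J(K) = E[log|1 + BK|], using 1 + bK = K (b - b_sing)
% with b_sing = -1/K, and \int p(b) db = 1.
\begin{align*}
\frac{\partial}{\partial K}\log|1+bK|
  &= \frac{b}{1+bK}
   = \frac{1}{K}\left(1 + \frac{b_{\rm sing}}{b - b_{\rm sing}}\right), \\
J'(K)
  &= \frac{1}{K}\left(1 + b_{\rm sing}\;
     \mathrm{p.v.}\!\int \frac{p(b)}{b - b_{\rm sing}}\,db\right), \\
\mathrm{p.v.}\!\int \frac{p(b)}{b - b_{\rm sing}}\,db
  &= \int_0^\infty \frac{p(b_{\rm sing}+u) - p(b_{\rm sing}-u)}{u}\,du .
\end{align*}
```

The last line pairs each displacement $u$ from the pole with its reflection $-u$. Because $1/u$ is odd, the two contributions combine into a difference quotient of $p$, which stays bounded as $u \to 0$, so the paired integrand is integrable near the pole even though the unpaired one is not.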
SY Jul 23, 2018
Distributed Load Shedding for Microgrid with Compensation Support via Wireless Network
Qimin Xu, Bo Yang, Cailian Chen et al.
Due to limited generation and finite inertia, microgrids suffer from large frequency and voltage deviations, which can lead to system collapse. Thus, reliable load shedding to keep the frequency stable is required. Wireless networking, with its high flexibility and low deployment cost, is considered a promising technology for fine-grained management. In this paper, to balance supply and demand and reduce the load-shedding amount, a distributed load shedding solution via wireless network is proposed. Firstly, active power coordination of loads with different priorities is formulated as an optimisation problem. To solve it, a distributed load shedding algorithm based on the subgradient method (DLSS) is developed for gradually shedding loads. With this method, power compensation can be utilised and has more time to lower the power deficit, which reduces the load-shedding amount. Secondly, to increase the response rate and enhance the reliability of our method, a multicast metropolis schedule based on TDMA (MMST) is developed. In this protocol, dedicated time slots are allocated and a checking and retransmission mechanism is utilised. Finally, the proposed solution is evaluated with an NS3-Matlab co-simulator. The numerical results demonstrate the feasibility and effectiveness of our solution.
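As background for the "metropolis" schedule, the sketch below shows standard Metropolis weighting for distributed averaging, in which each node mixes its value only with those of its neighbours. This assumes that the term refers to this well-known weighting; the abstract does not spell that out, and the TDMA slot allocation and retransmission logic of MMST are not modelled. Function names are hypothetical.

```python
def metropolis_weights(n, edges):
    """Build the Metropolis weight matrix of an undirected graph with n nodes.

    edges is a list of (i, j) pairs with i != j and no duplicates.
    """
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(degree[i], degree[j]))
        W[i][j] = W[j][i] = w
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])  # self-weight makes each row sum to 1
    return W


def consensus(values, W, rounds):
    """Repeat x <- W x; on a connected graph every entry approaches the average."""
    x = list(values)
    for _ in range(rounds):
        x = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x


# Three nodes in a line; node 0 initially observes a deficit of 3 units.
W = metropolis_weights(3, [(0, 1), (1, 2)])
estimates = consensus([3.0, 0.0, 0.0], W, 200)
```

Each node needs only its own degree and its neighbours' degrees to compute its weights, which is why this rule suits networks without a central coordinator.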
AI Jan 26
RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
Jize Wang, Han Wu, Zhiyuan You et al.
Mixture-of-Agents (MoA) improves LLM performance through layered collaboration, but its dense topology raises costs and latency. Existing methods employ LLM judges to filter responses, yet still require all models to perform inference before judging, failing to cut costs effectively. They also lack model selection criteria and struggle with large model pools, where full inference is costly and can exceed context limits. To address this, we propose RouteMoA, an efficient mixture-of-agents framework with dynamic routing. It employs a lightweight scorer to perform initial screening by predicting coarse-grained performance from the query, narrowing candidates to a high-potential subset without inference. A mixture of judges then refines these scores through lightweight self- and cross-assessment based on existing model outputs, providing posterior correction without additional inference. Finally, a model ranking mechanism selects models by balancing performance, cost, and latency. RouteMoA outperforms MoA across varying tasks and model pool sizes, reducing cost by 89.8% and latency by 63.6% in the large-scale model pool.
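The final ranking step, which balances performance, cost, and latency, can be sketched as a simple utility ranking. The abstract does not state the actual formula, so the linear utility, the weights, and the field names below are all assumptions; cost and latency are taken to be pre-normalised to the range 0 to 1.

```python
def rank_models(candidates, k, cost_weight=0.5, latency_weight=0.5):
    """Return the names of the top-k models under a hypothetical linear utility.

    Each candidate is a dict with keys "name", "score" (predicted quality),
    "cost", and "latency", the last two normalised to [0, 1].
    """
    def utility(model):
        return (model["score"]
                - cost_weight * model["cost"]
                - latency_weight * model["latency"])

    ranked = sorted(candidates, key=utility, reverse=True)
    return [model["name"] for model in ranked[:k]]


pool = [
    {"name": "large", "score": 0.9, "cost": 1.0, "latency": 1.0},
    {"name": "medium", "score": 0.8, "cost": 0.2, "latency": 0.2},
    {"name": "small", "score": 0.5, "cost": 0.1, "latency": 0.1},
]
```

With the default weights the cheaper models outrank the strongest one; setting both weights to zero reduces the rule to ranking by predicted quality alone.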
CV Jun 18, 2025
NTIRE 2025 Image Shadow Removal Challenge Report
Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou et al.
This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were evaluated with images from the WSRD+ dataset, simulating interactions between self- and cast-shadows with a large number of diverse objects, textures, and materials.
CV Dec 27, 2024
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Siyu Wang, Cailian Chen, Xinyi Le et al.
Computer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain and costly to store. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle with inferring accurate 3D spatial location and orientation, leading to inaccuracies in determining the spatial 3D starting points and extrusion directions for constructing geometries. This work introduces CAD-GPT, a CAD synthesis method with spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise spatial inference, our approach introduces a 3D Modeling Spatial Mechanism. This method maps 3D spatial positions and 3D sketch plane rotation angles into a 1D linguistic feature space using a specialized spatial unfolding mechanism, while discretizing 2D sketch coordinates into an appropriate planar space to enable precise determination of spatial starting position, sketch orientation, and 2D sketch coordinate translations. Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively.
CV Sep 13, 2021
CANS: Communication Limited Camera Network Self-Configuration for Intelligent Industrial Surveillance
Jingzheng Tu, Qimin Xu, Cailian Chen
Real-time, intelligent video surveillance via camera networks involves computation-intensive vision detection tasks with massive video data and is crucial for safety in the edge-enabled industrial Internet of Things (IIoT). Multiple video streams compete for limited communication resources on the link between edge devices and camera networks, resulting in considerable communication congestion. This congestion delays the completion of vision detection tasks and degrades their accuracy. Thus, achieving high accuracy in vision detection tasks under communication constraints and task deadline constraints is challenging. Previous works focus on single-camera configuration to balance the tradeoff between accuracy and processing time of detection tasks by setting video quality parameters. In this paper, an adaptive camera network self-configuration method (CANS) for video surveillance is proposed to cope with multiple video streams with heterogeneous quality of service (QoS) demands in edge-enabled IIoT. Moreover, it adapts to video content and network dynamics. Specifically, the tradeoff between two key performance metrics, i.e., accuracy and latency, is formulated as an NP-hard optimization problem with latency constraints. Simulation on real-world surveillance datasets demonstrates that the proposed CANS method achieves low end-to-end latency (13 ms on average) with high accuracy (92% on average) under network dynamics. The results validate the effectiveness of CANS.
DC Jun 20, 2021
Low-Latency Federated Learning over Wireless Channels with Differential Privacy
Kang Wei, Jun Li, Chuan Ma et al.
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server. The performance of uploaded models in such situations can vary widely due to imbalanced data distributions, potential demands on privacy protections, and quality of transmissions. In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement. We solve this problem in the framework of multi-agent multi-armed bandit (MAMAB) to deal with the situation where there are multiple clients confronting different unknown transmission environments, e.g., channel fading and interference. Specifically, we first transform the long-term constraints on both training performance and each client's DP into a virtual queue based on the Lyapunov drift technique. Then, we convert the MAMAB to a max-min bipartite matching problem at each communication round, by estimating rewards with the upper confidence bound (UCB) approach. More importantly, we propose two efficient solutions to this matching problem, i.e., a modified Hungarian algorithm and greedy matching with a better alternative (GMBA), in which the first one can achieve the optimal solution with high complexity while the second one achieves a better trade-off, offering verified low complexity with little performance loss. In addition, we develop an upper bound on the expected regret of this MAMAB based FL framework, which shows a linear growth over the logarithm of communication rounds, justifying its theoretical feasibility. Extensive experiments are conducted to validate the effectiveness of our proposed algorithms, and the impacts of various parameters on the FL performance over wireless edge networks are also discussed.
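The UCB reward estimate used in this abstract can be illustrated with the classical UCB1 index. This is a generic sketch, not the paper's estimator: it handles a single client choosing among channels, whereas the paper feeds such estimates into a max-min bipartite matching across clients. The function names and the exploration constant are assumptions.

```python
import math


def ucb_index(mean_reward, pulls, t):
    """Classical UCB1 index: empirical mean plus an exploration bonus.

    mean_reward: average reward observed so far on this arm
    pulls:       number of times this arm has been tried
    t:           current round number (t >= 1)
    """
    if pulls == 0:
        return math.inf  # untried arms are explored first
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def select_channel(means, pulls, t):
    """Pick the arm (e.g. a transmission channel) with the largest UCB index."""
    indices = [ucb_index(m, n, t) for m, n in zip(means, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)
```

The bonus shrinks as an arm is tried more often, so a rarely used channel is retried occasionally even when its current average looks poor.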
RO Oct 14, 2019
Intelligent Physical Attack Against Mobile Robots With Obstacle-Avoidance
Yushan Li, Jianping He, Cailian Chen et al.
The security of mobile robots has attracted considerable attention in recent years. In this paper, we propose an intelligent physical attack that traps mobile robots in a preset position by learning the obstacle-avoidance mechanism from external observation. The salient novelty of our work lies in revealing that physical attacks with intelligent and advanced design can present real threats, even without prior knowledge of the system dynamics or access to the internal system. This kind of attack cannot be handled by countermeasures in traditional cyberspace security. In practice, the cornerstone of the proposed attack is to actively explore the complex interaction characteristics of the victim robot with the environment, and learn the obstacle-avoidance knowledge exhibited in the limited observations of its behaviors. Then, we propose shortest-path and hands-off attack algorithms to find efficient attack paths in the very large motion space, achieving the driving-to-trap goal with low costs in terms of path length and activity period, respectively. The convergence of the algorithms is proved and the attack performance bounds are further derived. Extensive simulations and real-life experiments illustrate the effectiveness of the proposed attack, motivating future investigation of new physical threats and defenses for robotic systems.
AI Nov 2, 2018
Efficient Metropolitan Traffic Prediction Based on Graph Recurrent Neural Network
Xiaoyu Wang, Cailian Chen, Yang Min et al.
Traffic prediction is a fundamental and vital task in Intelligent Transportation Systems (ITS), but achieving high accuracy while maintaining low computational complexity is very challenging due to the spatiotemporal characteristics of traffic flow, especially in metropolitan settings. In this work, a new topological framework, called Linkage Network, is proposed to model road networks and represent the propagation patterns of traffic flow. Based on the Linkage Network model, a novel online predictor, named Graph Recurrent Neural Network (GRNN), is designed to learn the propagation patterns in the graph. It can simultaneously predict traffic flow for all road segments based on information gathered from the whole graph, which reduces the computational complexity significantly from O(nm) to O(n+m) while maintaining high accuracy. Moreover, it can also predict the variations of traffic trends. Experiments based on real-world data demonstrate that the proposed method outperforms existing prediction methods.
SY Sep 14, 2018
Optimal Power Management for Failure Mode of MVDC Microgrids in All-Electric Ships
Qimin Xu, Bo Yang, Qiaoni Han et al.
Optimal power management of the shipboard power system for failure mode (OPMSF) is a significant and challenging problem, given its bearing on the safety of the system and personnel. Many existing works focus on transient-time recovery without considering the operating cost and the voyage plan. In this paper, the OPMSF problem is formulated considering mid-time scheduling and faults at the bus and generator. Two-side adjustment methods, including load shedding and reconfiguration, are coordinated to reduce the fault effects. To address the formulated non-convex problem, the travel equality constraint and the fractional energy efficiency operation indicator (EEOI) limitation are transformed into convex forms. Then, considering the infeasibility scenario caused by faults, a further relaxation is adopted to formulate a new problem with guaranteed feasibility. Furthermore, a sufficient condition is derived to ensure that the new problem has the same optimal solution as the original one. Because of the mixed-integer nonlinear feature, an optimal algorithm based on Benders decomposition (BD) is developed to solve the new problem. Due to the slow convergence caused by the time-coupled constraints, a low-complexity near-optimal algorithm based on BD (LNBD) is proposed. The results verify the effectiveness of the proposed methods and algorithms.