96.9 | AI | Jun 3
Agents' Last Exam
Yiyou Sun, Xinyang Han, Weichen Zhang et al.
Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widely used benchmarks lack sustained performance measurement on real and economically valuable workflows. This paper introduces Agents' Last Exam (ALE), a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. Developed in collaboration with 250+ industry experts, ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy). It is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks. Current results show that the hardest tier remains far from saturated: across mainstream harness and backbone configurations, the average full pass rate is 2.6%. ALE is designed as a living benchmark: its task pool grows continuously as new workflows and industries are onboarded. More broadly, ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark success and GDP-relevant impact.
CV | Jan 7 | Code
Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions
Zhongbin Guo, Zhen Yang, Yushan Li et al.
Recent advancements in Spatial Intelligence (SI) have predominantly relied on Vision-Language Models (VLMs), yet a critical question remains: does spatial understanding originate from visual encoders or the fundamental reasoning backbone? Inspired by this question, we introduce SiT-Bench, a novel benchmark designed to evaluate the SI performance of Large Language Models (LLMs) without pixel-level input. It comprises over 3,800 expert-annotated items across five primary categories and 17 subtasks, ranging from egocentric navigation and perspective transformation to fine-grained robotic manipulation. By converting single/multi-view scenes into high-fidelity, coordinate-aware textual descriptions, we challenge LLMs to perform symbolic textual reasoning rather than visual pattern matching. Evaluation results for state-of-the-art (SOTA) LLMs reveal that while models achieve proficiency in localized semantic tasks, a significant "spatial gap" remains in global consistency. Notably, we find that explicit spatial reasoning significantly boosts performance, suggesting that LLMs possess latent world-modeling potential. Our proposed dataset SiT-Bench serves as a foundational resource to foster the development of spatially-grounded LLM backbones for future VLMs and embodied agents. Our code and benchmark will be released at https://github.com/binisalegend/SiT-Bench.
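To make the idea of a "coordinate-aware textual description" concrete, the sketch below renders a small scene as text that could be placed in an LLM prompt. It is only an illustration under stated assumptions: the `SceneObject` fields, the axis convention, the sentence template, and the `left_of` helper are hypothetical and are not taken from SiT-Bench.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical scene entry; the axis convention is an assumption."""
    name: str
    x: float  # metres to the right of the observer
    y: float  # metres in front of the observer
    z: float  # metres above the floor


def describe_scene(objects):
    """Render objects as one coordinate-aware sentence (illustrative template)."""
    parts = [
        f"{o.name} at (x={o.x:.1f}, y={o.y:.1f}, z={o.z:.1f}) m" for o in objects
    ]
    return "The scene contains: " + "; ".join(parts) + "."


def left_of(a, b):
    """Ground-truth check for a 'left of' question under the assumed x axis."""
    return a.x < b.x


scene = [
    SceneObject("mug", -0.4, 1.2, 0.8),
    SceneObject("laptop", 0.3, 1.0, 0.8),
]
prompt = describe_scene(scene) + " Question: is the mug left of the laptop?"
print(prompt)
```

Because the answer can be computed directly from the coordinates, a textual item of this kind has a verifiable label without any image input.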
CV | Nov 30, 2025 | Code
LISA-3D: Lifting Language-Image Segmentation to 3D via Multi-View Consistency
Zhongbin Guo, Jiahe Liu, Wenyu Gao et al.
Text-driven 3D reconstruction demands a mask generator that simultaneously understands open-vocabulary instructions and remains consistent across viewpoints. We present LISA-3D, a two-stage framework that lifts language-image segmentation into 3D by retrofitting the instruction-following model LISA with geometry-aware Low-Rank Adaptation (LoRA) layers and reusing a frozen SAM-3D reconstructor. During training we exploit off-the-shelf RGB-D sequences and their camera poses to build a differentiable reprojection loss that enforces cross-view agreement without requiring any additional 3D-text supervision. The resulting masks are concatenated with RGB images to form RGBA prompts for SAM-3D, which outputs Gaussian splats or textured meshes without retraining. Across ScanRefer and Nr3D, LISA-3D improves language-to-3D accuracy by up to +15.6 points over single-view baselines while adapting only 11.6M parameters. The system is modular, data-efficient, and supports zero-shot deployment on unseen categories, providing a practical recipe for language-guided 3D content creation. Our code will be available at https://github.com/binisalegend/LISA-3D.
28.0 | SY | May 15
Preserving Topology Privacy of Network Systems by Feedback: Conditions and Distributed Design
Yushan Li, Jiabao He, Julien M. Hendrickx et al.
This paper develops a feedback-based method to preserve the topology privacy of consensus protocols in network systems. The key idea is to intentionally violate topology identifiability conditions, thereby preventing unique or accurate recovery of the true topology from available observations, while preserving the intended consensus behavior. This problem is challenging because the feedback magnitude directly reflects the privacy level of edges, yet it is strongly coupled with the consensus convergence and constrained by local communications at each node. First, we derive the feedback conditions for both the partial and full observation cases: in the former, we characterize when the topology cannot be solved from observation data; in the latter, we construct the solution space that enforces topology inaccuracy from data. Then, we propose a novel distributed topology modification design under limited privacy budgets, and establish performance guarantees through a controllable tradeoff between the consensus deviation and the topology privacy. Finally, we develop a low-complexity heuristic algorithm to achieve optimal privacy preservation on existing edges. Comparative simulations validate the effectiveness and superior performance of the proposed preservation design.
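For readers unfamiliar with the baseline being protected, the following is a minimal sketch of a standard discrete-time average-consensus iteration on an undirected graph. It does not reproduce the paper's feedback design; the graph, the step size, and the function names are assumptions chosen for illustration.

```python
def consensus_step(values, neighbors, step_size):
    """One update x_i <- x_i + step_size * sum_j (x_j - x_i) over neighbors j."""
    return [
        x_i + step_size * sum(values[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(values)
    ]


def run_consensus(values, neighbors, step_size, num_steps):
    """Iterate the update; with a symmetric graph the average is preserved."""
    for _ in range(num_steps):
        values = consensus_step(values, neighbors, step_size)
    return values


# Assumed example: a 4-node path graph 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = [1.0, 2.0, 3.0, 6.0]

# step_size must stay below 1 / (max degree) = 0.5 for this simple rule.
final = run_consensus(initial, neighbors, step_size=0.3, num_steps=200)
print(final)
```

An observer who records the node states over time sees trajectories shaped by `neighbors`, which is why the topology can leak from observations and why the abstract treats consensus deviation and privacy as a tradeoff.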
CV | Nov 14, 2025
Beyond Flatlands: Unlocking Spatial Intelligence by Decoupling 3D Reasoning from Numerical Regression
Zhongbin Guo, Jiahe Liu, Yushan Li et al.
Existing Vision Language Models (VLMs), architecturally rooted in "flatland" perception, fundamentally struggle to comprehend real-world 3D spatial intelligence. This failure stems from a dual bottleneck: an input-stage conflict between computationally exorbitant geometry-aware encoders and superficial 2D-only features, and an output-stage misalignment in which discrete tokenizers are structurally incapable of producing precise, continuous numerical values. To break this impasse, we introduce GEODE (Geometric-Output and Decoupled-Input Engine), a novel architecture that resolves this dual bottleneck by decoupling 3D reasoning from numerical generation. GEODE augments the main VLM with two specialized, plug-and-play modules: a Decoupled Rationale Module (DRM) that acts as a spatial co-processor, aligning explicit 3D data with 2D visual features via cross-attention and distilling spatial Chain-of-Thought (CoT) logic into injectable Rationale Tokens; and a Direct Regression Head (DRH), an "Embedding-as-Value" paradigm that routes specialized control tokens to a lightweight MLP for precise, continuous regression of scalars and 3D bounding boxes. The synergy of these modules allows our 1.5B parameter model to function as a high-level semantic dispatcher, achieving state-of-the-art spatial reasoning performance that rivals 7B+ models.
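The "Embedding-as-Value" idea can be illustrated with a toy sketch: whenever the decoder emits a control token, the hidden state at that position is passed to a small MLP that returns a continuous number instead of a discrete token. Everything below is an assumption made for illustration, including the `<REG>` token name, the two-dimensional hidden states, and the hand-set weights; it is not GEODE's implementation.

```python
def mlp_regress(embedding, w1, b1, w2, b2):
    """Tiny one-hidden-layer MLP with ReLU that maps an embedding to a scalar."""
    hidden = [
        max(0.0, sum(w * e for w, e in zip(row, embedding)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def decode_with_regression(tokens, hidden_states, control_token, head):
    """Replace each control token with the value regressed from its hidden state."""
    output = []
    for token, state in zip(tokens, hidden_states):
        if token == control_token:
            output.append(mlp_regress(state, *head))
        else:
            output.append(token)
    return output


# Hypothetical weights: the head simply averages the two embedding components.
head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.5, 0.5], 0.0)
tokens = ["distance", "is", "<REG>", "m"]
hidden_states = [[0.1, 0.2], [0.3, 0.1], [2.0, 4.0], [0.0, 0.5]]

decoded = decode_with_regression(tokens, hidden_states, "<REG>", head)
print(decoded)
```

The design point this sketch captures is that the numeric value never passes through the tokenizer, so its precision is not limited by the vocabulary.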
IM | Apr 10, 2025
Adaptive Detection of Fast Moving Celestial Objects Using a Mixture of Experts and Physical-Inspired Neural Network
Peng Jia, Ge Li, Bafeng Cheng et al.
Fast moving celestial objects are characterized by velocities across the celestial sphere that significantly differ from the motions of background stars. In observational images, these objects exhibit distinct shapes, contrasting with the typical appearances of stars. Depending on the observational method employed, these celestial entities may be designated as near-Earth objects or asteroids. Historically, fast moving celestial objects have been observed using ground-based telescopes, where the relative stability of stars and Earth facilitated effective image differencing techniques alongside traditional fast moving celestial object detection and classification algorithms. However, the growing prevalence of space-based telescopes, along with their diverse observational modes, produces images with different properties, rendering conventional methods less effective. This paper presents a novel algorithm for detecting fast moving celestial objects within star fields. Our approach enhances state-of-the-art fast moving celestial object detection neural networks by transforming them into physical-inspired neural networks. These neural networks leverage the point spread function of the telescope and the specific observational mode as prior information; they can directly identify fast moving celestial objects within star fields without requiring additional training, thereby addressing the limitations of traditional techniques. Additionally, all neural networks are integrated using the mixture of experts technique, forming a comprehensive fast moving celestial object detection algorithm. We have evaluated our algorithm using real observation images and simulated observational data that mimic various space-based telescope observation scenarios. Results demonstrate that our method effectively detects fast moving celestial objects across different observational modes.
IM | Mar 15, 2024
A Data-Driven Approach for Mitigating Dark Current Noise and Bad Pixels in Complementary Metal Oxide Semiconductor Cameras for Space-based Telescopes
Peng Jia, Chao Lv, Yushan Li et al.
In recent years, there has been a gradual increase in the performance of Complementary Metal Oxide Semiconductor (CMOS) cameras. These cameras have gained popularity as a viable alternative to charge-coupled device (CCD) cameras in a wide range of applications. One particular application is the CMOS camera installed in small space telescopes. However, the limited power and spatial resources available on satellites present challenges in maintaining ideal observation conditions, including temperature and radiation environment. Consequently, images captured by CMOS cameras are susceptible to issues such as dark current noise and defective pixels. In this paper, we introduce a data-driven framework for mitigating dark current noise and bad pixels for CMOS cameras. Our approach involves two key steps: pixel clustering and function fitting. During the pixel clustering step, we identify and group pixels exhibiting similar dark current noise properties. Subsequently, in the function fitting step, we formulate functions that capture the relationship between dark current and temperature, as dictated by the Arrhenius law. Our framework leverages ground-based test data to establish distinct temperature-dark current relations for pixels within different clusters. The clustering results can then be used to estimate the dark current noise level and detect bad pixels from real observational data. To assess the effectiveness of our approach, we have conducted tests using real observation data obtained from the Yangwang-1 satellite, equipped with a near-ultraviolet telescope and an optical telescope. The results show a considerable improvement in the detection efficiency of space-based telescopes.
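The function fitting step can be sketched with the textbook Arrhenius form I(T) = A exp(-E_a / (k_B T)), which becomes a straight line when ln I is plotted against 1/T. The code below is a minimal illustration on synthetic data and assumes this simple two-parameter form; the function names, the temperatures, and the parameter values are hypothetical and are not the authors' pipeline or measurements.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def predict_dark_current(temp_k, amplitude, activation_ev):
    """Arrhenius model: I(T) = A * exp(-E_a / (k_B * T))."""
    return amplitude * math.exp(-activation_ev / (K_B_EV * temp_k))


def fit_arrhenius(temps_k, dark_currents):
    """Least-squares line through (1/T, ln I); returns (A, E_a in eV)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(d) for d in dark_currents]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -E_a / k_B
    intercept = mean_y - slope * mean_x    # equals ln A
    return math.exp(intercept), -slope * K_B_EV


# Synthetic, noise-free "ground test" data for one pixel cluster (assumed values).
temps = [253.0, 263.0, 273.0, 283.0, 293.0]
true_amplitude, true_activation = 1.0e12, 0.6
measured = [predict_dark_current(t, true_amplitude, true_activation) for t in temps]

amplitude, activation = fit_arrhenius(temps, measured)
print(amplitude, activation)
```

Once a curve of this kind is fitted per cluster, a pixel whose measured dark level departs strongly from its cluster's prediction at the recorded temperature is a natural bad-pixel candidate; the threshold for "strongly" would need to be chosen from data.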
RO | Oct 14, 2019
Intelligent Physical Attack Against Mobile Robots With Obstacle-Avoidance
Yushan Li, Jianping He, Cailian Chen et al.
The security of mobile robots has attracted considerable attention in recent years. In this paper, we propose an intelligent physical attack to trap mobile robots into a preset position by learning the obstacle-avoidance mechanism from external observation. The salient novelty of our work lies in revealing that physical attacks with intelligent and advanced design can present real threats, even without prior knowledge of the system dynamics or access to the internal system. This kind of attack cannot be handled by countermeasures in traditional cyberspace security. In practice, the cornerstone of the proposed attack is to actively explore the complex interaction characteristics of the victim robot with the environment, and to learn the obstacle-avoidance knowledge exhibited in the limited observations of its behaviors. Then, we propose shortest-path and hands-off attack algorithms to find efficient attack paths in the vast motion space, achieving the driving-to-trap goal with low costs in terms of path length and activity period, respectively. The convergence of the algorithms is proved and the attack performance bounds are further derived. Extensive simulations and real-life experiments illustrate the effectiveness of the proposed attack, motivating future investigation of these new physical threats and of defenses for robotic systems.