CR Oct 16, 2024
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Qinfeng Li, Tianyue Luo, Xuhong Zhang et al.
Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.
CV Feb 27, 2025
Accurate Pose Estimation for Flight Platforms based on Divergent Multi-Aperture Imaging System
Shunkun Liang, Bin Li, Banglei Guan et al.
Vision-based pose estimation plays a crucial role in the autonomous navigation of flight platforms. However, the field of view and spatial resolution of the camera limit pose estimation accuracy. This paper designs a divergent multi-aperture imaging system (DMAIS) that is equivalent to a single imaging system and simultaneously provides a large field of view and high spatial resolution. The DMAIS overcomes traditional observation limitations, allowing accurate pose estimation for the flight platform. Before conducting pose estimation, the DMAIS must be calibrated. To this end, we propose a calibration method for the DMAIS based on a 3D calibration field. The calibration process determines the imaging parameters of the DMAIS, which allows us to model the DMAIS as a generalized camera. Subsequently, a new algorithm for accurately determining the pose of the flight platform is introduced. We transform the absolute pose estimation problem into a nonlinear minimization problem. New optimality conditions are established for solving this problem based on Lagrange multipliers. Finally, real calibration experiments show the effectiveness and accuracy of the proposed method. Results from real flight experiments validate the system's ability to achieve centimeter-level positioning accuracy and arc-minute-level orientation accuracy.
CV Dec 17, 2025
Step-GUI Technical Report
Haolong Yan, Jia Wang, Xin Huang et al.
Recent advances in multimodal large language models unlock unprecedented opportunities for GUI automation. However, a fundamental challenge remains: how to efficiently acquire high-quality training data while maintaining annotation reliability. We introduce a self-evolving training pipeline powered by the Calibrated Step Reward System, which converts model-generated trajectories into reliable training signals through trajectory-level calibration, achieving >90% annotation accuracy with 10-100x lower cost. Leveraging this pipeline, we introduce Step-GUI, a family of models (4B/8B) that achieves state-of-the-art GUI performance (8B: 80.2% AndroidWorld, 48.5% OSWorld, 62.6% ScreenShot-Pro) while maintaining robust general capabilities. As GUI agent capabilities improve, practical deployment demands standardized interfaces across heterogeneous devices while protecting user privacy. To this end, we propose GUI-MCP, the first Model Context Protocol for GUI automation, with a hierarchical architecture that combines low-level atomic operations and high-level task delegation to local specialist models, enabling high-privacy execution where sensitive data stays on-device. Finally, to assess whether agents can handle authentic everyday usage, we introduce AndroidDaily, a benchmark grounded in real-world mobile usage patterns with 3146 static actions and 235 end-to-end tasks across high-frequency daily scenarios (8B: static 89.91%, end-to-end 52.50%). Our work advances the development of practical GUI agents and demonstrates strong potential for real-world deployment in everyday digital interactions.