95.1CRMar 30
Evaluating Privilege Usage of Agents on Real-World ToolsQuan Zhang, Lianhang Fu, Lvsi Lian et al.
Equipping LLM agents with real-world tools can substantially improve productivity. However, granting agents autonomy over tool use also transfers the associated privileges to both the agent and the underlying LLM. Improper privilege usage may lead to serious consequences, including information leakage and infrastructure damage. While several benchmarks have been built to study agents' security, they often rely on pre-coded tools and restricted interaction patterns. Such crafted environments differ substantially from the real-world, making it hard to assess agents' security capabilities in critical privilege control and usage. Therefore, we propose GrantBox, a security evaluation sandbox for analyzing agent privilege usage. GrantBox automatically integrates real-world tools and allows LLM agents to invoke genuine privileges, enabling the evaluation of privilege usage under prompt injection attacks. Our results indicate that while LLMs exhibit basic security awareness and can block some direct attacks, they remain vulnerable to more sophisticated attacks, resulting in an average attack success rate of 84.80% in carefully crafted scenarios.
68.1AIApr 6
EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics NetworksZhiming Xue, Menghao Huo, Yujue Wang
Modern logistics networks generate rich operational data streams at every warehouse node and transportation lane -- from order timestamps and routing records to shipping manifests -- yet predicting delivery delays remains predominantly reactive. Existing predictive approaches typically treat this problem either as a tabular classification task, ignoring network topology, or as a time-series anomaly detection task, overlooking the spatial dependencies of the supply chain graph. To bridge this gap, we propose a hybrid deep learning framework for proactive supply chain risk management. The proposed method jointly models temporal order-flow dynamics via a lightweight Transformer patch encoder and inter-hub relational dependencies through an Edge-Aware Graph Attention Network (E-GAT), optimized via a multi-task learning objective. Evaluated on the real-world DataCo Smart Supply Chain dataset, our framework achieves consistent improvements over baseline methods, yielding an F1-score of 0.8762 and an AUC-ROC of 0.9773. Across four independent random seeds, the framework exhibits a cross-seed F1 standard deviation of only 0.0089 -- a 3.8 times improvement over the best ablated variant -- achieving the strongest balance of predictive accuracy and training stability among all evaluated models.
AIMar 5
LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention NetworksZhiming Xue, Yujue Wang
Port congestion at major maritime hubs disrupts global supply chains, yet existing prediction systems typically prioritize forecasting accuracy without providing operationally interpretable explanations. This paper proposes AIS-TGNN, an evidence-grounded framework that jointly performs congestion-escalation prediction and faithful natural-language explanation by coupling a Temporal Graph Attention Network (TGAT) with a structured large language model (LLM) reasoning module. Daily spatial graphs are constructed from Automatic Identification System (AIS) broadcasts, where each grid cell represents localized vessel activity and inter-cell interactions are modeled through attention-based message passing. The TGAT predictor captures spatiotemporal congestion dynamics, while model-internal evidence, including feature z-scores and attention-derived neighbor influence, is transformed into structured prompts that constrain LLM reasoning to verifiable model outputs. To evaluate explanatory reliability, we introduce a directional-consistency validation protocol that quantitatively measures agreement between generated narratives and underlying statistical evidence. Experiments on six months of AIS data from the Port of Los Angeles and Long Beach demonstrate that the proposed framework outperforms both LR and GCN baselines, achieving a test AUC of 0.761, AP of 0.344, and recall of 0.504 under a strict chronological split while producing explanations with 99.6% directional consistency. Results show that grounding LLM generation in graph-model evidence enables interpretable and auditable risk reporting without sacrificing predictive performance. The framework provides a practical pathway toward operationally deployable explainable AI for maritime congestion monitoring and supply-chain risk management.
CRJun 26, 2019
MagneticSpy: Exploiting Magnetometer in Mobile Devices for Website and Application FingerprintingNikolay Matyunin, Yujue Wang, Tolga Arul et al.
Recent studies have shown that aggregate CPU usage and power consumption traces on smartphones can leak information about applications running on the system or websites visited. In response, access to such data has been blocked for mobile applications starting from Android 8. In this work, we explore a new source of side-channel leakage for this class of attacks. Our method is based on the fact that electromagnetic activity caused by mobile processors leads to noticeable disturbances in magnetic sensor measurements on mobile devices, with the amplitude being proportional to the CPU workload. Therefore, recorded sensor data can be analyzed to reveal information about ongoing activities. The attack works on a number of devices: we evaluated 80 models of modern smartphones and tablets and observed the reaction of the magnetometer to the CPU activity on 56 of them. On selected devices we were able to successfully identify which application has been opened (with up to 90% accuracy) or which web page has been loaded (up to 91% accuracy). The presented side channel poses a significant risk to end users' privacy, as the sensor data can be recorded from native apps or even from web pages without user permissions. Finally, we discuss possible countermeasures to prevent the presented information leakage.