Yihang Huang

CR
h-index17
4papers
21citations
Novelty59%
AI Score46

4 Papers

64.3CRMay 29
Free-Riding in the AI Economy: Demystifying Logic Flaws in x402-Enabled Payment Systems

Shengchen Ling, Yihang Huang, Yuan Chen et al.

The agentic economy demands programmatic financial rails, positioning the x402 protocol as the de facto standard for machine-to-machine payments. However, bridging synchronous HTTP requests with asynchronous blockchain finality introduces profound state synchronization challenges. In this work, we perform the first comprehensive security analysis of the x402 ecosystem. By formalizing five Security Invariants, we reveal that current implementations fail to enforce transactional atomicity and cryptographic context binding, leading to systemic vulnerabilities. We identify a semantic gap in signature design enabling cross-resource substitution, where payment proofs are transplanted to other unauthorized contexts. Furthermore, we expose a temporal gap where concurrency race conditions allow probabilistic service duplication. In the AI inference domain, we demonstrate how dynamic pricing models are vulnerable to allowance overdrafts and infrastructure rate limits. We validate these vulnerabilities against official SDKs and live deployments. Specifically, we show that attackers can exploit the synchronization gap in dynamic authorization schemes to force merchants to subsidize compute costs, achieving a resource leakage ratio of up to 100% on production middleware. Finally, we propose architectural mitigations, advocating for request-bound signatures and pessimistic state locking to secure the financial rails of autonomous agents. All discovered issues have been disclosed to Coinbase and ThirdWeb.

NISep 21, 2024
LLM Agents as 6G Orchestrator: A Paradigm for Task-Oriented Physical-Layer Automation

Zhuoran Xiao, Chenhui Ye, Yunbo Hu et al.

The rapid advancement in generative pre-training models is propelling a paradigm shift in technological progression from basic applications such as chatbots towards more sophisticated agent-based systems. It is with huge potential and necessity that the 6G system be combined with the copilot of large language model (LLM) agents and digital twins (DT) to manage the highly complicated communication system with new emerging features such as native AI service and sensing. With the 6G-oriented agent, the base station could understand the transmission requirements of various dynamic upper-layer tasks, automatically orchestrate the optimal system workflow. Through continuously get feedback from the 6G DT for reinforcement, the agents can finally raise the performance of practical system accordingly. Differing from existing LLM agents designed for general application, the 6G-oriented agent aims to make highly rigorous and precise planning with a vast amount of extra expert knowledge, which inevitably requires a specific system design from model training to implementation. This paper proposes a novel comprehensive approach for building task-oriented 6G LLM agents. We first propose a two-stage continual pre-training and fine-tuning scheme to build the field basic model and diversities of specialized expert models for meeting the requirements of various application scenarios. Further, a novel inference framework based on semantic retrieval for leveraging the existing communication-related functions is proposed. Experiment results of exemplary tasks, such as physical-layer task decomposition, show the proposed paradigm's feasibility and effectiveness.

CVAug 4, 2025Code
DeflareMamba: Hierarchical Vision Mamba for Contextually Consistent Lens Flare Removal

Yihang Huang, Yuanfei Huang, Junhui Lin et al.

Lens flare removal remains an information confusion challenge in the underlying image background and the optical flares, due to the complex optical interactions between light sources and camera lens. While recent solutions have shown promise in decoupling the flare corruption from image, they often fail to maintain contextual consistency, leading to incomplete and inconsistent flare removal. To eliminate this limitation, we propose DeflareMamba, which leverages the efficient sequence modeling capabilities of state space models while maintains the ability to capture local-global dependencies. Particularly, we design a hierarchical framework that establishes long-range pixel correlations through varied stride sampling patterns, and utilize local-enhanced state space models that simultaneously preserves local details. To the best of our knowledge, this is the first work that introduces state space models to the flare removal task. Extensive experiments demonstrate that our method effectively removes various types of flare artifacts, including scattering and reflective flares, while maintaining the natural appearance of non-flare regions. Further downstream applications demonstrate the capacity of our method to improve visual object recognition and cross-modal semantic understanding. Code is available at https://github.com/BNU-ERC-ITEA/DeflareMamba.

LGMay 15, 2025
AI2MMUM: AI-AI Oriented Multi-Modal Universal Model Leveraging Telecom Domain Large Model

Tianyu Jiao, Zhuoran Xiao, Yihang Huang et al.

Designing a 6G-oriented universal model capable of processing multi-modal data and executing diverse air interface tasks has emerged as a common goal in future wireless systems. Building on our prior work in communication multi-modal alignment and telecom large language model (LLM), we propose a scalable, task-aware artificial intelligence-air interface multi-modal universal model (AI2MMUM), which flexibility and effectively perform various physical layer tasks according to subtle task instructions. The LLM backbone provides robust contextual comprehension and generalization capabilities, while a fine-tuning approach is adopted to incorporate domain-specific knowledge. To enhance task adaptability, task instructions consist of fixed task keywords and learnable, implicit prefix prompts. Frozen radio modality encoders extract universal representations and adapter layers subsequently bridge radio and language modalities. Moreover, lightweight task-specific heads are designed to directly output task objectives. Comprehensive evaluations demonstrate that AI2MMUM achieves SOTA performance across five representative physical environment/wireless channel-based downstream tasks using the WAIR-D and DeepMIMO datasets.