Yizhe Zhao

CV
h-index: 8
5 papers
31 citations
Novelty: 61%
AI Score: 50

5 Papers

IT · Jun 3
Enhanced Fluid Index Modulation for Integrated Data and Energy Transfer

Long Zhang, Yizhe Zhao, Halvin Yang et al.

Integrated data and energy transfer (IDET) is a promising technique for supporting sustainable low-power wireless networks. To improve both communication reliability and energy transfer efficiency, this paper investigates a fluid index modulation (FIM) assisted IDET system, in which the base station employs a two-dimensional fluid antenna system (FAS) and the receiver adopts a power-splitting architecture. In FIM, information bits are conveyed not only by the modulation symbols but also by the index of the selected antenna position (port). Under finite-alphabet signaling, the average harvested power, bit error rate (BER), and achievable data rate are derived in closed form. A joint optimization problem is formulated to maximize the average harvested power subject to BER and achievable rate constraints by jointly optimizing the port selection, precoding vector, and power splitting ratio. An alternating optimization framework is developed, in which the precoding vector and the port selection are obtained via a Riemannian augmented Lagrangian method (RALM) and a block coordinate descent (BCD) algorithm, respectively. Simulation results demonstrate that the proposed scheme achieves a superior rate-energy trade-off over benchmark schemes, while the proposed algorithm attains near-optimal performance with significantly lower complexity than exhaustive search.
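The FIM idea in the abstract above, where bits ride on both the port index and the modulation symbol, can be illustrated with a minimal Python sketch. It assumes a plain index-modulation mapping with a power-of-two number of candidate ports and a Gray-coded QPSK constellation, plus a linear power-splitting model in which a fraction `rho` of the received power goes to the energy harvester. The names `fim_map`, `bits_per_channel_use`, and `power_split` and their parameters are hypothetical; this is not the paper's system model, BER analysis, or RALM/BCD optimizer.

```python
import math

# Assumed Gray-coded QPSK constellation (unit average energy): 2 bits -> symbol.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
    (1, 0): complex(1, -1) / math.sqrt(2),
}

def log2_exact(n):
    """Return log2(n) for a positive power of two, else raise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def bits_per_channel_use(num_ports, mod_order):
    """Bits carried by the port choice plus bits carried by the symbol."""
    return log2_exact(num_ports) + log2_exact(mod_order)

def fim_map(bits, num_ports, constellation=QPSK):
    """Split one block of bits into (port index, modulation symbol)."""
    port_bits = log2_exact(num_ports)
    expected = port_bits + log2_exact(len(constellation))
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")
    port = 0
    for b in bits[:port_bits]:  # leading bits select the antenna port
        port = (port << 1) | b
    symbol = constellation[tuple(bits[port_bits:])]  # remaining bits pick the symbol
    return port, symbol

def power_split(p_rx, rho, eta=1.0):
    """Linear power-splitting model: rho of p_rx to the harvester, 1 - rho to decoding."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return eta * rho * p_rx, (1.0 - rho) * p_rx
```

With 8 candidate ports and QPSK, each channel use in this sketch carries 3 port-index bits plus 2 symbol bits, so `fim_map([1, 0, 1, 0, 0], 8)` activates port 5 and sends the symbol for bits `00`.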

CR · Mar 20 · Code
MANA: Towards Efficient Mobile Ad Detection via Multimodal Agentic UI Navigation

Yizhe Zhao, Yongjian Fu, Zihao Feng et al.

Mobile advertising dominates app monetization but introduces risks ranging from intrusive user experience to malware delivery. Existing detection methods rely either on static analysis, which misses runtime behaviors, or on heuristic UI exploration, which struggles with sparse and obfuscated ads. In this paper, we present MANA, the first agentic multimodal reasoning framework for mobile ad detection. MANA integrates static, visual, temporal, and experiential signals into a reasoning-guided navigation strategy that determines not only how to traverse interfaces but also where to focus, enabling efficient and robust exploration. We implement and evaluate MANA on commercial smartphones across more than 200 apps, achieving state-of-the-art accuracy and efficiency. Compared with baselines, it improves detection accuracy by 30.5%-56.3% and reduces exploration steps by 29.7%-63.3%. Case studies further demonstrate its ability to uncover obfuscated and malicious ads, underscoring its practicality for mobile ad auditing and its potential for broader runtime UI analysis (e.g., permission abuse). Code and dataset are available at https://github.com/MANA-2026/MANA.

AI · Feb 2
Context Learning for Multi-Agent Discussion

Xingyuan Hua, Sheng Yue, Xinyi Li et al.

Multi-Agent Discussion (MAD), in which multiple LLM instances collaboratively solve problems via structured discussion, has recently garnered increasing attention. However, we find that current MAD methods easily suffer from discussion inconsistency: the LLMs fail to reach a coherent solution due to the misalignment between their individual contexts. In this paper, we introduce a multi-LLM context learning method (M2CL) that learns a context generator for each agent, capable of dynamically generating context instructions per discussion round via automatic information organization and refinement. Specifically, inspired by our theoretical insights on the context instruction, M2CL trains the generators to control context coherence and output discrepancies via a carefully crafted self-adaptive mechanism. This enables LLMs to avoid premature convergence on majority noise and progressively reach the correct consensus. We evaluate M2CL on challenging tasks, including academic reasoning, embodied tasks, and mobile control. The results show that M2CL significantly surpasses existing methods, by 20%-50%, while enjoying favorable transferability and computational efficiency.

CV · Dec 5, 2023
MGTR: Multi-Granular Transformer for Motion Prediction with LiDAR

Yiqian Gan, Hao Xiao, Yizhe Zhao et al.

Motion prediction has been an essential component of autonomous driving systems, since it must handle highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose a Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features at different granularities for different kinds of traffic agents. To further enhance MGTR's capabilities, we leverage LiDAR point cloud data by incorporating LiDAR semantic features from an off-the-shelf LiDAR feature extractor. We evaluate MGTR on the Waymo Open Dataset motion prediction benchmark and show that the proposed method achieves state-of-the-art performance, ranking 1st on its leaderboard (https://waymo.com/open/challenges/2023/motion-prediction/).

CV · May 17, 2024
DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection

Zhe Huang, Yizhe Zhao, Hao Xiao et al.

Multi-view camera-only 3D object detection largely follows two primary paradigms: exploiting bird's-eye-view (BEV) representations or focusing on perspective-view (PV) features, each with distinct advantages. Although several recent approaches explore combining BEV and PV, many rely on partial fusion or maintain separate detection heads. In this paper, we propose DuoSpaceNet, a novel framework that fully unifies BEV and PV feature spaces within a single detection pipeline for comprehensive 3D perception. Our design includes a decoder that integrates BEV and PV features into unified detection queries, as well as a feature enhancement strategy that enriches the different feature representations. In addition, DuoSpaceNet can be extended to handle multi-frame inputs, enabling more robust temporal analysis. Extensive experiments on the nuScenes dataset show that DuoSpaceNet surpasses both BEV-based baselines (e.g., BEVFormer) and PV-based baselines (e.g., Sparse4D) in 3D object detection and BEV map segmentation, verifying the effectiveness of our proposed design.
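As a generic illustration of what relating the two feature spaces involves (this is not DuoSpaceNet's decoder), the Python sketch below indexes one 3D location both as a cell in a BEV grid and as a pixel in a camera's perspective view using the standard pinhole camera model. The grid origin, cell size, extrinsic matrix, and intrinsic matrix are hypothetical inputs, and the helper names `bev_cell` and `project_to_image` are illustrative only.

```python
import math

def bev_cell(x, y, x_min, y_min, cell_size):
    """Map ego-frame ground-plane coordinates (x, y) to integer BEV grid indices."""
    return math.floor((x - x_min) / cell_size), math.floor((y - y_min) / cell_size)

def project_to_image(point_xyz, world_to_cam, intrinsics):
    """Project a 3D point into pixel coordinates with a pinhole camera.

    world_to_cam: 4x4 row-major rigid transform (only the top 3 rows are used).
    intrinsics:   3x3 camera matrix K.
    Returns (u, v), or None when the point lies behind the camera.
    """
    homog = (*point_xyz, 1.0)
    # Transform into the camera frame: top three rows of world_to_cam times homog.
    cam = [sum(world_to_cam[r][c] * homog[c] for c in range(4)) for r in range(3)]
    if cam[2] <= 1e-6:  # non-positive depth: not visible in this view
        return None
    # Apply K, then divide by depth to get pixel coordinates.
    pix = [sum(intrinsics[r][c] * cam[c] for c in range(3)) for r in range(3)]
    return pix[0] / pix[2], pix[1] / pix[2]
```

In a fused design, a query anchored at a 3D location could, in principle, read BEV features at its grid cell and PV features at its projected pixel in each camera where it is visible; the abstract does not state whether DuoSpaceNet uses exactly this mechanism.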