NI · May 21
SCALE: Sensitivity-Aware Federated Unlearning with Information Freshness Optimization for Mobile Edge Computing
Zihao Ding, Beining Wu, Jun Huang
Federated Unlearning (FU) is emerging as a powerful tool for selectively removing client data, both to address data contamination and to meet strict privacy regulations in mobile edge computing (MEC) systems. Although FU has recently drawn attention in the AI community, existing approaches suffer from low unlearning precision and do not reflect the temporal freshness of information, which results in suboptimal forgetting performance. To address these issues, we propose SCALE, a dual-level unlearning framework that combines historical contribution analysis with information freshness-aware adaptive sparsification. The framework first applies a historical contribution-based layer sensitivity analysis to identify the layers most influenced by target clients, then performs fine-grained unlearning through adaptive sparsification at the weight sub-group level to balance information freshness against forgetting effectiveness. Our theoretical analysis establishes the framework's convergence properties and its acceleration advantages. Experiments and testbed results show that SCALE achieves superior unlearning effectiveness and significantly improved forgetting performance compared with state-of-the-art baselines.
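As a rough illustration of a historical contribution-based layer sensitivity analysis, the sketch below scores each layer by the share of the accumulated update norm (over past rounds and clients) that came from the target client, then keeps the top-k layers as unlearning candidates. The scoring rule, the data layout, and the name `rank_sensitive_layers` are hypothetical assumptions, not SCALE's exact formulation, and the sub-group sparsification step is not shown.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def rank_sensitive_layers(history, target, k):
    """Hypothetical sensitivity score: for each layer, the fraction of the
    accumulated update norm that was contributed by the target client.

    history: list of rounds; each round maps client_id -> {layer: flat update}
    Returns (top-k layer names, {layer: score}).
    """
    target_mass, total_mass = {}, {}
    for round_updates in history:
        for client, layers in round_updates.items():
            for layer, update in layers.items():
                n = l2_norm(update)
                total_mass[layer] = total_mass.get(layer, 0.0) + n
                if client == target:
                    target_mass[layer] = target_mass.get(layer, 0.0) + n
    scores = {
        layer: (target_mass.get(layer, 0.0) / total) if total > 0 else 0.0
        for layer, total in total_mass.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores
```

For example, if client `"a"` contributed half of the update mass to `conv1` but only a small fraction to `fc`, `conv1` is ranked first and would be the layer targeted for fine-grained unlearning.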
MM · Jun 3, 2025
EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR
Zihao Ding, Cheng-Tse Lee, Mufeng Zhu et al.
3D Gaussian Splatting (3DGS) is an emerging media representation that reconstructs real-world 3D scenes in high fidelity, enabling 6-degrees-of-freedom (6-DoF) navigation in virtual reality (VR). However, developing and evaluating 3DGS-enabled applications, and optimizing their rendering performance, requires realistic user navigation data. Such data is currently unavailable for photorealistic 3DGS reconstructions of real-world scenes. This paper introduces EyeNavGS, the first publicly available 6-DoF navigation dataset, featuring traces from 46 participants exploring twelve diverse, real-world 3DGS scenes. The dataset was collected at two sites using Meta Quest Pro headsets, recording the head pose and eye gaze data for each rendered frame during free world standing 6-DoF navigation. For each of the twelve scenes, we performed careful scene initialization to correct for scene tilt and scale, ensuring a perceptually comfortable VR experience. We also release our open-source fork of the SIBR viewer with record-and-replay functionality, along with a suite of utility tools for data processing, conversion, and visualization. The EyeNavGS dataset and its accompanying software tools provide valuable resources for advancing research in 6-DoF viewport prediction, adaptive streaming, 3D saliency, and foveated rendering for 3DGS scenes. The EyeNavGS dataset is available at: https://symmru.github.io/EyeNavGS/.
DC · Dec 10, 2025
A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge
Zihao Ding, Mufeng Zhu, Zhongze Tang et al.
Visual intelligence tools have become ubiquitous, offering a wide range of conveniences and capabilities. However, these tools have high computational requirements that exceed the capabilities of resource-constrained mobile and wearable devices. While offloading visual data to the cloud is a common solution, it introduces significant privacy vulnerabilities during transmission and server-side computation. To address this, we propose a novel distributed, hierarchical offloading framework for Vision Transformers (ViTs) that is privacy-preserving by design. Our approach uses a local trusted edge device, such as a mobile phone or an Nvidia Jetson, as the edge orchestrator. This orchestrator partitions the user's visual data into smaller portions and distributes them across multiple independent cloud servers, so that no single external server possesses the complete image, which prevents comprehensive data reconstruction. The final data merging and aggregation computation occurs exclusively on the user's trusted edge device. We apply our framework to the Segment Anything Model (SAM) as a practical case study, which demonstrates that our method substantially enhances content privacy over traditional cloud-based approaches. Evaluations show that our framework maintains near-baseline segmentation performance while substantially reducing the risk of content reconstruction and user data exposure. Our framework provides a scalable, privacy-preserving solution for vision tasks in the edge-cloud continuum.
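The partition-and-merge idea can be sketched as follows, assuming the image is a 2-D grid of pixel values split into fixed-size tiles that are assigned round-robin to servers. The tile size, the round-robin policy, and all function names are illustrative assumptions; the framework's actual partitioning for SAM may differ, and a real deployment would send each tile to a remote inference endpoint and merge the returned per-tile outputs rather than the raw pixels used here.

```python
def split_into_tiles(image, tile):
    """Split a 2-D list (rows of pixels) into tile x tile blocks keyed by
    their top-left (row, col) coordinate. Edge tiles may be smaller."""
    tiles = {}
    for r in range(0, len(image), tile):
        for c in range(0, len(image[0]), tile):
            tiles[(r, c)] = [row[c:c + tile] for row in image[r:r + tile]]
    return tiles

def assign_round_robin(tiles, num_servers):
    """Hypothetical assignment: the i-th tile goes to server i % num_servers,
    so with more than one server no server receives every tile."""
    shards = {s: {} for s in range(num_servers)}
    for i, key in enumerate(sorted(tiles)):
        shards[i % num_servers][key] = tiles[key]
    return shards

def merge_on_edge(shards, height, width):
    """Reassemble per-tile results on the trusted edge device."""
    out = [[None] * width for _ in range(height)]
    for shard in shards.values():
        for (r, c), block in shard.items():
            for dr, row in enumerate(block):
                for dc, value in enumerate(row):
                    out[r + dr][c + dc] = value
    return out
```

With a 4x4 image, 2x2 tiles, and three servers, each server holds at most two of the four tiles, and only the edge device sees the reassembled result.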
RO · Jan 23
Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture
Beining Wu, Zihao Ding, Leo Ostigaard et al.
Coverage Path Planning (CPP) is a fundamental capability for agricultural robots; however, existing solutions often overlook energy constraints, resulting in incomplete operations in large-scale or resource-limited environments. This paper proposes an energy-aware CPP framework grounded in Soft Actor-Critic (SAC) reinforcement learning, designed for grid-based environments with obstacles and charging stations. To enable robust and adaptive decision-making under energy limitations, the framework integrates Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for temporal dynamics. A dedicated reward function jointly optimizes coverage efficiency, energy consumption, and return-to-base constraints. Experimental results demonstrate that the proposed approach consistently achieves over 90% coverage while ensuring energy safety, outperforming the traditional Rapidly-exploring Random Tree (RRT), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) baselines by 13.4-19.5% in coverage and reducing constraint violations by 59.9-88.3%. These findings validate the proposed SAC-based framework as an effective and scalable solution for energy-constrained CPP in agricultural robotics.
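To make a reward that jointly weighs coverage, energy use, and return-to-base constraints concrete, here is a hypothetical per-step reward. The weights, penalty values, field names, and the battery-margin rule (enough charge left to reach the nearest station) are illustrative assumptions, not the paper's reward function.

```python
from dataclasses import dataclass

@dataclass
class StepInfo:
    newly_covered: int    # grid cells covered for the first time this step
    energy_used: float    # energy spent this step (battery units)
    battery: float        # remaining battery after the step
    dist_to_station: int  # grid distance to the nearest charging station
    collided: bool        # whether the robot hit an obstacle this step

def reward(info, w_cov=1.0, w_energy=0.1, energy_per_cell=1.0,
           violation_penalty=10.0, collision_penalty=5.0):
    """Hypothetical shaped reward: reward new coverage, charge for energy,
    and penalize states where the remaining battery can no longer cover the
    trip back to a charging station (a return-to-base violation)."""
    r = w_cov * info.newly_covered - w_energy * info.energy_used
    if info.battery < energy_per_cell * info.dist_to_station:
        r -= violation_penalty
    if info.collided:
        r -= collision_penalty
    return r
```

Making the return-to-base term a large penalty, rather than a small shaping bonus, is one way to push a SAC policy toward treating energy safety as a near-hard constraint.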
NI · May 1
EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
Zihao Ding, Beining Wu, Jun Huang
Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private. However, joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning. Previous federated unlearning approaches neither sever the cross-modal reconstruction channel mediated by bilinear coupling nor separate forget-exclusive update directions from those shared with retained clients. We identify an Anchor Principle for federated multimodal contrastive unlearning: forgotten alignments persist through three residual anchors arising from bilinear cross-modal coupling, principal-angle subspace entanglement, and continued federated updates. At the modality level, we show that bilateral displacement of both the visual and language branches closes the cross-modal reconstruction channel. To address subspace entanglement, our method applies a Cosine-Sine decomposition to client-update subspaces, isolating forget-exclusive directions from the retain support. Moreover, we propose a direction-selective Forget Lock that bounds residual drift across rounds. Combining these strategies, we present EASE, an Entanglement-Aware Subspace Excision framework that closes all three anchor channels under a unified design. EASE demonstrates consistent superiority across multiple datasets and unlearning scenarios. For instance, under client unlearning on Flickr30K with CLIP-B/32, it matches the retrain reference to within 0.2 and 4.2 R@1 points on the forget and retain sides, respectively.
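The principal angles that a Cosine-Sine decomposition exposes can be computed directly: if `qf` and `qr` are orthonormal bases of the forget and retain update subspaces, the cosines of the principal angles are the singular values of `qf.T @ qr`. The sketch below uses this SVD-based computation (not a full CS decomposition) to keep the forget directions whose angle to the retain subspace is near 90 degrees. The threshold and function names are assumptions, and this is a simplified illustration rather than EASE's procedure.

```python
import numpy as np

def orthonormal_basis(updates):
    """Orthonormal basis (as columns) for the span of update vectors (columns)."""
    q, _ = np.linalg.qr(updates)
    return q

def forget_exclusive_directions(qf, qr, min_angle_deg=80.0):
    """Return principal vectors of span(qf) whose principal angle to span(qr)
    is at least min_angle_deg, plus all principal angles in degrees.
    qf: (d, k_f) and qr: (d, k_r), both with orthonormal columns."""
    m = qf.T @ qr                                   # (k_f, k_r)
    u, s, _ = np.linalg.svd(m, full_matrices=True)  # u is (k_f, k_f)
    cosines = np.zeros(qf.shape[1])
    cosines[: s.size] = s                           # unmatched directions have cos = 0
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    principal_vectors = qf @ u                      # principal vectors inside span(qf)
    keep = angles >= min_angle_deg
    return principal_vectors[:, keep], angles
```

With a forget subspace spanned by e1 and e2 and a retain subspace spanned by e2 and e3, the angles are 0 and 90 degrees, and only the e1 direction is returned as forget-exclusive.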
MM · May 1
PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning
Beining Wu, Zihao Ding, Jun Huang
Current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) rests on the unverified assumption that routing isolates task-specific knowledge into disjoint experts. We argue that this assumption does not hold: routing operates per sample, whereas forgetting accumulates across the task sequence, and gradient conflict persists within each expert even when routing is maximally polarized. Moreover, activation-subspace protection can also fail, because under parameter-efficient fine-tuning it entangles tasks as a consequence of a dimension-counting bound, and federated averaging (FedAvg) disrupts client-side orthogonality. To address this, we propose PRISM (Per-expert Routing-projection Interference-informed Subspace Method), which maintains a per-expert gradient subspace basis whose orthogonality is preserved under FedAvg and reinterprets MoE routing as a capacity allocator. On LLaVA-1.5-7B, LLaVA-1.5-13B, and Qwen2.5-VL-7B across CoIN-6 and CoIN-Long-10, PRISM outperforms sixteen state-of-the-art baselines in average accuracy. Compared with the best federated multimodal baseline, the performance margin increases from +3.23 pp on CoIN-6 to +6.06 pp on CoIN-Long-10.
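A standard way to keep new-task gradients from interfering with protected directions is to project them onto the orthogonal complement of a stored orthonormal basis B, that is, g' = g - B Bᵀ g. The sketch below applies this per expert, assuming each expert keeps its own basis in a dictionary. It illustrates gradient-subspace projection in general; PRISM's exact update and its FedAvg-preserving construction are not reproduced, and all names are hypothetical.

```python
import numpy as np

def project_out(grad, basis):
    """Remove the components of grad (shape (d,)) lying in span(basis),
    where basis has shape (d, k) with orthonormal columns."""
    if basis is None or basis.shape[1] == 0:
        return grad
    return grad - basis @ (basis.T @ grad)

def extend_basis(basis, new_dirs, tol=1e-8):
    """Append the columns of new_dirs (shape (d, m)) to the basis after
    Gram-Schmidt orthogonalization, skipping already-spanned directions."""
    cols = [] if basis is None else list(basis.T)
    for v in new_dirs.T:
        w = np.array(v, dtype=float)
        for b in cols:
            w -= (b @ w) * b
        norm = np.linalg.norm(w)
        if norm > tol:
            cols.append(w / norm)
    if not cols:
        return np.zeros((new_dirs.shape[0], 0))
    return np.stack(cols, axis=1)

def project_expert_grads(expert_grads, expert_bases):
    """Project each expert's gradient against that expert's own basis only."""
    return {e: project_out(g, expert_bases.get(e)) for e, g in expert_grads.items()}
```

Keeping one basis per expert, rather than one global basis, means an expert's protected directions constrain only that expert's updates, which matches the per-expert framing of the abstract.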
NI · Apr 28
Application-Aware Twin-in-the-Loop Planning for Federated Split Learning over Wireless Edge Networks
Zihao Ding, Beining Wu, Jun Huang et al.
We investigate task-success-oriented resource allocation for federated split learning (FSL) at the wireless edge. In this setting, the server must jointly determine bandwidth, transmit power, split-layer placement, compression level, and terminal participation under per-round deadline, memory, and spectrum constraints. These coupled decisions affect wireless transmission, model training, and task execution, which evolve at different time scales and cannot be efficiently evaluated through repeated real-world trials. To address this challenge, we propose TiLP, a twin-in-the-loop planner that evaluates candidate decisions through a cross-domain digital twin before execution. The twin integrates network, training, and task sub-twins, with each sub-twin calibrated at the time scale of the process it models. Based on this twin, TiLP performs receding-horizon cross-entropy method planning with actor-critic guidance to search over mixed continuous-discrete decisions. Experiments on LIBERO robotic manipulation tasks over a Sionna RT-simulated wireless network show that TiLP improves task success by 9.5 percentage points over the strongest single-axis baseline, while satisfying the per-round deadline and energy budget.
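The cross-entropy method (CEM) behind such a planner repeatedly samples candidate decisions, scores them (here with a stand-in for the digital twin), and refits the sampling distribution to the top-scoring elites. The sketch below handles only a continuous decision vector with a diagonal Gaussian; the toy objective, parameter names, and defaults are assumptions, and TiLP's mixed continuous-discrete search, receding horizon, and actor-critic guidance are not reproduced.

```python
import random
import statistics

def cem_plan(score, dim, iters=30, pop=64, elite_frac=0.2, init_std=1.0, seed=0):
    """Maximize score(x) over x in R^dim with the cross-entropy method."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        # Sample a population of candidate decisions from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        # Evaluate candidates (the twin's role) and keep the best ones.
        samples.sort(key=score, reverse=True)
        elites = samples[:n_elite]
        # Refit the distribution to the elites; floor std to avoid collapse.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [max(statistics.pstdev(col), 1e-3) for col in zip(*elites)]
    return mean
```

In a receding-horizon setting, a planner of this kind would rerun the search each round, execute only the first decision, and warm-start the next search from the previous mean.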
NI · Apr 5
RELIEF: Turning Missing Modalities into Training Acceleration for Federated Learning on Heterogeneous IoT Edge
Beining Wu, Zihao Ding, Jun Huang
Federated learning (FL) over heterogeneous IoT edge devices faces coupled system-modality-data heterogeneity: lower-cost devices carry both fewer sensors and less computational power, so the slowest devices (stragglers) produce the most incomplete gradient signals. Naively averaging their updates dilutes rare-modality information and wastes computation on absent-sensor parameters, whereas existing methods handle the three heterogeneities (system, modality, data) in isolation, and none addresses their coupling. To resolve this issue, we propose RELIEF, a framework that partitions the fusion-layer Low-Rank Adaptation (LoRA) projection matrix into modality-aligned column blocks and uses this partition as a unified interface for aggregation, elastic training, and communication. Each block is aggregated only within the cohort of devices possessing that modality, which eliminates cross-modal gradient interference; the server then allocates personalized training budgets by prioritizing blocks with the highest cohort-internal divergence, so that resource-constrained devices train fewer but more impactful parameters. We prove that cohort-wise aggregation removes interference from the convergence bound and that the divergence-guided allocation achieves sublinear regret. Experiments on two IoT sensor datasets (PAMAP2, MHEALTH) under both full-parameter (CNN) and parameter-efficient (LoRA) training show that RELIEF achieves up to 9.41x speedup and 37% energy reduction over FedAvg, with up to 15.3 pp rare-modality F1 gains. Real-device validation on a two-Jetson AGX Orin testbed confirms these results.
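Cohort-wise aggregation can be illustrated by averaging each modality-aligned column block only over the clients that own that modality. The sketch below uses plain nested lists, unweighted averaging, and hypothetical names; RELIEF's actual weighting scheme and its divergence-guided budget allocation are not shown.

```python
def aggregate_blocks(client_blocks):
    """client_blocks: {client: {modality: block}}, where each block is a list
    of columns (each column a list of floats). Each modality's block is
    averaged only over the clients that have that modality (its cohort),
    so clients lacking a sensor never dilute that block."""
    sums, counts = {}, {}
    for blocks in client_blocks.values():
        for modality, block in blocks.items():
            if modality not in sums:
                sums[modality] = [[0.0] * len(col) for col in block]
                counts[modality] = 0
            for acc_col, col in zip(sums[modality], block):
                for i, v in enumerate(col):
                    acc_col[i] += v
            counts[modality] += 1
    return {m: [[v / counts[m] for v in col] for col in sums[m]] for m in sums}
```

If only one of two clients has an ECG sensor, its ECG block is kept as-is; naive averaging that treated the missing block as zeros would instead halve it, which is the rare-modality dilution the abstract describes.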
CV · Nov 19, 2025
D4C: Data-free Quantization for Contrastive Language-Image Pre-training Models
Wenlun Zhang, Yunshan Zhong, Zihao Ding et al.
Data-Free Quantization (DFQ) offers a practical solution for model compression without requiring access to real data, making it particularly attractive in privacy-sensitive scenarios. While DFQ has shown promise for unimodal models, its extension to Vision-Language Models such as Contrastive Language-Image Pre-training (CLIP) models remains underexplored. In this work, we reveal that directly applying existing DFQ techniques to CLIP results in substantial performance degradation due to two key limitations: insufficient semantic content and low intra-image diversity in synthesized samples. To tackle these challenges, we propose D4C, the first DFQ framework tailored for CLIP. D4C synthesizes semantically rich and structurally diverse pseudo images through three key components: (1) Prompt-Guided Semantic Injection aligns generated images with real-world semantics using text prompts; (2) Structural Contrastive Generation reproduces compositional structures of natural images by leveraging foreground-background contrastive synthesis; and (3) Perturbation-Aware Enhancement applies controlled perturbations to improve sample diversity and robustness. Together, these components enable D4C to synthesize images that are both semantically informative and structurally diverse, effectively bridging the performance gap of DFQ on CLIP. Extensive experiments validate the effectiveness of D4C, showing significant performance improvements across various bit-widths and models. For example, under the W4A8 setting with CLIP ResNet-50 and ViT-B/32, D4C achieves Top-1 accuracy improvements of 12.4% and 18.9% on CIFAR-10, 6.8% and 19.7% on CIFAR-100, and 1.4% and 5.7% on ImageNet-1K in zero-shot classification, respectively.
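For readers unfamiliar with the W4A8 notation (4-bit weights, 8-bit activations), the sketch below shows symmetric per-tensor uniform fake-quantization, the basic operation whose scales must be calibrated; for activations, DFQ methods obtain that calibration from synthetic rather than real images. This generic quantizer is an assumption for illustration: D4C's contribution is the synthetic calibration data, not this quantizer, and practical toolkits often use per-channel scales and other refinements.

```python
def quantize_symmetric(values, bits):
    """Symmetric uniform fake-quantization with one scale per tensor:
    map floats to integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1],
    then dequantize. Returns (dequantized values, integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [x * scale for x in q], q, scale
```

At 4 bits only 15 integer levels are available, so the rounding error is markedly larger than at 8 bits; this is why the quality of the calibration ranges, and hence of the synthesized calibration images, matters most at low bit-widths.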
DC · Oct 20, 2025
Network and Systems Performance Characterization of MCP-Enabled LLM Agents
Zihao Ding, Mufeng Zhu, Yao Liu
Model Context Protocol (MCP) has recently gained attention within the AI community for providing a standardized way for large language models (LLMs) to interact with external tools and services, significantly enhancing their capabilities. However, including extensive contextual information in MCP-enabled LLM interactions, such as system prompts, MCP tool definitions, and context histories, dramatically inflates token usage. Because LLM providers charge per token, these expanded contexts can quickly escalate monetary costs and increase the computational load on LLM services. This paper presents a comprehensive measurement-based analysis of MCP-enabled interactions with LLMs, revealing trade-offs between capability, performance, and cost. We explore how different LLMs and MCP configurations affect key performance metrics such as token efficiency, monetary cost, task completion time, and task success rate, and we suggest potential optimizations, including enabling parallel tool calls and implementing robust task-abort mechanisms. These findings provide useful insights for developing more efficient, robust, and cost-effective MCP-enabled workflows.
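A back-of-the-envelope model shows why resent context inflates cost: if every request in an agent loop carries the system prompt, all MCP tool definitions, and the full history of earlier turns, billed input tokens grow with each turn. All token counts and the per-million-token price below are made-up placeholders, not measurements from the paper or any provider's pricing, and the model ignores output tokens and prompt caching.

```python
def total_input_tokens(system, tools, turn_tokens):
    """Input tokens billed across a multi-turn agent loop in which every
    request carries the system prompt, all tool definitions, and the full
    history of earlier turns (tool calls plus tool results)."""
    total, history = 0, 0
    for t in turn_tokens:
        total += system + tools + history  # tokens sent with this request
        history += t                       # this turn's content joins the context
    return total

def cost_usd(tokens, usd_per_million):
    """Cost at a flat (hypothetical) price per million input tokens."""
    return tokens * usd_per_million / 1_000_000
```

With a 500-token system prompt, 4,000 tokens of tool definitions, and four turns adding 300 tokens each, the loop bills 19,800 input tokens; without the tool definitions it bills 3,800, so the definitions alone account for most of the cost.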
LG · May 6, 2020
Towards Frequency-Based Explanation for Robust CNN
Zifan Wang, Yilin Yang, Ankit Shrivastava et al.
Current explanation techniques for building transparent Convolutional Neural Networks (CNNs) mainly focus on connecting human-understandable input features with the model's prediction, overlooking an alternative representation of the input: its decomposition into frequency components. In this work, we analyze the connection between the distribution of frequency components in the input dataset and the reasoning process the model learns from the data. We further provide a quantitative analysis of the contribution of different frequency components to the model's prediction. We show that the model's vulnerability to tiny distortions results from its reliance on high-frequency features, which are the features targeted by adversarial (black-box and white-box) attackers, to make predictions. We further show that if the model develops a stronger association between low-frequency components and true labels, it becomes more robust, which explains why adversarially trained models are more robust against tiny distortions.
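A low/high-frequency split of the kind analyzed here is commonly implemented with a 2-D FFT and a circular mask around the center of the shifted spectrum. The sketch below is a generic decomposition with an assumed cutoff `radius`, not the authors' exact procedure; because the two masks are complementary, the low- and high-frequency components sum back to the original image.

```python
import numpy as np

def frequency_split(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency components
    using a circular mask of the given radius around the spectrum center."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))   # DC moved to (h//2, w//2)
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high
```

Feeding a model only `low` or only `high` versions of its test images is one simple way to probe which band its predictions depend on.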