Victor C. M. Leung

CV
h-index71
42papers
2,160citations
Novelty34%
AI Score52

42 Papers

AINov 29, 2022
When Quantum Information Technologies Meet Blockchain in Web 3.0

Minrui Xu, Xiaoxu Ren, Dusit Niyato et al.

With the drive to create a decentralized digital economy, Web 3.0 has become a cornerstone of digital transformation, developed on the basis of computing-force networking, distributed data storage, and blockchain. With the rapid realization of quantum devices, Web 3.0 is being developed in parallel with the deployment of quantum cloud computing and quantum Internet. In this regard, quantum computing first disrupts the original cryptographic systems that protect data security while reshaping modern cryptography with the advantages of quantum computing and communication. Therefore, in this paper, we introduce a quantum blockchain-driven Web 3.0 framework that provides information-theoretic security for decentralized data transferring and payment transactions. First, we present the framework of quantum blockchain-driven Web 3.0 with future-proof security during the transmission of data and transaction information. Next, we discuss the potential applications and challenges of implementing quantum blockchain in Web 3.0. Finally, we describe a use case for quantum non-fungible tokens (NFTs) and propose a quantum deep learning-based optimal auction for NFT trading to maximize the achievable revenue for sufficient liquidity in Web 3.0. In this way, the proposed framework can achieve proven security and sustainability for the next-generation decentralized digital society.

NIMar 5, 2022
AI-aided Traffic Control Scheme for M2M Communications in the Internet of Vehicles

Haijun Zhang, Minghui Jiang, Xiangnan Liu et al.

Due to the rapid growth of data transmissions in internet of vehicles (IoV), finding schemes that can effectively alleviate access congestion has become an important issue. Recently, many traffic control schemes have been studied. Nevertheless, the dynamics of traffic and the heterogeneous requirements of different IoV applications are not considered in most existing studies, which is significant for the random access resource allocation. In this paper, we consider a hybrid traffic control scheme and use proximal policy optimization (PPO) method to tackle it. Firstly, IoV devices are divided into various classes based on delay characteristics. The target of maximizing the successful transmission of packets with the success rate constraint is established. Then, the optimization objective is transformed into a markov decision process (MDP) model. Finally, the access class barring (ACB) factors are obtained based on the PPO method to maximize the number of successful access devices. The performance of the proposal algorithm in respect of successful events and delay compared to existing schemes is verified by simulations.

CVApr 5, 2023
SCMM: Calibrating Cross-modal Representations for Text-Based Person Search

Jing Liu, Donglai Wei, Yang Liu et al.

Text-Based Person Search (TBPS) aims to retrieve target person images from a large-scale gallery using natural language descriptions, posing fundamental challenges in cross-modal representation learning. Existing methods often struggle to bridge the semantic gap between heterogeneous modalities while capturing fine-grained correspondences essential for discriminating visually similar individuals. To address these challenges, we propose Sew Calibration and Masked Modeling (SCMM), a unified framework that calibrates cross-modal representations through complementary learning mechanisms. Notably, SCMM introduces two novel components: a sew calibration loss that dynamically aligns image-text features using quality-guided adaptive margins based on textual information density, and a masked caption modeling loss that establishes fine-grained cross-modal correspondences through transformer-based masked prediction. Additionally, the sew calibration mechanism implements bidirectional constraints to effectively compress same-identity features in the shared embedding space, while the masked modeling component leverages a cross-modal decoder to learn word-level visual-textual relationships, enabling discrimination of subtle attribute differences. Our dual-encoder architecture achieves an effective balance between representation capability and computational efficiency by employing a training-only decoder design. Extensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReID benchmarks demonstrate that SCMM achieves state-of-the-art performance with Rank1 accuracies of 73.81%, 64.25%, and 57.35%, respectively. Comprehensive ablation studies validate the effectiveness of each proposed component.

LGMar 31, 2023
Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis

Yanjie Dong, Luya Wang, Yuanfang Chi et al.

A wireless federated learning system is investigated by allowing a server and workers to exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server via bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principle component analysis (PCA) is leveraged to reduce the dimension of uploaded gradients such that the communication bottleneck is relieved. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (i.e., PCA-AWFL) are proposed based on the low-dimensional gradients and the Nesterov's momentum. For the non-convex loss functions, a finite-time analysis is performed to quantify the impacts of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is theoretically certified to converge faster than the PCA-WFL algorithm. Besides, the convergence rates of PCA-WFL and PCA-AWFL algorithms quantitatively reveal the linear speedup with respect to the number of workers over the vanilla gradient descent algorithm. Numerical results are used to demonstrate the improved convergence rates of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.

AIAug 20, 2024
Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

Yanjie Dong, Haijun Zhang, Chengming Li et al.

Since the release of GPT2-1.5B in 2019, the large language models (LLMs) have evolved from specialized deep models to versatile foundation models. While demonstrating remarkable zero-shot ability, the LLMs still require fine-tuning on local datasets and substantial memory for deployment over the network edges. Traditional first-order fine-tuning techniques require significant GPU memory that exceeds the capacity of mainstream hardware. Besides, the LLMs have been expanded beyond text generation to create images, audio, video, and multi-modal content, necessitating careful investigation of efficient deployment strategies for large-scale foundation models. In response to these challenges, model fine-tuning and model-compression techniques have been developed to support the sustainable growth of LLMs by reducing both operational and capital expenditures. In this work, we provide a comprehensive overview of prevalent memory-efficient fine-tuning methods for deployment at the network edge. We also review state-of-the-art literature on model compression, offering insights into the deployment of LLMs at network edges.

CVMay 16, 2024Code
Networking Systems for Video Anomaly Detection: A Tutorial and Survey

Jing Liu, Yang Liu, Jieyu Lin et al.

The increasing utilization of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community. With the advancements in deep learning and edge computing, VAD has made significant progress and advances synergized with emerging applications in smart cities and video internet, which has moved beyond the conventional research scope of algorithm engineering to deployable Networking Systems for VAD (NSVAD), a practical hotspot for intersection exploration in the AI, IoVT, and computing fields. In this article, we delineate the foundational assumptions, learning frameworks, and applicable scenarios of various deep learning-driven VAD routes, offering an exhaustive tutorial for novices in NSVAD. In addition, this article elucidates core concepts by reviewing recent advances and typical solutions and aggregating available research resources accessible at https://github.com/fdjingliu/NSVAD. Lastly, this article projects future development trends and discusses how the integration of AI and computing technologies can address existing research challenges and promote open opportunities, serving as an insightful guide for prospective researchers and engineers.

LGJan 20, 2025Code
A Survey on Diffusion Models for Anomaly Detection

Jing Liu, Zhenchao Ma, Zepu Wang et al.

Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing. The intersection of these two fields, termed diffusion models for anomaly detection (DMAD), offers promising solutions for identifying deviations in increasingly complex and high-dimensional data. In this survey, we review recent advances in DMAD research. We begin by presenting the fundamental concepts of AD and DMs, followed by a comprehensive analysis of classic DM architectures including DDPMs, DDIMs, and Score SDEs. We further categorize existing DMAD methods into reconstruction-based, density-based, and hybrid approaches, providing detailed examinations of their methodological innovations. We also explore the diverse tasks across different data modalities, encompassing image, time series, video, and multimodal data analysis. Furthermore, we discuss critical challenges and emerging research directions, including computational efficiency, model interpretability, robustness enhancement, edge-cloud collaboration, and integration with large language models. The collection of DMAD research papers and resources is available at https://github.com/fdjingliu/DMAD.

CRNov 22, 2023
A Survey of Adversarial CAPTCHAs on its History, Classification and Generation

Zisheng Xu, Qiao Yan, F. Richard Yu et al.

Completely Automated Public Turing test to tell Computers and Humans Apart, short for CAPTCHA, is an essential and relatively easy way to defend against malicious attacks implemented by bots. The security and usability trade-off limits the use of massive geometric transformations to interfere deep model recognition and deep models even outperformed humans in complex CAPTCHAs. The discovery of adversarial examples provides an ideal solution to the security and usability trade-off by integrating adversarial examples and CAPTCHAs to generate adversarial CAPTCHAs that can fool the deep models. In this paper, we extend the definition of adversarial CAPTCHAs and propose a classification method for adversarial CAPTCHAs. Then we systematically review some commonly used methods to generate adversarial examples and methods that are successfully used to generate adversarial CAPTCHAs. Also, we analyze some defense methods that can be used to defend adversarial CAPTCHAs, indicating potential threats to adversarial CAPTCHAs. Finally, we discuss some possible future research directions for adversarial CAPTCHAs at the end of this paper.

CVJan 13
M3SR: Multi-Scale Multi-Perceptual Mamba for Efficient Spectral Reconstruction

Yuze Zhang, Lingjie Li, Qiuzhen Lin et al.

The Mamba architecture has been widely applied to various low-level vision tasks due to its exceptional adaptability and strong performance. Although the Mamba architecture has been adopted for spectral reconstruction, it still faces the following two challenges: (1) Single spatial perception limits the ability to fully understand and analyze hyperspectral images; (2) Single-scale feature extraction struggles to capture the complex structures and fine details present in hyperspectral images. To address these issues, we propose a multi-scale, multi-perceptual Mamba architecture for the spectral reconstruction task, called M3SR. Specifically, we design a multi-perceptual fusion block to enhance the ability of the model to comprehensively understand and analyze the input features. By integrating the multi-perceptual fusion block into a U-Net structure, M3SR can effectively extract and fuse global, intermediate, and local features, thereby enabling accurate reconstruction of hyperspectral images at multiple scales. Extensive quantitative and qualitative experiments demonstrate that the proposed M3SR outperforms existing state-of-the-art methods while incurring a lower computational cost.

GTFeb 3
Toward a Sustainable Federated Learning Ecosystem: A Practical Least Core Mechanism for Payoff Allocation

Zhengwei Ni, Zhidu Li, Wei Chen et al.

Emerging network paradigms and applications increasingly rely on federated learning (FL) to enable collaborative intelligence while preserving privacy. However, the sustainability of such collaborative environments hinges on a fair and stable payoff allocation mechanism. Focusing on coalition stability, this paper introduces a payoff allocation framework based on the least core (LC) concept. Unlike traditional methods, the LC prioritizes the cohesion of the federation by minimizing the maximum dissatisfaction among all potential subgroups, ensuring that no participant has an incentive to break away. To adapt this game-theoretic concept to practical, large-scale networks, we propose a streamlined implementation with a stack-based pruning algorithm, effectively balancing computational efficiency with allocation precision. Case studies in federated intrusion detection demonstrate that our mechanism correctly identifies pivotal contributors and strategic alliances. The results confirm that the practical LC framework promotes stable collaboration and fosters a sustainable FL ecosystem.

CVMar 19
Diffusion-Guided Semantic Consistency for Multimodal Heterogeneity

Jing Liu, Zhengliang Guo, Yan Wang et al.

Federated learning (FL) is severely challenged by non-independent and identically distributed (non-IID) client data, a problem that degrades global model performance, especially in multimodal perception settings. Conventional methods often fail to address the underlying semantic discrepancies between clients, leading to suboptimal performance for multimedia systems requiring robust perception. To overcome this, we introduce SemanticFL, a novel framework that leverages the rich semantic representations of pre-trained diffusion models to provide privacy-preserving guidance for local training. Our approach leverages multi-layer semantic representations from a pre-trained Stable Diffusion model (including VAE-encoded latents and U-Net hierarchical features) to create a shared latent space that aligns heterogeneous clients, facilitated by an efficient client-server architecture that offloads heavy computation to the server. A unified consistency mechanism, employing cross-modal contrastive learning, further stabilizes convergence. We conduct extensive experiments on benchmarks including CIFAR-10, CIFAR-100, and TinyImageNet under diverse heterogeneity scenarios. Our results demonstrate that SemanticFL surpasses existing federated learning approaches, achieving accuracy gains of up to 5.49% over FedAvg, validating its effectiveness in learning robust representations for heterogeneous and multimodal data for perception tasks.

LGOct 31, 2025
FedSM: Robust Semantics-Guided Feature Mixup for Bias Reduction in Federated Learning with Long-Tail Data

Jingrui Zhang, Yimeng Xu, Shujie Li et al.

Federated Learning (FL) enables collaborative model training across decentralized clients without sharing private data. However, FL suffers from biased global models due to non-IID and long-tail data distributions. We propose \textbf{FedSM}, a novel client-centric framework that mitigates this bias through semantics-guided feature mixup and lightweight classifier retraining. FedSM uses a pretrained image-text-aligned model to compute category-level semantic relevance, guiding the category selection of local features to mix-up with global prototypes to generate class-consistent pseudo-features. These features correct classifier bias, especially when data are heavily skewed. To address the concern of potential domain shift between the pretrained model and the data, we propose probabilistic category selection, enhancing feature diversity to effectively mitigate biases. All computations are performed locally, requiring minimal server overhead. Extensive experiments on long-tail datasets with various imbalanced levels demonstrate that FedSM consistently outperforms state-of-the-art methods in accuracy, with high robustness to domain shift and computational efficiency.

AIMar 17
Proactive Rejection and Grounded Execution: A Dual-Stage Intent Analysis Paradigm for Safe and Efficient AIoT Smart Homes

Xinxin Jin, Zhengwei Ni, Zhengguo Sheng et al.

As Large Language Models (LLMs) transition from information providers to embodied agents in the Internet of Things (IoT), they face significant challenges regarding reliability and interaction efficiency. Direct execution of LLM-generated commands often leads to entity hallucinations (e.g., trying to control non-existent devices). Meanwhile, existing iterative frameworks (e.g., SAGE) suffer from the Interaction Frequency Dilemma, oscillating between reckless execution and excessive user questioning. To address these issues, we propose a Dual-Stage Intent-Aware (DS-IA) Framework. This framework separates high-level user intent understanding from low-level physical execution. Specifically, Stage 1 serves as a semantic firewall to filter out invalid instructions and resolve vague commands by checking the current state of the home. Stage 2 then employs a deterministic cascade verifier-a strict, step-by-step rule checker that verifies the room, device, and capability in sequence-to ensure the action is actually physically possible before execution. Extensive experiments on the HomeBench and SAGE benchmarks demonstrate that DS-IA achieves an Exact Match (EM) rate of 58.56% (outperforming baselines by over 28%) and improves the rejection rate of invalid instructions to 87.04%. Evaluations on the SAGE benchmark further reveal that DS-IA resolves the Interaction Frequency Dilemma by balancing proactive querying with state-based inference. Specifically, it boosts the Autonomous Success Rate (resolving tasks without unnecessary user intervention) from 42.86% to 71.43%, while maintaining high precision in identifying irreducible ambiguities that truly necessitate human clarification. These results underscore the framework's ability to minimize user disturbance through accurate environmental grounding.

CVJun 14, 2025Code
Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation

Runhao Zeng, Qi Deng, Ronghao Zhang et al.

Test-time adaptation (TTA) aims to boost the generalization capability of a trained model by conducting self-/unsupervised learning during the testing phase. While most existing TTA methods for video primarily utilize visual supervisory signals, they often overlook the potential contribution of inherent audio data. To address this gap, we propose a novel approach that incorporates audio information into video TTA. Our method capitalizes on the rich semantic content of audio to generate audio-assisted pseudo-labels, a new concept in the context of video TTA. Specifically, we propose an audio-to-video label mapping method by first employing pre-trained audio models to classify audio signals extracted from videos and then mapping the audio-based predictions to video label spaces through large language models, thereby establishing a connection between the audio categories and video labels. To effectively leverage the generated pseudo-labels, we present a flexible adaptation cycle that determines the optimal number of adaptation iterations for each sample, based on changes in loss and consistency across different views. This enables a customized adaptation process for each sample. Experimental results on two widely used datasets (UCF101-C and Kinetics-Sounds-C), as well as on two newly constructed audio-video TTA datasets (AVE-C and AVMIT-C) with various corruption types, demonstrate the superiority of our approach. Our method consistently improves adaptation performance across different video classification models and represents a significant step forward in integrating audio information into video TTA. Code: https://github.com/keikeiqi/Audio-Assisted-TTA.

NINov 3, 2025
Pinching Antennas Meet AI in Next-Generation Wireless Networks

Fang Fang, Zhiguo Ding, Victor C. M. Leung et al.

Next-generation (NG) wireless networks must embrace innate intelligence in support of demanding emerging applications, such as extended reality and autonomous systems, under ultra-reliable and low-latency requirements. Pinching antennas (PAs), a new flexible low-cost technology, can create line-of-sight links by dynamically activating small dielectric pinches along a waveguide on demand. As a compelling complement, artificial intelligence (AI) offers the intelligence needed to manage the complex control of PA activation positions and resource allocation in these dynamic environments. This article explores the "win-win" cooperation between AI and PAs: AI facilitates the adaptive optimization of PA activation positions along the waveguide, while PAs support edge AI tasks such as federated learning and over-the-air aggregation. We also discuss promising research directions including large language model-driven PA control frameworks, and how PA-AI integration can advance semantic communications, and integrated sensing and communication. This synergy paves the way for adaptive, resilient, and self-optimizing NG networks.

DCApr 9, 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Feng Liang, Zhen Zhang, Haifeng Lu et al.

With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, models, and resources. Due to intensive synchronization of models and sharing of data across GPUs and computing nodes during distributed training and inference processes, communication efficiency becomes the bottleneck for achieving high performance at a large scale. This article surveys the literature over the period of 2018-2023 on algorithms and technologies aimed at achieving efficient communication in large-scale distributed deep learning at various levels, including algorithms, frameworks, and infrastructures. Specifically, we first introduce efficient algorithms for model synchronization and communication data compression in the context of large-scale distributed training. Next, we introduce efficient strategies related to resource allocation and task scheduling for use in distributed training and inference. After that, we present the latest technologies pertaining to modern communication infrastructures used in distributed deep learning with a focus on examining the impact of the communication overhead in a large-scale and heterogeneous setting. Finally, we conduct a case study on the distributed training of large language models at a large scale to illustrate how to apply these technologies in real cases. This article aims to offer researchers a comprehensive understanding of the current landscape of large-scale distributed deep learning and to reveal promising future research directions toward communication-efficient solutions in this scope.

LGApr 30
DeRelayL: Sustainable Decentralized Relay Learning

Haihan Duan, Tengfei Ma, Yuyang Qin et al.

In the era of big data, large-scale machine learning models have revolutionized various fields, driving significant advancements. However, large-scale model training demands high financial and computational resources, which are only affordable by a few technological giants and well-funded institutions. In this case, common users like mobile users, the real creators of valuable data, are often excluded from fully benefiting due to the barriers, while the current methods for accessing large-scale models either limit user ownership or lack sustainability. This growing gap highlights the urgent need for a collaborative model training approach, allowing common users to train and share models. However, existing collaborative model training paradigms, especially federated learning (FL), primarily focus on data privacy and group-based model aggregation. To this end, this paper intends to address this issue by proposing a novel training paradigm named decentralized relay learning (DeRelayL), a sustainable learning system where permissionless participants can contribute to model training in a relay-like manner and share the model. In detail, this paper presents the architecture and workflow of DeRelayL, designs incentive mechanisms to ensure sustainability, and conducts theoretical analysis and numerical simulations to demonstrate its effectiveness.

ITApr 26
Distributed Electromagnetic Neural Networks for Task-Oriented Semantic Communications

Jinbao Li, Jiancheng An, Hao Liu et al.

Semantic communications (SemCom) is a promising paradigm that prioritizes the transmission of task-relevant information, thereby enabling superior communication efficiency over traditional bit-centric systems. However, most existing SemCom systems face critical limitations in computational efficiency and spatial flexibility. To overcome these limitations, we propose a novel unmanned aerial vehicles (UAV)-enabled distributed electromagnetic neural network (EMNN) for a task-oriented SemCom system. Specifically, the proposed distributed EMNN is composed of multiple UAV-mounted stacked intelligent metasurfaces (SIM) and a ground receiving station (GRS), where multiple SIMs collaboratively encode image semantics in the wave domain, and the GRS performs decoding based on the received power distribution. Moreover, we employ a temperature-adaptive gradient optimization algorithm to train the distributed EMNN, which mitigates gradient vanishing and enhances learning stability. Finally, the numerical simulation results demonstrate the effectiveness of distributed EMNN in image recognition task-oriented SemCom, achieving an average $8\%$ accuracy improvement over the single-SIM baseline across multiple datasets.

NIApr 20, 2024
Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems

Xiaofei Wang, Yunfeng Zhao, Chao Qiu et al.

Amidst the robust impetus from artificial intelligence (AI) and big data, edge intelligence (EI) has emerged as a nascent computing paradigm, synthesizing AI with edge computing (EC) to become an exemplary solution for unleashing the full potential of AI services. Nonetheless, challenges in communication costs, resource allocation, privacy, and security continue to constrain its proficiency in supporting services with diverse requirements. In response to these issues, this paper introduces socialized learning (SL) as a promising solution, further propelling the advancement of EI. SL is a learning paradigm predicated on social principles and behaviors, aimed at amplifying the collaborative capacity and collective intelligence of agents within the EI system. SL not only enhances the system's adaptability but also optimizes communication, and networking processes, essential for distributed intelligence across diverse devices and platforms. Therefore, a combination of SL and EI may greatly facilitate the development of collaborative intelligence in the future network. This paper presents the findings of a literature review on the integration of EI and SL, summarizing the latest achievements in existing research on EI and SL. Subsequently, we delve comprehensively into the limitations of EI and how it could benefit from SL. Special emphasis is placed on the communication challenges and networking strategies and other aspects within these systems, underlining the role of optimized network solutions in improving system efficiency. Based on these discussions, we elaborate in detail on three integrated components: socialized architecture, socialized training, and socialized inference, analyzing their strengths and weaknesses. Finally, we identify some possible future applications of combining SL and EI, discuss open problems and suggest some future research.

DCMay 3, 2025
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey

Jing Liu, Yao Du, Kun Yang et al.

Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications, integrating cloud resources with edge devices to enable efficient, low-latency processing. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems, yet introduce significant challenges in model deployment and resource management. In this survey, we comprehensive examine the intersection of distributed intelligence and model optimization within edge-cloud environments, providing a structured tutorial on fundamental architectures, enabling technologies, and emerging applications. Additionally, we systematically analyze model optimization approaches, including compression, adaptation, and neural architecture search, alongside AI-driven resource management strategies that balance performance, energy efficiency, and latency requirements. We further explore critical aspects of privacy protection and security enhancement within ECCC systems and examines practical deployments through diverse applications, spanning autonomous driving, healthcare, and industrial automation. Performance analysis and benchmarking techniques are also thoroughly explored to establish evaluation standards for these complex systems. Furthermore, the review identifies critical research directions including LLMs deployment, 6G integration, neuromorphic computing, and quantum computing, offering a roadmap for addressing persistent challenges in heterogeneity management, real-time processing, and scalability. By bridging theoretical advancements and practical deployments, this survey offers researchers and practitioners a holistic perspective on leveraging AI to optimize distributed computing environments, fostering innovation in next-generation intelligent systems.

AIDec 23, 2024
Facial Expression Analysis and Its Potentials in IoT Systems: A Contemporary Survey

Zixuan Shangguan, Yanjie Dong, Song Guo et al.

Facial expressions convey human emotions and can be categorized into macro-expressions (MaEs) and micro-expressions (MiEs) based on duration and intensity. While MaEs are voluntary and easily recognized, MiEs are involuntary, rapid, and can reveal concealed emotions. The integration of facial expression analysis with Internet-of-Thing (IoT) systems has significant potential across diverse scenarios. IoT-enhanced MaE analysis enables real-time monitoring of patient emotions, facilitating improved mental health care in smart healthcare. Similarly, IoT-based MiE detection enhances surveillance accuracy and threat detection in smart security. Our work aims to provide a comprehensive overview of research progress in facial expression analysis and explores its potential integration with IoT systems. We discuss the distinctions between our work and existing surveys, elaborate on advancements in MaE and MiE analysis techniques across various learning paradigms, and examine their potential applications in IoT. We highlight challenges and future directions for the convergence of facial expression-based technologies and IoT systems, aiming to foster innovation in this domain. By presenting recent developments and practical applications, our work offers a systematic understanding of the ways of facial expression analysis to enhance IoT systems in healthcare, security, and beyond.

AIFeb 13, 2025
AoI-Sensitive Data Forwarding with Distributed Beamforming in UAV-Assisted IoT

Zifan Lang, Guixia Liu, Geng Sun et al.

This paper proposes a UAV-assisted forwarding system based on distributed beamforming to enhance age of information (AoI) in Internet of Things (IoT). Specifically, UAVs collect and relay data between sensor nodes (SNs) and the remote base station (BS). However, flight delays increase the AoI and degrade the network performance. To mitigate this, we adopt distributed beamforming to extend the communication range, reduce the flight frequency and ensure the continuous data relay and efficient energy utilization. Then, we formulate an optimization problem to minimize AoI and UAV energy consumption, by jointly optimizing the UAV trajectories and communication schedules. The problem is non-convex and with high dynamic, and thus we propose a deep reinforcement learning (DRL)-based algorithm to solve the problem, thereby enhancing the stability and accelerate convergence speed. Simulation results show that the proposed algorithm effectively addresses the problem and outperforms other benchmark algorithms.

NIDec 13, 2023
On Designing Multi-UAV aided Wireless Powered Dynamic Communication via Hierarchical Deep Reinforcement Learning

Ze Yu Zhao, Yue Ling Che, Sheng Luo et al.

This paper proposes a novel design on the wireless powered communication network (WPCN) in dynamic environments under the assistance of multiple unmanned aerial vehicles (UAVs). Unlike the existing studies, where the low-power wireless nodes (WNs) often conform to the coherent harvest-then-transmit protocol, under our newly proposed double-threshold based WN type updating rule, each WN can dynamically and repeatedly update its WN type as an E-node for non-linear energy harvesting over time slots or an I-node for transmitting data over sub-slots. To maximize the total transmission data size of all the WNs over T slots, each of the UAVs individually determines its trajectory and binary wireless energy transmission (WET) decisions over times slots and its binary wireless data collection (WDC) decisions over sub-slots, under the constraints of each UAV's limited on-board energy and each WN's node type updating rule. However, due to the UAVs' tightly-coupled trajectories with their WET and WDC decisions, as well as each WN's time-varying battery energy, this problem is difficult to solve optimally. We then propose a new multi-agent based hierarchical deep reinforcement learning (MAHDRL) framework with two tiers to solve the problem efficiently, where the soft actor critic (SAC) policy is designed in tier-1 to determine each UAV's continuous trajectory and binary WET decision over time slots, and the deep-Q learning (DQN) policy is designed in tier-2 to determine each UAV's binary WDC decisions over sub-slots under the given UAV trajectory from tier-1. Both of the SAC policy and the DQN policy are executed distributively at each UAV. Finally, extensive simulation results are provided to validate the outweighed performance of the proposed MAHDRL approach over various state-of-the-art benchmarks.

LGMay 29, 2025
Sentinel: Scheduling Live Streams with Proactive Anomaly Detection in Crowdsourced Cloud-Edge Platforms

Yuting Li, Shaoyuan Huang, Tengwen Zhang et al.

With the rapid growth of live streaming services, Crowdsourced Cloud-edge service Platforms (CCPs) are playing an increasingly important role in meeting the increasing demand. Although stream scheduling plays a critical role in optimizing CCPs' revenue, most optimization strategies struggle to achieve practical results due to various anomalies in unstable CCPs. Additionally, the substantial scale of CCPs magnifies the difficulties of anomaly detection in time-sensitive scheduling. To tackle these challenges, this paper proposes Sentinel, a proactive anomaly detection-based scheduling framework. Sentinel models the scheduling process as a two-stage Pre-Post-Scheduling paradigm: in the pre-scheduling stage, Sentinel conducts anomaly detection and constructs a strategy pool; in the post-scheduling stage, upon request arrival, it triggers an appropriate scheduling based on a pre-generated strategy to implement the scheduling process. Extensive experiments on realistic datasets show that Sentinel significantly reduces anomaly frequency by 70%, improves revenue by 74%, and doubles the scheduling speed.

CVFeb 26, 2025
CLIP-Optimized Multimodal Image Enhancement via ISP-CNN Fusion for Coal Mine IoVT under Uneven Illumination

Shuai Wang, Shihao Zhang, Jiaqi Wu et al.

Clear monitoring images are crucial for the safe operation of coal mine Internet of Video Things (IoVT) systems. However, low illumination and uneven brightness in underground environments significantly degrade image quality, posing challenges for enhancement methods that often rely on difficult-to-obtain paired reference images. Additionally, there is a trade-off between enhancement performance and computational efficiency on edge devices within IoVT systems.To address these issues, we propose a multimodal image enhancement method tailored for coal mine IoVT, utilizing an ISP-CNN fusion architecture optimized for uneven illumination. This two-stage strategy combines global enhancement with detail optimization, effectively improving image quality, especially in poorly lit areas. A CLIP-based multimodal iterative optimization allows for unsupervised training of the enhancement algorithm. By integrating traditional image signal processing (ISP) with convolutional neural networks (CNN), our approach reduces computational complexity while maintaining high performance, making it suitable for real-time deployment on edge devices.Experimental results demonstrate that our method effectively mitigates uneven brightness and enhances key image quality metrics, with PSNR improvements of 2.9%-4.9%, SSIM by 4.3%-11.4%, and VIF by 4.9%-17.8% compared to seven state-of-the-art algorithms. Simulated coal mine monitoring scenarios validate our method's ability to balance performance and computational demands, facilitating real-time enhancement and supporting safer mining operations.

LGOct 23, 2025
CO-PFL: Contribution-Oriented Personalized Federated Learning for Heterogeneous Networks

Ke Xing, Yanjie Dong, Xiaoyi Fan et al.

Personalized federated learning (PFL) addresses a critical challenge of collaboratively training customized models for clients with heterogeneous and scarce local data. Conventional federated learning, which relies on a single consensus model, proves inadequate under such data heterogeneity. Its standard aggregation method of weighting client updates heuristically or by data volume, operates under an equal-contribution assumption, failing to account for the actual utility and reliability of each client's update. This often results in suboptimal personalization and aggregation bias. To overcome these limitations, we introduce Contribution-Oriented PFL (CO-PFL), a novel algorithm that dynamically estimates each client's contribution for global aggregation. CO-PFL performs a joint assessment by analyzing both gradient direction discrepancies and prediction deviations, leveraging information from gradient and data subspaces. This dual-subspace analysis provides a principled and discriminative aggregation weight for each client, emphasizing high-quality updates. Furthermore, to bolster personalization adaptability and optimization stability, CO-PFL cohesively integrates a parameter-wise personalization mechanism with mask-aware momentum optimization. Our approach effectively mitigates aggregation bias, strengthens global coordination, and enhances local performance by facilitating the construction of tailored submodels with stable updates. Extensive experiments on four benchmark datasets (CIFAR10, CIFAR10C, CINIC10, and Mini-ImageNet) confirm that CO-PFL consistently surpasses state-of-the-art methods in in personalization accuracy, robustness, scalability and convergence stability.

SPSep 16, 2025
Joint Channel Estimation and Computation Offloading in Fluid Antenna-assisted MEC Networks

Ying Ju, Mingdong Li, Haoyu Wang et al.

With the emergence of fluid antenna (FA) in wireless communications, the capability to dynamically adjust port positions offers substantial benefits in spatial diversity and spectrum efficiency, which are particularly valuable for mobile edge computing (MEC) systems. Therefore, we propose an FA-assisted MEC offloading framework to minimize system delay. This framework faces two severe challenges, which are the complexity of channel estimation due to dynamic port configuration and the inherent non-convexity of the joint optimization problem. Firstly, we propose Information Bottleneck Metric-enhanced Channel Compressed Sensing (IBM-CCS), which advances FA channel estimation by integrating information relevance into the sensing process and capturing key features of FA channels effectively. Secondly, to address the non-convex and high-dimensional optimization problem in FA-assisted MEC systems, which includes FA port selection, beamforming, power control, and resource allocation, we propose a game theory-assisted Hierarchical Twin-Dueling Multi-agent Algorithm (HiTDMA) based offloading scheme, where the hierarchical structure effectively decouples and coordinates the optimization tasks between the user side and the base station side. Crucially, the game theory effectively reduces the dimensionality of power control variables, allowing deep reinforcement learning (DRL) agents to achieve improved optimization efficiency. Numerical results confirm that the proposed scheme significantly reduces system delay and enhances offloading performance, outperforming benchmarks. Additionally, the IBM-CCS channel estimation demonstrates superior accuracy and robustness under varying port densities, contributing to efficient communication under imperfect CSI.

CLAug 31, 2025
Exploring and Mitigating Fawning Hallucinations in Large Language Models

Zixuan Shangguan, Yanjie Dong, Lanjun Wang et al.

Large language models (LLMs) have demonstrated exceptional proficiency in language understanding. However, when LLMs align their outputs with deceptive and/or misleading prompts, the generated responses could deviate from the de facto information. Such observations are known as fawning hallucinations, where the model prioritizes alignment with the input's implied perspective over accuracy and truthfulness. In this work, we analyze fawning hallucinations in various natural language processing tasks and tailor the so-termed contrastive decoding method for fawning-hallucination mitigation. Specifically, we design two paradigms to generate corresponding deceptive and/or misleading inputs for the consistent fawning hallucinations induction. Then, we propose the collaborative contrastive decoding (CCD) to handle the fawning hallucinations across different tasks in LLMs. By contrasting the deviation in output distribution between induced and transformed neutral inputs, the proposed CCD can reduce reliance on deceptive and/or misleading information without requiring additional training. Extensive experiments demonstrate that the proposed CCD can effectively mitigate fawning hallucinations and improve the factuality of the generated responses over various tasks.

NIAug 26, 2025
A Survey on Cloud-Edge-Terminal Collaborative Intelligence in AIoT Networks

Jiaqi Wu, Jing Liu, Yang Liu et al.

The proliferation of Internet of things (IoT) devices in smart cities, transportation, healthcare, and industrial applications, coupled with the explosive growth of AI-driven services, has increased demands for efficient distributed computing architectures and networks, driving cloud-edge-terminal collaborative intelligence (CETCI) as a fundamental paradigm within the artificial intelligence of things (AIoT) community. With advancements in deep learning, large language models (LLMs), and edge computing, CETCI has made significant progress with emerging AIoT applications, moving beyond isolated layer optimization to deployable collaborative intelligence systems for AIoT (CISAIOT), a practical research focus in AI, distributed computing, and communications. This survey describes foundational architectures, enabling technologies, and scenarios of CETCI paradigms, offering a tutorial-style review for CISAIOT beginners. We systematically analyze architectural components spanning cloud, edge, and terminal layers, examining core technologies including network virtualization, container orchestration, and software-defined networking, while presenting categorizations of collaboration paradigms that cover task offloading, resource allocation, and optimization across heterogeneous infrastructures. Furthermore, we explain intelligent collaboration learning frameworks by reviewing advances in federated learning, distributed deep learning, edge-cloud model evolution, and reinforcement learning-based methods. Finally, we discuss challenges (e.g., scalability, heterogeneity, interoperability) and future trends (e.g., 6G+, agents, quantum computing, digital twin), highlighting how integration of distributed computing and communication can address open issues and guide development of robust, efficient, and secure collaborative AIoT systems.

CVMar 24, 2025
CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos

Yang Liu, Hongjin Wang, Zepu Wang et al.

Video Anomaly Detection (VAD) remains a fundamental yet formidable task in the video understanding community, with promising applications in areas such as information forensics and public safety protection. Due to the rarity and diversity of anomalies, existing methods only use easily collected regular events to model the inherent normality of normal spatial-temporal patterns in an unsupervised manner. Previous studies have shown that existing unsupervised VAD models are incapable of label-independent data offsets (e.g., scene changes) in real-world scenarios and may fail to respond to light anomalies due to the overgeneralization of deep neural networks. Inspired by causality learning, we argue that there exist causal factors that can adequately generalize the prototypical patterns of regular events and present significant deviations when anomalous instances occur. In this regard, we propose Causal Representation Consistency Learning (CRCL) to implicitly mine potential scene-robust causal variable in unsupervised video normality learning. Specifically, building on the structural causal models, we propose scene-debiasing learning and causality-inspired normality learning to strip away entangled scene bias in deep representations and learn causal video normality, respectively. Extensive experiments on benchmarks validate the superiority of our method over conventional deep representation learning. Moreover, ablation studies and extension validation show that the CRCL can cope with label-independent biases in multi-scene settings and maintain stable performance with only limited training data available.

ASJan 6, 2025
Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition

Ruoyu Zhao, Xiantao Jiang, F. Richard Yu et al.

Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction. Cross-Linguistic SER (CLSER) has been a challenging research problem due to significant variability in linguistic and acoustic features of different languages. In this study, we propose a novel approach HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics. These features are fused using a cross-attention transformer (CAT) mechanism during feature extraction. Transfer learning is applied to gain from a source emotional speech dataset to the target corpus for emotion recognition. We use IEMOCAP as the source dataset to train the source model and evaluate the proposed method on seven datasets in five languages (e.g., English, German, Spanish, Italian, and Chinese). We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75% across the seven datasets, with notable performance of 88.69% on EMODB (German language) and 79.48% on EMOVO (Italian language). Our extensive evaluation demonstrates that HuMP-CAT outperforms existing methods across multiple target languages.

CVDec 24, 2024
Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment

Jiaqi Wu, Shihao Zhang, Simin Chen et al.

Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios. However, existing edge detection methods face challenges: 1) difficulty balancing detection precision with lightweight models, 2) limited adaptability of generalized deployment designs, and 3) insufficient real-world validation. To address these issues, we propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments. Specifically, we introduce a lightweight Reparameterized Dynamic Convolutional Network (Rep-DConvNet) featuring weighted multi-shape convolutional branches to enhance detection performance. Additionally, we design a Sparse Cross-Attention (SC-A) network with a localized-mapping-assisted self-attention mechanism, enabling a well-crafted joint module for adaptive feature transfer. For real-world applications, we incorporate an Efficient Head into the YOLO framework to accelerate edge model optimization. To demonstrate practical impact, we identify a gap in helmet detection -- overlooking band fastening, a critical safety factor -- and create the Helmet Band Detection Dataset (HBDD). Using ED-TOOLBOX-optimized models, we address this real-world task. Extensive experiments validate the effectiveness of ED-TOOLBOX, with edge detection models outperforming six state-of-the-art methods in visual surveillance simulations, achieving real-time and accurate performance. These results highlight ED-TOOLBOX as a superior solution for edge object detection.

CVOct 29, 2024
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment

Jiaqi Wu, Simin Chen, Zehua Wang et al.

As the volume of image data grows, data-oriented cloud computing in Internet of Video Things (IoVT) systems encounters latency issues. Task-oriented edge computing addresses this by shifting data analysis to the edge. However, limited computational power of edge devices poses challenges for executing visual tasks. Existing methods struggle to balance high model performance with low resource consumption; lightweight neural networks often underperform, while device-specific models designed by Neural Architecture Search (NAS) fail to adapt to heterogeneous devices. For these issues, we propose a novel co-design framework to optimize neural network architecture and deployment strategies during inference for high-throughput. Specifically, it implements a dynamic model structure based on re-parameterization, coupled with a Roofline-based model partitioning strategy to enhance the computational performance of edge devices. We also employ a multi-objective co-optimization approach to balance throughput and accuracy. Additionally, we derive mathematical consistency and convergence of partitioned models. Experimental results demonstrate significant improvements in throughput (12.05\% on MNIST, 18.83\% on ImageNet) and superior classification accuracy compared to baseline algorithms. Our method consistently achieves stable performance across different devices, underscoring its adaptability. Simulated experiments further confirm its efficacy in high-accuracy, real-time detection for small objects in IoVT systems.

DCJun 12, 2024
Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey

Feng Liang, Zhen Zhang, Haifeng Lu et al.

With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have become the key to high-performance deep learning. The large-scale environment with large volumes of datasets, models, and computational and communication resources raises various unique challenges for resource allocation and workload scheduling in distributed deep learning, such as scheduling complexity, resource and workload heterogeneity, and fault tolerance. To uncover these challenges and corresponding solutions, this survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL. We explore these strategies by focusing on various resource types, scheduling granularity levels, and performance goals during distributed training and inference processes. We highlight critical challenges for each topic and discuss key insights of existing technologies. To illustrate practical large-scale resource allocation and workload scheduling in real distributed deep learning scenarios, we use a case study of training large language models. This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances and explore future research directions for efficient framework strategies for large-scale distributed deep learning.

NIJun 17, 2021
Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks

Xiang Tan, Li Zhou, Haijun Wang et al.

With the development of the 5G and Internet of Things, amounts of wireless devices need to share the limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm to remedy the problem of inefficient spectrum utilization brought upon by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed DSA problem for multi-user in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we proposed a centralized off-line training and distributed on-line execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy which maximizes the sum throughput of cognitive radio network in distributed fashion without coordination information exchange between cognitive users. Finally, we validate the proposed algorithm in various settings through extensive experiments. From the simulation results, we can observe that the proposed algorithm can converge fast and achieve almost the optimal performance.

DCJan 17, 2021
Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System

Yiwen Han, Shihao Shen, Xiaofei Wang et al.

Kubernetes (k8s) has the potential to merge the distributed edge and the cloud but lacks a scheduling framework specifically for edge-cloud systems. Besides, the hierarchical distribution of heterogeneous resources and the complex dependencies among requests and resources make the modeling and scheduling of k8s-oriented edge-cloud systems particularly sophisticated. In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems to improve the long-term throughput rate of request processing. First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch and dynamic dispatch spaces within the edge cluster. Second, for diverse system scales and structures, we use graph neural networks to embed system state information, and combine the embedding results with multiple policy networks to reduce the orchestration dimensionality by stepwise scheduling. Finally, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration, and present the implementation design of deploying the above algorithms compatible with native k8s components. Experiments using real workload traces show that KaiS can successfully learn appropriate scheduling policies, irrespective of request arrival patterns and system scales. Moreover, KaiS can enhance the average system throughput rate by 14.3% while reducing scheduling cost by 34.7% compared to baselines.

NIAug 17, 2020
Edge Network-Assisted Real-Time Object Detection Framework for Autonomous Driving

Seung Wook Kim, Keunsoo Ko, Haneul Ko et al.

Autonomous vehicles (AVs) can achieve the desired results within a short duration by offloading tasks even requiring high computational power (e.g., object detection (OD)) to edge clouds. However, although edge clouds are exploited, real-time OD cannot always be guaranteed due to dynamic channel quality. To mitigate this problem, we propose an edge network-assisted real-time OD framework~(EODF). In an EODF, AVs extract the region of interests~(RoIs) of the captured image when the channel quality is not sufficiently good for supporting real-time OD. Then, AVs compress the image data on the basis of the RoIs and transmit the compressed one to the edge cloud. In so doing, real-time OD can be achieved owing to the reduced transmission latency. To verify the feasibility of our framework, we evaluate the probability that the results of OD are not received within the inter-frame duration (i.e., outage probability) and their accuracy. From the evaluation, we demonstrate that the proposed EODF provides the results to AVs in real-time and achieves satisfactory accuracy.

LGJun 17, 2020
Communication-Efficient Robust Federated Learning Over Heterogeneous Datasets

Yanjie Dong, Georgios B. Giannakis, Tianyi Chen et al.

This work investigates fault-resilient federated learning when the data samples are non-uniformly distributed across workers, and the number of faulty workers is unknown to the central server. In the presence of adversarially faulty workers who may strategically corrupt datasets, the local messages exchanged (e.g., local gradients and/or local model parameters) can be unreliable, and thus the vanilla stochastic gradient descent (SGD) algorithm is not guaranteed to converge. Recently developed algorithms improve upon vanilla SGD by providing robustness to faulty workers at the price of slowing down convergence. To remedy this limitation, the present work introduces a fault-resilient proximal gradient (FRPG) algorithm that relies on Nesterov's acceleration technique. To reduce the communication overhead of FRPG, a local (L) FRPG algorithm is also developed to allow for intermittent server-workers parameter exchanges. For strongly convex loss functions, FRPG and LFRPG have provably faster convergence rates than a benchmark robust stochastic aggregation algorithm. Moreover, LFRPG converges faster than FRPG while using the same communication rounds. Numerical tests performed on various real datasets confirm the accelerated convergence of FRPG and LFRPG over the robust stochastic aggregation benchmark and competing alternatives.

HCMar 13, 2020
Emotion Recognition From Gait Analyses: Current Research and Future Directions

Shihao Xu, Jing Fang, Xiping Hu et al.

Human gait refers to a daily motion that represents not only mobility, but it can also be used to identify the walker by either human observers or computers. Recent studies reveal that gait even conveys information about the walker's emotion. Individuals in different emotion states may show different gait patterns. The mapping between various emotions and gait patterns provides a new source for automated emotion recognition. Compared to traditional emotion detection biometrics, such as facial expression, speech and physiological parameters, gait is remotely observable, more difficult to imitate, and requires less cooperation from the subject. These advantages make gait a promising source for emotion detection. This article reviews current research on gait-based emotion detection, particularly on how gait parameters can be affected by different emotion states and how the emotion states can be recognized through distinct gait patterns. We focus on the detailed methods and techniques applied in the whole process of emotion recognition: data collection, preprocessing, and classification. At last, we discuss possible future developments of efficient and effective gait-based emotion recognition using the state of the art techniques on intelligent computation and big data.

NIJul 19, 2019
Convergence of Edge Computing and Deep Learning: A Comprehensive Survey

Xiaofei Wang, Yiwen Han, Victor C. M. Leung et al.

Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people's lives, from face recognition to ambitious smart factories and cities, developments of artificial intelligence (especially deep learning, DL) based applications and services are thriving. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of "providing artificial intelligence for every person and every organization at everywhere". Thus, unleashing DL services using resources at the network edge near the data sources has emerged as a desirable solution. Therefore, edge intelligence, aiming to facilitate the deployment of DL services by edge computing, has received significant attention. In addition, DL, as the representative technique of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually beneficial edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely DL training and inference in the customized edge computing framework; 3) challenges and future trends of more pervasive and fine-grained intelligence. We believe that by consolidating information scattered across the communication, networking, and DL areas, this survey can help readers to understand the connections between enabling technologies while promoting further discussions on the fusion of edge intelligence and intelligent edge, i.e., Edge DL.

LGJun 3, 2019
Secure Distributed On-Device Learning Networks With Byzantine Adversaries

Yanjie Dong, Julian Cheng, Md. Jahangir Hossain et al.

The privacy concern exists when the central server has the copies of datasets. Hence, there is a paradigm shift for the learning networks to change from centralized in-cloud learning to distributed \mbox{on-device} learning. Benefit from the parallel computing, the on-device learning networks have a lower bandwidth requirement than the in-cloud learning networks. Moreover, the on-device learning networks also have several desirable characteristics such as privacy preserving and flexibility. However, the \mbox{on-device} learning networks are vulnerable to the malfunctioning terminals across the networks. The worst-case malfunctioning terminals are the Byzantine adversaries, that can perform arbitrary harmful operations to compromise the learned model based on the full knowledge of the networks. Hence, the design of secure learning algorithms becomes an emerging topic in the on-device learning networks with Byzantine adversaries. In this article, we present a comprehensive overview of the prevalent secure learning algorithms for the two promising on-device learning networks: Federated-Learning networks and decentralized-learning networks. We also review several future research directions in the \mbox{Federated-Learning} and decentralized-learning networks.

DCOct 12, 2018
Decentralized Applications: The Blockchain-Empowered Software System

Wei Cai, Zehua Wang, Jason B. Ernst et al.

Blockchain technology has attracted tremendous attention in both academia and capital market. However, overwhelming speculations on thousands of available cryptocurrencies and numerous initial coin offering (ICO) scams have also brought notorious debates on this emerging technology. This paper traces the development of blockchain systems to reveal the importance of decentralized applications (dApps) and the future value of blockchain. We survey the state-of-the-art dApps and discuss the direction of blockchain development to fulfill the desirable characteristics of dApps. The readers will gain an overview of dApp research and get familiar with recent developments in the blockchain.