AI | Nov 21, 2022
Intelligent Computing: The Latest Advances, Challenges and Future
Shiqiang Zhu, Ting Yu, Tao Xu et al.
Computing is a critical driving force in the development of human civilization. In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications. Intelligent computing has greatly broadened the scope of computing, extending it from traditional computing on data to increasingly diverse computing paradigms such as perceptual intelligence, cognitive intelligence, autonomous intelligence, and human-computer fusion intelligence. Intelligence and computing followed different paths of evolution and development for a long time but have become increasingly intertwined in recent years: intelligent computing is not only intelligence-oriented but also intelligence-driven. Such cross-fertilization has prompted the emergence and rapid advancement of intelligent computing. Intelligent computing is still in its infancy, and an abundance of innovations in its theories, systems, and applications is expected to occur soon. We present the first comprehensive survey of literature on intelligent computing, covering its theoretical fundamentals, the technological fusion of intelligence and computing, important applications, challenges, and future perspectives. We believe that this survey is highly timely and will provide a comprehensive reference and offer valuable insights into intelligent computing for academic and industrial researchers and practitioners.
CV | Jul 1, 2024 | Code
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang, Yijun Liu, Fei Yu et al.
Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in Chinese culture. To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark dataset, where the selection of object categories and images is entirely driven by Chinese native speakers, ensuring that the source images are representative of Chinese culture. The benchmark contains four distinct VL tasks ranging from image-text retrieval to visual question answering, visual grounding and visual dialogue. We present a detailed statistical analysis of CVLUE and provide a baseline performance analysis with several open-source multilingual VLMs on CVLUE and its English counterparts to reveal their performance gap between English and Chinese. Our in-depth category-level analysis reveals a lack of Chinese cultural knowledge in existing VLMs. We also find that fine-tuning on Chinese culture-related VL datasets effectively enhances VLMs' understanding of Chinese culture.
LG | Aug 24, 2023
A Huber Loss Minimization Approach to Byzantine Robust Federated Learning
Puning Zhao, Fei Yu, Zhiguo Wan
Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under the independent and identically distributed (i.i.d.) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $ε$, which stands for the ratio of attacked clients. Secondly, our approach does not need precise knowledge of $ε$. Thirdly, it allows different clients to have unequal data sizes. We then broaden our analysis to include non-i.i.d. data, such that clients have slightly different distributions.
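The abstract does not spell out the aggregator itself, so the following is only a minimal sketch of what aggregation by Huber loss minimization could look like. It assumes that the server seeks the point minimizing a size-weighted sum of Huber losses applied to the Euclidean distance from each client update, and it solves this with iteratively reweighted averaging started from the coordinate-wise median. The function name `huber_aggregate`, the `threshold` parameter, the use of client data sizes as weights, and the toy data are all illustrative assumptions, not the authors' definitions.

```python
import math
from statistics import median


def huber_aggregate(updates, threshold, sizes=None, iters=200, tol=1e-10):
    """Approximately minimize sum_i sizes[i] * huber(||updates[i] - c||) over c.

    Hypothetical sketch: updates is a list of equal-length vectors (lists of
    floats), threshold is the Huber transition point, and sizes are optional
    per-client weights such as local dataset sizes.
    """
    n, dim = len(updates), len(updates[0])
    sizes = [1.0] * n if sizes is None else [float(s) for s in sizes]
    # Robust starting point: coordinate-wise median of the client updates.
    center = [median(u[j] for u in updates) for j in range(dim)]
    for _ in range(iters):
        weights = []
        for u, s in zip(updates, sizes):
            dist = math.dist(u, center)
            # Huber weight: 1 within the threshold, threshold / dist beyond it,
            # so far-away updates have bounded influence on the aggregate.
            weights.append(s * min(1.0, threshold / max(dist, 1e-12)))
        total = sum(weights)
        new_center = [
            sum(w * u[j] for w, u in zip(weights, updates)) / total
            for j in range(dim)
        ]
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:
            break
    return center


# Toy usage: eight honest clients near (1, 1) and two attacked clients.
honest = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0],
          [0.95, 1.05], [1.05, 0.95], [1.0, 0.9], [0.9, 1.1]]
attacked = [[100.0, 100.0], [100.0, 100.0]]
print(huber_aggregate(honest + attacked, threshold=1.0))
```

In this sketch, updates within `threshold` of the current estimate are averaged as usual, while more distant ones are down-weighted in proportion to their distance. The threshold is fixed in advance and does not depend on the attack ratio, which loosely mirrors the abstract's point that precise knowledge of $ε$ is not needed.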
CV | Sep 18, 2021 | Code
Memory Regulation and Alignment toward Generalizer RGB-Infrared Person
Feng Chen, Fei Wu, Qi Wu et al.
Domain shift, which arises from the non-negligible modality gap and the non-overlapping identity classes between training and test sets, is a major issue in RGB-Infrared person re-identification (RGB-IR ReID). A key to tackling this inherent issue is to enforce similar data distributions across the two domains. However, RGB-IR ReID always demands discriminative features, which leads to over-reliance on features that are sensitive to seen classes, e.g., via attention-based feature alignment or metric learning. Therefore, predicting the unseen query category from predefined training classes may not be accurate and leads to a sub-optimal adversarial gradient. In this paper, we uncover this problem in a more explainable way and propose a novel multi-granularity memory regulation and alignment module (MG-MRA) to solve it. By explicitly incorporating a latent variable attribute, from fine-grained to coarse semantic granularity, into intermediate features, our method can alleviate the model's over-confidence in discriminative features of seen classes. Moreover, instead of matching discriminative features by traversing nearest neighbors, sparse attributes, i.e., global structural patterns, are recollected with respect to features and assigned to measure pair-wise image similarity in hashing. Extensive experiments on RegDB and SYSU-MM01 show that the proposed method outperforms existing state-of-the-art methods. Our code is available at https://github.com/Chenfeng1271/MGMRA.
SP | Jan 24, 2025
Scene Understanding Enabled Semantic Communication with Open Channel Coding
Zhe Xiang, Fei Yu, Quan Deng et al.
As communication systems transition from symbol transmission to conveying meaningful information, sixth-generation (6G) networks emphasize semantic communication. This approach prioritizes high-level semantic information, improving robustness and reducing redundancy across modalities like text, speech, and images. However, traditional semantic communication faces limitations, including static coding strategies, poor generalization, and reliance on task-specific knowledge bases that hinder adaptability. To overcome these challenges, we propose a novel system combining scene understanding, Large Language Models (LLMs), and open channel coding, named OpenSC. Traditional systems rely on fixed domain-specific knowledge bases, limiting their ability to generalize. Our open channel coding approach leverages shared, publicly available knowledge, enabling flexible, adaptive encoding. This dynamic system reduces reliance on static task-specific data, enhancing adaptability across diverse tasks and environments. Additionally, we use scene graphs for structured semantic encoding, capturing object relationships and context to improve tasks like Visual Question Answering (VQA). Our approach selectively encodes key semantic elements, minimizing redundancy and improving transmission efficiency. Experimental results show significant improvements in both semantic understanding and efficiency, advancing the potential of adaptive, generalizable semantic communication in 6G networks.
SD | Dec 9, 2024
Pilot-guided Multimodal Semantic Communication for Audio-Visual Event Localization
Fei Yu, Zhe Xiang, Nan Che et al.
Multimodal semantic communication, which integrates various data modalities such as text, images, and audio, significantly enhances communication efficiency and reliability. It has broad application prospects in fields such as artificial intelligence, autonomous driving, and smart homes. However, current research primarily relies on analog channels and assumes constant channel states (perfect CSI), which is inadequate for addressing dynamic physical channels and noise in real-world scenarios. Existing methods often focus on single-modality tasks and fail to handle multimodal stream data, such as video and audio, and their corresponding tasks. Furthermore, current semantic encoding and decoding modules mainly transmit single-modality features, neglecting the need for multimodal semantic enhancement and recognition tasks. To address these challenges, this paper proposes a pilot-guided framework for multimodal semantic communication specifically tailored for audio-visual event localization tasks. This framework utilizes digital pilot codes and channel modules to guide the state of analog channels in real-world scenarios and designs Euler-based multimodal semantic encoding and decoding that consider time-frequency characteristics based on dynamic channel state. This approach effectively handles multimodal stream source data, especially for audio-visual event localization tasks. Extensive numerical experiments demonstrate the robustness of the proposed framework to channel changes and its support for various communication scenarios. The experimental results show that the framework outperforms existing benchmark methods in terms of Signal-to-Noise Ratio (SNR), highlighting its advantage in semantic communication quality.
ST | May 26, 2023
Robust Nonparametric Regression under Poisoning Attack
Puning Zhao, Zhiguo Wan
This paper studies robust nonparametric regression, in which an adversarial attacker can modify the values of up to $q$ samples from a training dataset of size $N$. Our initial solution is an M-estimator based on Huber loss minimization. Compared with simple kernel regression, i.e. the Nadaraya-Watson estimator, this method can significantly weaken the impact of malicious samples on the regression performance. We provide the convergence rate as well as the corresponding minimax lower bound. The result shows that, with proper bandwidth selection, $\ell_\infty$ error is minimax optimal. The $\ell_2$ error is optimal with relatively small $q$, but is suboptimal with larger $q$. The reason is that this estimator is vulnerable if there are many attacked samples concentrating in a small region. To address this issue, we propose a correction method by projecting the initial estimate to the space of Lipschitz functions. The final estimate is nearly minimax optimal for arbitrary $q$, up to a $\ln N$ factor.
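To make the contrast between the two estimators concrete, here is a small sketch under stated assumptions. It compares the Nadaraya-Watson estimate at a query point with a local M-estimate that minimizes a kernel-weighted sum of Huber losses over a scalar, solved by iteratively reweighted averaging. The box kernel, the function names, the `threshold` parameter, and the toy data are illustrative choices rather than the paper's exact construction, and the Lipschitz projection step mentioned in the abstract is not covered.

```python
def box_kernel(u):
    """Uniform kernel on [-1, 1] (an assumed choice; any kernel would do)."""
    return 1.0 if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x, xs, ys, h):
    """Kernel-weighted average of the responses near x."""
    k = [box_kernel((x - xi) / h) for xi in xs]
    total = sum(k)
    if total == 0.0:
        raise ValueError("no training samples within the bandwidth")
    return sum(ki * yi for ki, yi in zip(k, ys)) / total


def huber_kernel_estimate(x, xs, ys, h, threshold, iters=200, tol=1e-10):
    """Approximately minimize sum_i K((x - xs[i]) / h) * huber(ys[i] - s) over s."""
    k = [box_kernel((x - xi) / h) for xi in xs]
    local = sorted(yi for ki, yi in zip(k, ys) if ki > 0.0)
    if not local:
        raise ValueError("no training samples within the bandwidth")
    s = local[len(local) // 2]  # robust start: a median of the local responses
    for _ in range(iters):
        # Huber weight: 1 for small residuals, threshold / |residual| otherwise.
        w = [ki * min(1.0, threshold / max(abs(yi - s), 1e-12))
             for ki, yi in zip(k, ys)]
        new_s = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        done = abs(new_s - s) < tol
        s = new_s
        if done:
            break
    return s


# Toy usage: f(x) = 2x on a grid, with q = 2 poisoned responses near x = 0.5.
xs = [i / 20 for i in range(21)]
ys = [2.0 * xi for xi in xs]
ys[10] = 50.0
ys[11] = 50.0
print(nadaraya_watson(0.5, xs, ys, h=0.22))
print(huber_kernel_estimate(0.5, xs, ys, h=0.22, threshold=1.0))
```

Because each poisoned response can pull the Huber estimate by at most `threshold` (relative to a clean sample's unit weight), its influence is bounded, whereas it enters the Nadaraya-Watson average at full magnitude. The same bound also hints at the weakness the abstract notes: many attacked samples concentrated in one small region can still dominate the local fit.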
CR | Sep 29, 2021
When Blockchain Meets Smart Grids: A Comprehensive Survey
Yihao Guo, Zhiguo Wan, Xiuzhen Cheng
Recent years have witnessed an increasing interest in the blockchain technology, and many blockchain-based applications have been developed to take advantage of its decentralization, transparency, fault tolerance, and strong security. In the field of smart grids, a plethora of proposals have emerged to utilize blockchain for augmenting intelligent energy management, energy trading, security and privacy protection, microgrid management, and energy vehicles. Compared with traditional centralized approaches, blockchain-based solutions are able to exploit the advantages of blockchain to realize better functionality in smart grids. However, the blockchain technology itself has its disadvantages in low processing throughput and weak privacy protection. Therefore, it is of paramount importance to study how to integrate blockchain with smart grids in a more effective way so that the advantages of blockchain can be maximized and its disadvantages can be avoided. This article surveys the state-of-the-art solutions aiming to integrate the emergent blockchain technology with smart grids. The goal of this survey is to discuss the necessity of applying blockchain in different components of smart grids, identify the challenges encountered by current solutions, and highlight the frameworks and techniques used to integrate blockchain with smart grids. We also present thorough comparison studies among blockchain-based solutions for smart grids from different perspectives, with the aim to provide insights on integrating blockchain with smart grids for different smart grid management tasks. Finally, we list the current projects and initiatives demonstrating the current effort from the practice side. Additionally, we draw attention to open problems that have not yet been tackled by existing solutions, and point out possible future research directions.
CR | Apr 27, 2021
Secure and Efficient Federated Learning Through Layering and Sharding Blockchain
Shuo Yuan, Bin Cao, Yao Sun et al.
Introducing blockchain into Federated Learning (FL) to build a trusted edge computing environment for transmission and learning has attracted widespread attention as a new decentralized learning pattern. However, traditional consensus mechanisms and architectures of blockchain systems face significant challenges in handling large-scale FL tasks, especially on Internet of Things (IoT) devices, due to their substantial resource consumption, limited transaction throughput, and complex communication requirements. To address these challenges, this paper proposes ChainFL, a novel two-layer blockchain-driven FL system. It splits the IoT network into multiple shards within the subchain layer, effectively reducing the scale of information exchange, and employs a Directed Acyclic Graph (DAG)-based mainchain as the mainchain layer, enabling parallel and asynchronous cross-shard validation. Furthermore, the FL procedure is customized to integrate deeply with blockchain technology, and a modified DAG consensus mechanism is designed to mitigate distortion caused by abnormal models. To provide a proof-of-concept implementation and evaluation, multiple subchains based on Hyperledger Fabric and a self-developed DAG-based mainchain are deployed. Extensive experiments demonstrate that ChainFL significantly surpasses conventional FL systems, showing up to a 14% improvement in training efficiency and a threefold increase in robustness.
CR | Jul 8, 2020
Open-Pub: A Transparent yet Privacy-Preserving Academic Publication System based on Blockchain
Yan Zhou, Zhiguo Wan, Zhangshuang Guan
Academic publication of the latest research results is crucial to advancing all disciplines. However, current academic publication systems have several severe disadvantages. The first is misconduct during the publication process due to the opaque paper review process. An anonymous reviewer may give biased comments on a paper without being noticed because the comments are seldom published for evaluation. Second, access to research papers is restricted to subscribers, and even the authors cannot access their own papers. To address the above problems, we propose Open-Pub, a decentralized, transparent yet privacy-preserving academic publication scheme using blockchain technology. In Open-Pub, we first design a threshold identity-based group signature (TIBGS) that protects identities of signers using verifiable secret sharing. Then we develop a strong double-blind mechanism to protect the identities of authors and reviewers. With this strong double-blind mechanism, authors can choose to submit papers anonymously, and validators distribute papers anonymously to reviewers on the blockchain according to their research interests. This process is publicly recorded and traceable on the blockchain so as to realize transparent peer review. To evaluate its efficiency, we implement Open-Pub based on Ethereum and conduct comprehensive experiments to evaluate its performance, including computation cost and processing delay. The experimental results show that Open-Pub is highly efficient in computation and in processing anonymous transactions.
CR | Jun 12, 2012
AnonyControl: Control Cloud Data Anonymously with Multi-Authority Attribute-Based Encryption
Taeho Jung, Xiang-Yang Li, Zhiguo Wan et al.
Cloud computing is a revolutionary computing paradigm which enables flexible, on-demand and low-cost usage of computing resources. However, those advantages, ironically, are the causes of security and privacy problems, which emerge because the data owned by different users are stored in some cloud servers instead of under their own control. To deal with security problems, various schemes based on Attribute-Based Encryption (ABE) have been proposed recently. However, the privacy problem of cloud computing is yet to be solved. This paper presents an anonymous privilege control scheme, AnonyControl, to address the user and data privacy problem in a cloud. By using multiple authorities in the cloud computing system, our proposed scheme achieves anonymous cloud data access, fine-grained privilege control, and more importantly, tolerance of the compromise of up to (N-2) authorities. Our security and performance analyses show that AnonyControl is both secure and efficient for the cloud computing environment.