AI · Aug 17, 2023
Artificial Intelligence for Web 3.0: A Comprehensive Survey
Meng Shen, Zhehui Tan, Dusit Niyato et al.
Web 3.0 is the new generation of the Internet, reconstructed with distributed technology and focused on data ownership and value expression. It operates under the principle that data and digital assets should be owned and controlled by users rather than by large corporations. In this survey, we explore the current state of Web 3.0 development and the application of AI technology in Web 3.0. By investigating existing applications and components of Web 3.0, we propose an architectural framework for Web 3.0 from the perspective of ecological application scenarios. We divide the Web 3.0 ecosystem into four layers, whose main functions are data management, value circulation, ecological governance, and application scenarios, respectively. Our investigation examines the major challenges and issues present in each of these layers. In this context, AI has shown strong potential to solve existing problems of Web 3.0, and we illustrate the crucial role of AI in the foundation and growth of Web 3.0. We begin with an overview of AI, including machine learning algorithms and deep learning techniques. Then, we thoroughly analyze the current state of AI applications in the four layers of Web 3.0 and offer insights into potential future development directions.

MM · Jun 14, 2023
Towards Balanced Active Learning for Multimodal Classification
Meng Shen, Yizheng Huang, Jianxiong Yin et al.
Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that could contribute to improving model performance. However, current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data, they often result in biased sample selection from the dominant modality. This unfairness hinders balanced multimodal learning, which is crucial for achieving optimal performance. To address this issue, we propose three guidelines for designing a more balanced multimodal active learning strategy. Following these guidelines, we propose a novel approach that achieves fairer data selection by modulating the gradient embedding with the dominance degree among modalities. Our studies demonstrate that the proposed method achieves more balanced multimodal learning by avoiding greedy sample selection from the dominant modality, and that it outperforms existing active learning strategies on a variety of multimodal classification tasks. Overall, our work highlights the importance of balancing sample selection in multimodal active learning and provides a practical solution for achieving more balanced active learning for multimodal classification.

CV · Dec 13, 2023
MLNet: Mutual Learning Network with Neighborhood Invariance for Universal Domain Adaptation
Yanzuo Lu, Meng Shen, Andy J Ma et al.
Universal domain adaptation (UniDA) is a practical but challenging problem in which no information about the relation between the source and target domains is given for knowledge transfer. Existing UniDA methods may overlook intra-domain variations in the target domain and have difficulty separating similar known and unknown classes. To address these issues, we propose a novel Mutual Learning Network (MLNet) with neighborhood invariance for UniDA. In our method, confidence-guided invariant feature learning with self-adaptive neighbor selection is designed to reduce intra-domain variations and yield more generalizable feature representations. Using a cross-domain mixup scheme for better unknown-class identification, the proposed method compensates for misidentified known-class errors through mutual learning between the closed-set and open-set classifiers. Extensive experiments on three publicly available benchmarks demonstrate that our method achieves the best results compared to state-of-the-art methods in most cases and significantly outperforms the baseline across all four settings in UniDA. Code is available at https://github.com/YanzuoLu/MLNet.

CV · May 20
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
Meng Shen, Minghao Wu, Deepu Rajan
Object hallucination is a significant challenge that hinders the practical application of large vision-language models (LVLMs). We hypothesize that one possible origin of hallucination is the model's tendency to prioritize text generation over meaningful interaction with images. To explore this, we examine the generation process and categorize text tokens into three groups, image-positive, image-invariant, and image-negative, based on their visual dependence on the input image tokens. Our analysis reveals that most generated tokens are minimally influenced by image information. This suggests that, during training, more emphasis is placed on learning to follow textual instructions than on extracting information from images. Based on this finding, we propose adjusting the training weights of different tokens according to their visual dependence in order to control hallucination. Additionally, as a data filtering strategy, we remove a portion of the training data that potentially contains more hallucinations. Both methods reduce hallucination without compromising response length or introducing additional computational cost during inference. We validate our methods across three LVLM variants, demonstrating their effectiveness and general applicability.
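The token-level reweighting idea can be sketched under stated assumptions: visual dependence is taken here as the difference in a token's log-probability with and without the image, tokens are bucketed with a hypothetical threshold `tau`, and per-group loss weights are supplied by the caller. The abstract does not specify the paper's dependence measure, thresholds, or weight values, so `visual_dependence`, `categorize`, and `weighted_nll` are illustrative names only.

```python
from typing import Dict, List, Sequence

def visual_dependence(logp_with_image: Sequence[float],
                      logp_without_image: Sequence[float]) -> List[float]:
    """Per-token change in log-probability when the image is provided (assumed measure)."""
    return [a - b for a, b in zip(logp_with_image, logp_without_image)]

def categorize(deps: Sequence[float], tau: float = 0.1) -> List[str]:
    """Bucket tokens as image-positive/invariant/negative; tau is a hypothetical threshold."""
    labels = []
    for d in deps:
        if d > tau:
            labels.append("positive")   # image makes the token more likely
        elif d < -tau:
            labels.append("negative")   # image makes the token less likely
        else:
            labels.append("invariant")  # image barely matters
    return labels

def weighted_nll(logp_with_image: Sequence[float], labels: Sequence[str],
                 weights: Dict[str, float]) -> float:
    """Weighted mean negative log-likelihood over the response tokens."""
    total = sum(weights[lab] * -lp for lp, lab in zip(logp_with_image, labels))
    return total / sum(weights[lab] for lab in labels)

with_img = [-0.5, -1.0, -2.0]      # log-probs of three response tokens given the image
without_img = [-1.0, -1.02, -1.0]  # the same tokens with the image removed
labels = categorize(visual_dependence(with_img, without_img))
print(labels)  # ['positive', 'invariant', 'negative']
weights = {"positive": 1.0, "invariant": 1.0, "negative": 2.0}  # hypothetical weights
loss = weighted_nll(with_img, labels, weights)
```

Keeping the weights as an argument makes the direction and strength of the emphasis an explicit, tunable choice rather than a hard-coded claim about the method.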

LG · Nov 12, 2025
Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling
Aihua Zhu, Rui Su, Qinglin Zhao et al.
Diffusion probabilistic models have set a new standard for generative fidelity but are hindered by a slow iterative sampling process. A powerful training-free strategy to accelerate this process is schedule optimization, which aims to find an optimal distribution of timesteps for a fixed, small number of function evaluations (NFE) to maximize sample quality. To this end, a successful schedule optimization method must adhere to four core principles: effectiveness, adaptivity, practical robustness, and computational efficiency. However, existing paradigms struggle to satisfy these principles simultaneously, motivating the need for a more advanced solution. To overcome these limitations, we propose the Hierarchical-Schedule-Optimizer (HSO), a novel and efficient bi-level optimization framework. HSO reframes the search for a globally optimal schedule as a more tractable problem by iteratively alternating between two synergistic levels: an upper-level global search for an optimal initialization strategy and a lower-level local optimization for schedule refinement. This process is guided by two key innovations: the Midpoint Error Proxy (MEP), a solver-agnostic and numerically stable objective for effective local optimization, and the Spacing-Penalized Fitness (SPF) function, which ensures practical robustness by penalizing pathologically close timesteps. Extensive experiments show that HSO sets a new state of the art for training-free sampling in the extremely low-NFE regime. For instance, with an NFE of just 5, HSO achieves an FID of 11.94 on LAION-Aesthetics with Stable Diffusion v2.1. Crucially, this level of performance is attained not through costly retraining but with a one-time optimization cost of less than 8 seconds, presenting a highly practical and efficient paradigm for diffusion model acceleration.
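The sketch below illustrates only the spacing-penalty idea behind SPF: a schedule's fitness is an error proxy plus a penalty that grows when adjacent timesteps are closer than a minimum gap. The quadratic penalty form, `min_gap`, `lam`, and the caller-supplied `error_proxy` are assumptions. The abstract does not give SPF's actual formula or how MEP is computed, so MEP is not modeled here.

```python
from typing import Callable, Sequence

def spacing_penalty(schedule: Sequence[float], min_gap: float) -> float:
    """Sum of squared shortfalls for adjacent timesteps closer than min_gap (assumed form)."""
    ts = sorted(schedule, reverse=True)  # sampling runs from high to low timesteps
    penalty = 0.0
    for hi, lo in zip(ts, ts[1:]):
        shortfall = min_gap - (hi - lo)
        if shortfall > 0:
            penalty += shortfall ** 2
    return penalty

def spacing_penalized_fitness(schedule: Sequence[float],
                              error_proxy: Callable[[Sequence[float]], float],
                              min_gap: float = 20.0, lam: float = 1.0) -> float:
    """Lower is better: a caller-supplied error proxy plus a weighted spacing penalty."""
    return error_proxy(schedule) + lam * spacing_penalty(schedule, min_gap)

print(spacing_penalty([999, 990, 500, 0], min_gap=20.0))  # 121.0 (the gap of 9 falls 11 short)
```

A smooth penalty like this lets a local optimizer push crowded timesteps apart gradually instead of rejecting such schedules outright.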

MM · Dec 12, 2024
Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning
Meng Shen, Yake Wei, Jianxiong Yin et al.
Training multimodal models requires a large amount of labeled data. Active learning (AL) aims to reduce labeling costs. Most AL methods employ warm-start approaches, which rely on sufficient labeled data to train a well-calibrated model that can assess the uncertainty and diversity of unlabeled data. However, when assembling a dataset, labeled data are often scarce initially, leading to a cold-start problem. Additionally, most AL methods seldom address multimodal data, leaving a research gap in this field. Our research addresses these issues by developing a two-stage method for Multi-Modal Cold-Start Active Learning (MMCSAL). First, we observe a modality gap, a significant distance between the centroids of representations from different modalities, when only cross-modal pairing information is used as the self-supervision signal. This modality gap affects the data selection process, since we calculate both uni-modal and cross-modal distances. To address this, we introduce uni-modal prototypes to bridge the modality gap. Second, conventional AL methods often falter in multimodal scenarios where alignment between modalities is overlooked. We therefore propose enhancing cross-modal alignment through regularization, thereby improving the quality of the multimodal data pairs selected in AL. Finally, our experiments on three multimodal datasets demonstrate MMCSAL's efficacy in selecting multimodal data pairs.
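The modality gap described above, the distance between the centroids of the two modalities' representations, is straightforward to measure. A minimal sketch, assuming embeddings are plain lists of floats that are L2-normalized before averaging (common for contrastively trained encoders, but an assumption here); it does not implement MMCSAL's uni-modal prototypes or alignment regularizer.

```python
import math
from typing import List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def modality_gap(image_emb: Sequence[Sequence[float]],
                 text_emb: Sequence[Sequence[float]]) -> float:
    """Euclidean distance between the centroids of normalized image and text embeddings."""
    c_img = centroid([l2_normalize(v) for v in image_emb])
    c_txt = centroid([l2_normalize(v) for v in text_emb])
    return math.dist(c_img, c_txt)

# Toy embeddings: images point along x, texts along y, so the centroids are far apart.
print(round(modality_gap([[1.0, 0.0], [2.0, 0.0]], [[0.0, 1.0], [0.0, 3.0]]), 4))  # 1.4142
```

When this gap is large, a cross-modal distance largely reflects the constant offset between modalities, which is one plausible reason it can skew selection that mixes uni-modal and cross-modal distances.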

NI · Feb 4, 2024
Empowering Computing and Networks Convergence System with Distributed Cooperative Routing
Yujiao Hu, Qingmin Jia, Meng Shen et al.
The emergence of intelligent applications and recent advances in computing and networking are driving the development of computing and networks convergence (CNC) systems. However, existing research has failed to achieve comprehensive scheduling optimization of computing and network resources. As a result, some requirements of computing requests cannot be guaranteed in an end-to-end service pattern, which hinders the development of CNC systems. In this article, we propose a distributed cooperative routing framework for the CNC system that ensures the deadline requirements of requests and minimizes their computation cost. The framework includes a trading plane, a management plane, a control plane, and a forwarding plane. The cross-plane cooperative end-to-end routing schemes consider both the computation efficiency of heterogeneous servers and the degree of network congestion when making routing plans, thereby determining where to execute requests and the corresponding routing paths. Simulation results substantiate the performance of our routing schemes in scheduling computing requests in the CNC system.
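To make the routing objective concrete, here is a toy, centralized sketch that is not the paper's distributed cooperative scheme: each candidate server has a path latency (standing in for network congestion) and a per-unit processing time (standing in for heterogeneous computation efficiency), and the request goes to the cheapest server whose end-to-end latency meets its deadline. The `Server` fields, cost model, and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Server:
    name: str
    path_latency_ms: float      # network delay on the route to this server (reflects congestion)
    compute_ms_per_unit: float  # processing time per unit of work (servers are heterogeneous)
    cost_per_unit: float        # computation cost charged per unit of work

def select_server(servers: Sequence[Server], work_units: float,
                  deadline_ms: float) -> Optional[Server]:
    """Cheapest server whose end-to-end latency meets the deadline, or None if none does."""
    feasible = [
        s for s in servers
        if s.path_latency_ms + s.compute_ms_per_unit * work_units <= deadline_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s.cost_per_unit * work_units)

servers = [
    Server("edge", path_latency_ms=5, compute_ms_per_unit=2.0, cost_per_unit=3.0),
    Server("cloud", path_latency_ms=40, compute_ms_per_unit=0.5, cost_per_unit=1.0),
    Server("remote", path_latency_ms=80, compute_ms_per_unit=0.2, cost_per_unit=0.5),
]
print(select_server(servers, work_units=10, deadline_ms=50).name)  # cloud
print(select_server(servers, work_units=10, deadline_ms=30).name)  # edge
```

Tightening the deadline pushes the request to the closer but pricier edge server, which illustrates the deadline-versus-cost tension the framework is designed to manage.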

CV · May 16, 2023
UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning
Heqing Zou, Meng Shen, Chen Chen et al.
Multimodal learning aims to imitate human beings in acquiring complementary information from multiple modalities for various downstream tasks. However, traditional aggregation-based multimodal fusion methods ignore inter-modality relationships, treat each modality equally, and suffer from sensor noise, which reduces multimodal learning performance. In this work, we propose a novel multimodal contrastive method to explore more reliable multimodal representations under the weak supervision of unimodal prediction. Specifically, we first capture task-related unimodal representations and unimodal predictions from an introduced unimodal prediction task. Then, under the supervision of the unimodal predictions, the designed multimodal contrastive method aligns the unimodal representations with the more effective one. Experimental results with fused features on two image-text classification benchmarks, UPMC-Food-101 and N24News, show that our proposed Unimodality-Supervised MultiModal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal methods. A detailed ablation study and analysis further demonstrate the advantages of the proposed method.

CR · Jun 28, 2021
Realtime Robust Malicious Traffic Detection via Frequency Domain Analysis
Chuanpu Fu, Qi Li, Meng Shen et al.
Machine learning (ML) based malicious traffic detection is an emerging security paradigm, particularly for zero-day attack detection, and is complementary to existing rule-based detection. However, existing ML-based detection suffers from low detection accuracy and low throughput caused by inefficient traffic feature extraction, so it cannot detect attacks in real time, especially in high-throughput networks. Moreover, like existing rule-based detection, these systems can be easily evaded by sophisticated attacks. To this end, we propose Whisper, a real-time ML-based malicious traffic detection system that achieves both high accuracy and high throughput by utilizing frequency domain features. Whisper represents sequential features as frequency domain features to achieve bounded information loss, which ensures high detection accuracy, while constraining the scale of the features to achieve high detection throughput. In particular, attackers cannot easily interfere with the frequency domain features, so Whisper is robust against various evasion attacks. Our experiments with 42 types of attacks demonstrate that, compared with state-of-the-art systems, Whisper can accurately detect various sophisticated and stealthy attacks, achieving up to 18.36% improvement while achieving two orders of magnitude higher throughput. Even under various evasion attacks, Whisper is still able to maintain around 90% detection accuracy.
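The general idea of frequency-domain traffic features can be sketched as follows: encode a flow's per-packet attributes as a numeric sequence, split it into fixed-length frames, apply a discrete Fourier transform, and keep the log magnitudes of the lowest few frequency bins, so each frame contributes a bounded number of features regardless of flow length. The encoding (packet length only), frame length, and number of kept bins are assumptions; Whisper's actual encoding and downstream detection steps are not described in the abstract.

```python
import cmath
import math
from typing import List, Sequence

def dft_magnitudes(frame: Sequence[float], keep: int) -> List[float]:
    """Magnitudes of the first `keep` DFT bins of one frame (naive DFT, fine for short frames)."""
    n = len(frame)
    mags = []
    for k in range(keep):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def frequency_features(packet_lengths: Sequence[float], frame_len: int = 8,
                       keep: int = 4) -> List[List[float]]:
    """Split the per-packet sequence into frames and keep log(1 + |DFT|) of the lowest bins.

    Each frame yields exactly `keep` numbers, so the feature size per frame is bounded;
    trailing packets that do not fill a whole frame are dropped in this sketch.
    """
    if not 0 < keep <= frame_len:
        raise ValueError("need 0 < keep <= frame_len")
    features = []
    for start in range(0, len(packet_lengths) - frame_len + 1, frame_len):
        frame = packet_lengths[start:start + frame_len]
        features.append([math.log1p(m) for m in dft_magnitudes(frame, keep)])
    return features

# Usage: a flow of 16 equal-sized packets gives 2 frames; all energy lands in bin 0.
feats = frequency_features([100.0] * 16)
print(len(feats))  # 2
```

A production system would use an FFT library rather than this quadratic loop; the naive version is kept to stay dependency-free.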

DC · Feb 5, 2021
A Serverless Cloud-Fog Platform for DNN-Based Video Analytics with Incremental Learning
Huaizheng Zhang, Meng Shen, Yizheng Huang et al.
DNN-based video analytics have empowered many new applications (e.g., automated retail). Meanwhile, the proliferation of fog devices provides developers with more design options to improve performance and save cost. To the best of our knowledge, this paper presents the first serverless system that takes full advantage of client-fog-cloud synergy to better serve DNN-based video analytics. Specifically, the system aims to achieve two goals: 1) provide the optimal analytics results under the constraints of lower bandwidth usage and shorter round-trip time (RTT) by judiciously managing the computational and bandwidth resources deployed in the client, fog, and cloud environments; and 2) free developers from tedious administration and operation tasks, including DNN deployment and cloud and fog resource management. To this end, we implement a holistic cloud-fog system referred to as VPaaS (Video-Platform-as-a-Service). VPaaS adopts serverless computing to enable developers to build a video analytics pipeline by simply programming a set of functions (e.g., model inference), which are then orchestrated to process videos through carefully designed modules. To save bandwidth and reduce RTT, VPaaS provides a new video streaming protocol that sends only low-quality video to the cloud. The state-of-the-art (SOTA) DNNs deployed in the cloud can identify regions of video frames that need further processing at the fog ends, where misidentified labels in these regions can be corrected using a lightweight DNN model. To address data drift, we incorporate limited human feedback into the system to verify the results and adopt incremental learning to improve the system continuously. The evaluation demonstrates that VPaaS is superior to several SOTA systems: it maintains high accuracy while reducing bandwidth usage by up to 21%, RTT by up to 62.5%, and cloud monetary cost by up to 50%.

CV · Nov 27, 2020
Robust Attacks on Deep Learning Face Recognition in the Physical World
Meng Shen, Hao Yu, Liehuang Zhu et al.
Deep neural networks (DNNs) have been increasingly used in face recognition (FR) systems. Recent studies, however, show that DNNs are vulnerable to adversarial examples, which can potentially mislead FR systems using DNNs in the physical world. Existing attacks on these systems either generate perturbations that work only in the digital world, or rely on customized equipment to generate perturbations and are not robust in varying physical environments. In this paper, we propose FaceAdv, a physical-world attack that crafts adversarial stickers to deceive FR systems. It mainly consists of a sticker generator and a transformer: the former crafts several stickers with different shapes, and the latter digitally attaches the stickers to human faces and provides feedback to the generator to improve the stickers' effectiveness. We conduct extensive experiments to evaluate the effectiveness of FaceAdv in attacking three typical FR systems (i.e., ArcFace, CosFace, and FaceNet). The results show that, compared with a state-of-the-art attack, FaceAdv can significantly improve the success rate of both dodging and impersonation attacks. We also conduct comprehensive evaluations to demonstrate the robustness of FaceAdv.

CR · Sep 21, 2020
Privacy-Preserving Machine Learning Training in Aggregation Scenarios
Liehuang Zhu, Xiangyun Tang, Meng Shen et al.
In smart city development, the growing popularity of machine learning (ML), which benefits from high-quality training datasets generated by diverse IoT devices, raises natural questions about the privacy guarantees that can be provided in such settings. Privacy-preserving ML training in an aggregation scenario enables a model demander to securely train ML models with sensitive IoT data gathered from personal IoT devices. Existing solutions are generally server-aided, cannot deal with collusion between servers or between servers and data owners, and are ill-suited to the delicate environments of IoT. We propose a privacy-preserving ML training framework named Heda, which consists of a library of building blocks based on partially homomorphic encryption (PHE) that enables the construction of multiple privacy-preserving ML training protocols for the aggregation scenario without the assistance of untrusted servers, while remaining secure under collusion. Rigorous security analysis demonstrates that the proposed protocols protect the privacy of each participant in the honest-but-curious model and remain secure under most collusion situations. Extensive experiments validate the efficiency of Heda, which achieves privacy-preserving ML training without losing model accuracy.
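Partially homomorphic encryption is what allows aggregation without decrypting individual contributions. As a toy illustration, not Heda's actual building blocks, the Paillier cryptosystem is additively homomorphic: multiplying ciphertexts yields an encryption of the sum of the plaintexts. The hard-coded primes are tiny demo values with no security, and letting the key holder (for example, the model demander) decrypt only the aggregate is an assumed deployment, not a detail from the abstract.

```python
import math
import secrets
from typing import Tuple

def keygen(p: int = 1009, q: int = 1013) -> Tuple[Tuple[int], Tuple[int, int, int]]:
    """Toy Paillier keys with generator g = n + 1. The default primes are insecure demo values."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n,), (n, lam, mu)

def encrypt(pub: Tuple[int], m: int) -> int:
    """E(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    (n,) = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(priv: Tuple[int, int, int], c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add_encrypted(pub: Tuple[int], c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts decrypts to m1 + m2 (mod n)."""
    (n,) = pub
    return c1 * c2 % (n * n)

# Usage: an aggregator sums three devices' readings without decrypting any single one.
pub, priv = keygen()
readings = [12, 30, 7]
aggregate = encrypt(pub, readings[0])
for value in readings[1:]:
    aggregate = add_encrypted(pub, aggregate, encrypt(pub, value))
print(decrypt(priv, aggregate))  # 49
```

Sums are computed modulo n, so real deployments choose n far larger than any aggregate they need to recover.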

CR · Aug 28, 2020
Bluetooth-based COVID-19 Proximity Tracing Proposals: An Overview
Meng Shen, Yaqian Wei, Tong Li
Large-scale COVID-19 infections have occurred worldwide, with a tremendous impact on the economy and people's lives. Traditional methods for tracing a contagious virus, for example, determining the infection chain from the memory of infected people, have many drawbacks. With the continuous spread of the pandemic, many countries and organizations have started to study how to use mobile devices to trace COVID-19, aiming to automatically record information about contacts with infected people, reduce the manpower required to determine the infection chain, and alert people at risk of infection. This article gives an overview of various Bluetooth-based COVID-19 proximity tracing proposals, including centralized and decentralized ones. We discuss their basic workflow and the differences between them before surveying five typical proposals and explaining their design features and benefits. Then, we summarize eight security and privacy design goals for Bluetooth-based COVID-19 proximity tracing proposals and apply them to analyze the five proposals. Finally, open problems and future directions are discussed.
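In decentralized proposals, each phone typically derives short-lived broadcast identifiers from a secret key that rotates daily, so observers cannot link broadcasts, while a diagnosed user can later publish day keys for other phones to match locally. The sketch below is a simplification loosely modeled on DP-3T's low-cost design (the day key is chained by hashing); it uses HMAC-SHA256 truncated to 16 bytes in place of DP-3T's PRF/PRG construction, and the epoch count and label string are assumptions.

```python
import hashlib
import hmac
from typing import List, Set

EPOCHS_PER_DAY = 96  # assumption: one identifier per 15-minute epoch

def next_day_key(sk: bytes) -> bytes:
    """Rotate the secret day key by hashing: SK_t = SHA-256(SK_{t-1})."""
    return hashlib.sha256(sk).digest()

def ephemeral_ids(sk: bytes, count: int = EPOCHS_PER_DAY) -> List[bytes]:
    """Derive `count` 16-byte broadcast identifiers from one day key (simplified PRF)."""
    return [
        hmac.new(sk, b"broadcast key" + i.to_bytes(2, "big"), hashlib.sha256).digest()[:16]
        for i in range(count)
    ]

def exposure_check(published_day_key: bytes, observed_ids: Set[bytes]) -> bool:
    """Recompute identifiers from a published key and match them against local observations."""
    return any(eid in observed_ids for eid in ephemeral_ids(published_day_key))

# Usage: Alice's phone rotates keys daily; Bob's phone recorded one of her broadcasts.
alice_day0 = bytes(32)  # demo key only; a real device would use a random secret
alice_day1 = next_day_key(alice_day0)
bob_observed = {ephemeral_ids(alice_day1)[10]}
# After Alice is diagnosed and publishes alice_day1, Bob checks on his own phone:
print(exposure_check(alice_day1, bob_observed))  # True
```

Because matching happens on Bob's device, no server learns whom he met; a centralized design would instead upload observations for server-side matching.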

CR · Dec 6, 2018
When Homomorphic Cryptosystem Meets Differential Privacy: Training Machine Learning Classifier with Privacy Protection
Xiangyun Tang, Liehuang Zhu, Meng Shen et al.
Machine learning (ML) classifiers are invaluable building blocks that have been used in many fields. A high-quality training dataset collected from multiple data providers is essential for training accurate classifiers. However, this raises concerns about data privacy due to the potential leakage of sensitive information in the training dataset. Existing studies have proposed many solutions for privacy-preserving training of ML classifiers, but striking a balance among accuracy, computational efficiency, and security remains challenging. In this paper, we propose Heda, an efficient privacy-preserving scheme for training ML classifiers. By combining a homomorphic cryptosystem (HC) with differential privacy (DP), Heda achieves tradeoffs between efficiency and accuracy and enables flexible switching among different tradeoffs through parameter tuning. To make such a combination efficient and feasible, we present novel designs based on both HC and DP: a library of building blocks based on partially homomorphic cryptosystems is proposed to construct complex training algorithms without introducing a trusted third party or computational relaxation, and a set of theoretical methods is proposed to determine an appropriate privacy budget and to reduce sensitivity. Security analysis demonstrates that our solution can construct complex ML training algorithms securely. Extensive experimental results show the effectiveness and efficiency of the proposed scheme.
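On the DP side, the interplay between privacy budget and sensitivity can be illustrated with the standard Laplace mechanism: adding Laplace noise with scale Δf/ε to a query of sensitivity Δf gives ε-differential privacy. This is the textbook mechanism, shown for illustration only; the abstract does not state which DP mechanism or sensitivity-reduction method Heda uses.

```python
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value + Laplace(0, sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise

rng = random.Random(0)
noisy_count = laplace_mechanism(true_value=42.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Halving epsilon doubles the noise scale, while reducing sensitivity, one of the goals named above, shrinks it proportionally; this is the accuracy lever that parameter tuning trades against privacy.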

CR · Dec 5, 2018
Research on the Security of Blockchain Data: A Survey
Liehuang Zhu, Baokun Zheng, Meng Shen et al.
As blockchain is applied ever more widely, blockchain security has attracted broad public attention and been studied in depth by scholars. Moreover, the security of blockchain data directly affects the security of the various applications built on blockchain. In this survey, we comprehensively classify and summarize research on the security of blockchain data. First, we present a classification of blockchain data attacks. Subsequently, we present attacks on and defenses of blockchain data in terms of privacy, availability, integrity, and controllability. Data privacy attacks involve data leakage or data that attackers obtain through analysis. Data availability attacks involve abnormal or incorrect access to blockchain data. Data integrity attacks involve tampering with blockchain data. Data controllability attacks involve blockchain data being accidentally manipulated through smart contract vulnerabilities. Finally, we present several important open research directions to guide follow-up studies in this area.
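Tampering is detectable because of the hash-linked structure of blockchain data: each block stores the hash of its predecessor, so altering an earlier block breaks the link stored in the block after it. A minimal sketch with a simplified block format (no consensus, Merkle trees, or signatures); the field names are illustrative.

```python
import hashlib
import json
from typing import Dict, List

def block_hash(block: Dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: List[Dict], data: str) -> None:
    """Append a block that commits to the hash of the current tip."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "data": data})

def verify_chain(chain: List[Dict]) -> bool:
    """True only if every block's prev_hash still matches its predecessor's hash."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

chain: List[Dict] = []
for record in ["tx: A->B 5", "tx: B->C 2", "tx: C->A 1"]:
    append_block(chain, record)
print(verify_chain(chain))  # True
chain[1]["data"] = "tx: B->C 200"  # tamper with a historical block
print(verify_chain(chain))  # False
```

This check alone cannot catch edits to the newest block until a successor commits to it; real systems also rely on consensus and on peers agreeing on the same tip hash.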

CR · Oct 8, 2018
IriTrack: Liveness Detection Using Irises Tracking for Preventing Face Spoofing Attacks
Meng Shen, Zelin Liao, Liehuang Zhu et al.
Face liveness detection has become a widely used technique of growing importance for withstanding spoofing attacks in various authentication scenarios. Existing liveness detection methods generally focus on designing intelligent classifiers or customized hardware to differentiate between image or video samples of a real legitimate user and imitated ones. Although effective, they can be resource-consuming, and their detection results may be sensitive to environmental changes. In this paper, we take iris movement as a significant liveness sign and propose a simple and efficient liveness detection system named IriTrack. Users are required to move their eyes along a randomly generated poly-line, and the trajectories of their irises are then used as evidence for liveness detection. IriTrack allows checking liveness using data collected during user-device interactions. We implemented a prototype and conducted extensive experiments to evaluate the performance of the proposed system. The results show that IriTrack can defend against spoofing attacks with a moderate and adjustable time overhead.

CR · Sep 22, 2018
Content-Based Multi-Source Encrypted Image Retrieval in Clouds with Privacy Preservation
Meng Shen, Guohua Cheng, Liehuang Zhu et al.
Content-based image retrieval (CBIR) is one of the fundamental image retrieval primitives, with applications in various areas such as art collections and medical diagnosis. With the increasing prevalence of the cloud computing paradigm, image owners desire to outsource their images to cloud servers. To deal with the risk of image privacy leakage, images are typically encrypted before being outsourced to the cloud, which makes CBIR an extremely challenging task. Existing studies focus on the scenario with only a single image owner, leaving the problem of CBIR with multiple image sources (i.e., owners) unaddressed. In this paper, we propose a secure CBIR scheme that supports Multiple Image owners with Privacy Protection (MIPP). We encrypt image features with a secure multi-party computation technique, which allows image owners to encrypt image features with their own keys. This enables efficient image retrieval over images gathered from multiple sources, while guaranteeing that the image privacy of an individual owner is not leaked to other owners. We also propose a new method for image similarity measurement that avoids revealing image similarity information to the cloud. Theoretical analysis and experimental results demonstrate that MIPP achieves retrieval accuracy and efficiency simultaneously while preserving image privacy.

CR · Sep 21, 2018
Secure Phrase Search for Intelligent Processing of Encrypted Data in Cloud-Based IoT
Meng Shen, Baoli Ma, Liehuang Zhu et al.
Phrase search allows retrieval of documents containing an exact phrase and plays an important role in many machine learning applications for cloud-based IoT, such as intelligent medical data analytics. To protect sensitive information from being leaked by service providers, documents (e.g., clinic records) are usually encrypted by data owners before being outsourced to the cloud. This, however, makes search an extremely challenging task. Existing searchable encryption schemes for multi-keyword search fail to support phrase search, as they cannot determine the location relationship of multiple keywords in a queried phrase over encrypted data on the cloud server side. In this paper, we propose P3, an efficient privacy-preserving phrase search scheme for intelligent encrypted data processing in cloud-based IoT. Our scheme exploits homomorphic encryption and bilinear maps to determine the location relationship of multiple queried keywords over encrypted data. It also uses a probabilistic trapdoor generation algorithm to protect users' search patterns. Thorough security analysis demonstrates the security guarantees achieved by P3. We implement a prototype and conduct extensive experiments on real-world datasets. The evaluation results show that, compared with existing multi-keyword search schemes, P3 can greatly improve search accuracy with moderate overheads.
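Phrase search differs from multi-keyword search in that keyword positions matter. The plaintext logic that a scheme like P3 must reproduce over ciphertexts can be sketched with a positional index: a document matches if some start position p exists such that the i-th query keyword occurs at position p + i. This sketch works on plaintext only; P3's homomorphic encryption, bilinear-map checks, and trapdoors are not modeled, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_positional_index(docs: Dict[str, str]) -> Dict[str, Dict[str, Set[int]]]:
    """keyword -> document id -> set of token positions."""
    index: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].add(pos)
    return index

def phrase_search(index: Dict[str, Dict[str, Set[int]]], phrase: str) -> List[str]:
    words = phrase.lower().split()
    if not words:
        return []
    # Step 1: conjunctive multi-keyword filter (all words present somewhere).
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    # Step 2: location check that multi-keyword search lacks (words are consecutive).
    matches = []
    for doc_id in sorted(candidates):
        starts = index[words[0]][doc_id]
        if any(all(p + i in index[w][doc_id] for i, w in enumerate(words)) for p in starts):
            matches.append(doc_id)
    return matches

docs = {
    "d1": "patient has high blood pressure",
    "d2": "blood test shows high pressure",
    "d3": "high blood pressure noted",
}
index = build_positional_index(docs)
print(phrase_search(index, "high blood pressure"))  # ['d1', 'd3']
```

Document d2 contains all three keywords and would be returned by a purely conjunctive search, which is exactly the accuracy gap that position-aware matching closes.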

CR · Sep 21, 2018
Cloud-Based Approximate Constrained Shortest Distance Queries Over Encrypted Graphs With Privacy Protection
Meng Shen, Baoli Ma, Liehuang Zhu et al.
Constrained shortest distance (CSD) querying is one of the fundamental graph query primitives: it finds the shortest distance from an origin to a destination in a graph under the constraint that the total cost does not exceed a given threshold. CSD querying has a wide range of applications, such as routing in telecommunications and transportation. With the increasing prevalence of the cloud computing paradigm, graph owners desire to outsource their graphs to cloud servers. To protect sensitive information, these graphs are usually encrypted before being outsourced to the cloud. This, however, poses a great challenge to CSD querying over encrypted graphs. Since performing constraint filtering is an intractable task, existing work mainly focuses on unconstrained shortest distance queries, and CSD querying over encrypted graphs remains an open research problem. In this paper, we propose Connor, a novel graph encryption scheme that enables approximate CSD querying. Connor is built on an efficient tree-based ciphertext comparison protocol and makes use of symmetric-key primitives and somewhat homomorphic encryption, making it computationally efficient. Using Connor, a graph owner can first encrypt privacy-sensitive graphs and then outsource them to the cloud server, achieving the necessary privacy without losing the ability to query. Extensive experiments with real-world datasets demonstrate the effectiveness and efficiency of the proposed graph encryption scheme.
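For reference, here is the plaintext query that Connor approximates over encrypted graphs: the minimum distance from source to target among paths whose total cost stays within a budget. A simple exact approach, sketched here and unrelated to Connor's encrypted protocol, runs Dijkstra over (node, cost-so-far) states. It is practical only for small non-negative integer costs, since the constrained problem is NP-hard in general. The graph format is an assumption for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Adjacency list: node -> [(neighbor, distance, cost)].
# Distances must be non-negative; costs must be non-negative integers.
Graph = Dict[str, List[Tuple[str, float, int]]]

def constrained_shortest_distance(graph: Graph, source: str, target: str,
                                  budget: int) -> Optional[float]:
    """Exact CSD on plaintext: min distance over paths with total cost <= budget, else None."""
    best: Dict[Tuple[str, int], float] = {(source, 0): 0.0}
    heap: List[Tuple[float, int, str]] = [(0.0, 0, source)]
    while heap:
        dist, cost, node = heapq.heappop(heap)
        if node == target:
            return dist  # the first target state popped has the minimum distance
        if dist > best.get((node, cost), float("inf")):
            continue  # stale heap entry
        for nbr, d, c in graph.get(node, []):
            new_cost = cost + c
            if new_cost > budget:
                continue  # prune paths that violate the cost constraint
            new_dist = dist + d
            if new_dist < best.get((nbr, new_cost), float("inf")):
                best[(nbr, new_cost)] = new_dist
                heapq.heappush(heap, (new_dist, new_cost, nbr))
    return None

g: Graph = {
    "s": [("a", 1.0, 5), ("b", 4.0, 1)],
    "a": [("t", 1.0, 5)],
    "b": [("t", 4.0, 1)],
}
print(constrained_shortest_distance(g, "s", "t", budget=10))  # 2.0 via s-a-t
print(constrained_shortest_distance(g, "s", "t", budget=5))   # 8.0 via s-b-t
```

The state space grows with the budget, which hints at why constraint filtering is expensive even before encryption is involved.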