LGFeb 24, 2023
Active Membership Inference Attack under Local Differential Privacy in Federated LearningTruc Nguyen, Phung Lai, Khang Tran et al.
Federated learning (FL) was originally regarded as a framework for collaborative learning among clients with data privacy protection through a coordinating server. In this paper, we propose a new active membership inference (AMI) attack carried out by a dishonest server in FL. In AMI attacks, the server crafts and embeds malicious parameters into global models to effectively infer whether a target data sample is included in a client's private training data or not. By exploiting the correlation among data features through a non-linear decision boundary, AMI attacks with a certified guarantee of success can achieve severely high success rates under rigorous local differential privacy (LDP) protection; thereby exposing clients' training data to significant privacy risk. Theoretical and experimental results on several benchmark datasets show that adding sufficient privacy-preserving noise to prevent our attack would significantly damage FL's model utility.
LGDec 8, 2022
XRand: Differentially Private Defense against Explanation-Guided AttacksTruc Nguyen, Phung Lai, NhatHai Phan et al.
Recent development in the field of explainable artificial intelligence (XAI) has helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in which an explanation is provided together with the model prediction in response to each query. However, XAI also opens a door for adversaries to gain insights into the black-box models in MLaaS, thereby making the models more vulnerable to several attacks. For example, feature-based explanations (e.g., SHAP) could expose the top important features that a black-box model focuses on. Such disclosure has been exploited to craft effective backdoor triggers against malware classifiers. To address this trade-off, we introduce a new concept of achieving local differential privacy (LDP) in the explanations, and from that we establish a defense, called XRand, against such attacks. We show that our mechanism restricts the information that the adversary can learn about the top important features, while maintaining the faithfulness of the explanations.
CRMay 11, 2022
Blockchain-based Secure Client Selection in Federated LearningTruc Nguyen, Phuc Thai, Tre' R. Jeter et al.
Despite the great potential of Federated Learning (FL) in large-scale distributed learning, the current system is still subject to several privacy issues due to the fact that local models trained by clients are exposed to the central server. Consequently, secure aggregation protocols for FL have been developed to conceal the local models from the server. However, we show that, by manipulating the client selection process, the server can circumvent the secure aggregation to learn the local models of a victim client, indicating that secure aggregation alone is inadequate for privacy protection. To tackle this issue, we leverage blockchain technology to propose a verifiable client selection protocol. Owing to the immutability and transparency of blockchain, our proposed protocol enforces a random selection of clients, making the server unable to control the selection process at its discretion. We present security proofs showing that our protocol is secure against this attack. Additionally, we conduct several experiments on an Ethereum-like blockchain to demonstrate the feasibility and practicality of our solution.
CRNov 23, 2023
OASIS: Offsetting Active Reconstruction Attacks in Federated LearningTre' R. Jeter, Truc Nguyen, Raed Alharbi et al.
Federated Learning (FL) has garnered significant attention for its potential to protect user privacy while enhancing model training efficiency. For that reason, FL has found its use in various domains, from healthcare to industrial engineering, especially where data cannot be easily exchanged due to sensitive information or privacy laws. However, recent research has demonstrated that FL protocols can be easily compromised by active reconstruction attacks executed by dishonest servers. These attacks involve the malicious modification of global model parameters, allowing the server to obtain a verbatim copy of users' private data by inverting their gradient updates. Tackling this class of attack remains a crucial challenge due to the strong threat model. In this paper, we propose a defense mechanism, namely OASIS, based on image augmentation that effectively counteracts active reconstruction attacks while preserving model performance. We first uncover the core principle of gradient inversion that enables these attacks and theoretically identify the main conditions by which the defense can be robust regardless of the attack strategies. We then construct our defense with image augmentation showing that it can undermine the attack principle. Comprehensive evaluations demonstrate the efficacy of the defense mechanism highlighting its feasibility as a solution.
CRSep 6, 2024
Detection of False Data Injection Attacks (FDIA) on Power Dynamical Systems With a State Prediction MethodAbhijeet Sahu, Truc Nguyen, Kejun Chen et al.
With the deeper penetration of inverter-based resources in power systems, false data injection attacks (FDIA) are a growing cyber-security concern. They have the potential to disrupt the system's stability like frequency stability, thereby leading to catastrophic failures. Therefore, an FDIA detection method would be valuable to protect power systems. FDIAs typically induce a discrepancy between the desired and the effective behavior of the power system dynamics. A suitable detection method can leverage power dynamics predictions to identify whether such a discrepancy was induced by an FDIA. This work investigates the efficacy of temporal and spatio-temporal state prediction models, such as Long Short-Term Memory (LSTM) and a combination of Graph Neural Networks (GNN) with LSTM, for predicting frequency dynamics in the absence of an FDIA but with noisy measurements, and thereby identify FDIA events. For demonstration purposes, the IEEE 39 New England Kron-reduced model simulated with a swing equation is considered. It is shown that the proposed state prediction models can be used as a building block for developing an effective FDIA detection method that can maintain high detection accuracy across various attack and deployment settings. It is also shown how the FDIA detection should be deployed to limit its exposure to detection inaccuracies and mitigate its computational burden.
5.0CLApr 2
Human-Guided Reasoning with Large Language Models for Vietnamese Speech Emotion RecognitionTruc Nguyen, Then Tran, Binh Truong et al.
Vietnamese Speech Emotion Recognition (SER) remains challenging due to ambiguous acoustic patterns and the lack of reliable annotated data, especially in real-world conditions where emotional boundaries are not clearly separable. To address this problem, this paper proposes a human-machine collaborative framework that integrates human knowledge into the learning process rather than relying solely on data-driven models. The proposed framework is centered around LLM-based reasoning, where acoustic feature-based models are used to provide auxiliary signals such as confidence and feature-level evidence. A confidence-based routing mechanism is introduced to distinguish between easy and ambiguous samples, allowing uncertain cases to be delegated to LLMs for deeper reasoning guided by structured rules derived from human annotation behavior. In addition, an iterative refinement strategy is employed to continuously improve system performance through error analysis and rule updates. Experiments are conducted on a Vietnamese speech dataset of 2,764 samples across three emotion classes (calm, angry, panic), with high inter-annotator agreement (Fleiss Kappa = 0.8574), ensuring reliable ground truth. The proposed method achieves strong performance, reaching up to 86.59% accuracy and Macro F1 around 0.85-0.86, demonstrating its effectiveness in handling ambiguous and hard-to-classify cases. Overall, this work highlights the importance of combining data-driven models with human reasoning, providing a robust and model-agnostic approach for speech emotion recognition in low-resource settings.
LGSep 13, 2024
XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature SubstitutionKiana Vu, Phung Lai, Truc Nguyen
Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) has yet to reach its full potential in real-world applications. One key challenge is that XAI can unintentionally provide adversaries with insights into black-box models, inevitably increasing their vulnerability to various attacks. In this paper, we develop a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution, called XSub. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with corresponding important features from a "golden sample" of a different label, thereby increasing the likelihood of the model misclassifying the perturbed sample. The degree of feature substitution is adjustable, allowing us to control how much of the original samples information is replaced. This flexibility effectively balances a trade-off between the attacks effectiveness and its stealthiness. XSub is also highly cost-effective in that the number of required queries to the prediction model and the explanation model in conducting the attack is in O(1). In addition, XSub can be easily extended to launch backdoor attacks in case the attacker has access to the models training data. Our evaluation demonstrates that XSub is not only effective and stealthy but also cost-effective, enabling its application across a wide range of AI models.
CRMar 2, 2024
Analysis of Privacy Leakage in Federated Large Language ModelsMinh N. Vu, Truc Nguyen, Tre' R. Jeter et al.
With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large-scale of LLMs. While substantial adjustments to the protocol have been introduced as a response, comprehensive privacy analysis for the adapted FL protocol is currently lacking. To address this gap, our work delves into an extensive examination of the privacy analysis of FL when used for training LLMs, both from theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakages of various adapted FL configurations. Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.
CRJun 16, 2025
Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision ModelsQuan Nguyen, Minh N. Vu, Truc Nguyen et al.
Federated Learning enables collaborative learning among clients via a coordinating server while avoiding direct data sharing, offering a perceived solution to preserve privacy. However, recent studies on Membership Inference Attacks (MIAs) have challenged this notion, showing high success rates against unprotected training data. While local differential privacy (LDP) is widely regarded as a gold standard for privacy protection in data analysis, most studies on MIAs either neglect LDP or fail to provide theoretical guarantees for attack success rates against LDP-protected data. To address this gap, we derive theoretical lower bounds for the success rates of low-polynomial time MIAs that exploit vulnerabilities in fully connected or self-attention layers. We establish that even when data are protected by LDP, privacy risks persist, depending on the privacy budget. Practical evaluations on federated vision models confirm considerable privacy risks, revealing that the noise required to mitigate these attacks significantly degrades models' utility.
MLMar 9, 2025
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba ModelsNguyen Do, Truc Nguyen, Malik Hassanaly et al.
Despite a plethora of anomaly detection models developed over the years, their ability to generalize to unseen anomalies remains an issue, particularly in critical systems. This paper aims to address this challenge by introducing Swift Hydra, a new framework for training an anomaly detection method based on generative AI and reinforcement learning (RL). Through featuring an RL policy that operates on the latent variables of a generative model, the framework synthesizes novel and diverse anomaly samples that are capable of bypassing a detection model. These generated synthetic samples are, in turn, used to augment the detection model, further improving its ability to handle challenging anomalies. Swift Hydra also incorporates Mamba models structured as a Mixture of Experts (MoE) to enable scalable adaptation of the number of Mamba experts based on data complexity, effectively capturing diverse feature distributions without increasing the model's inference time. Empirical evaluations on ADBench benchmark demonstrate that Swift Hydra outperforms other state-of-the-art anomaly detection models while maintaining a relatively short inference time. From these results, our research highlights a new and auspicious paradigm of integrating RL and generative AI for advancing anomaly detection.
LGFeb 7, 2022
Preserving Privacy and Security in Federated LearningTruc Nguyen, My T. Thai
Federated learning is known to be vulnerable to both security and privacy issues. Existing research has focused either on preventing poisoning attacks from users or on concealing the local model updates from the server, but not both. However, integrating these two lines of research remains a crucial challenge since they often conflict with one another with respect to the threat model. In this work, we develop a principle framework that offers both privacy guarantees for users and detection against poisoning attacks from them. With a new threat model that includes both an honest-but-curious server and malicious users, we first propose a secure aggregation protocol using homomorphic encryption for the server to combine local model updates in a private manner. Then, a zero-knowledge proof protocol is leveraged to shift the task of detecting attacks in the local models from the server to the users. The key observation here is that the server no longer needs access to the local models for attack detection. Therefore, our framework enables the central server to identify poisoned model updates without violating the privacy guarantees of secure aggregation.
ASAug 4, 2021
Lung Sound Classification Using Co-tuning and Stochastic NormalizationTruc Nguyen, Franz Pernkopf
In this paper, we use pre-trained ResNet models as backbone architectures for classification of adventitious lung sounds and respiratory diseases. The knowledge of the pre-trained model is transferred by using vanilla fine-tuning, co-tuning, stochastic normalization and the combination of the co-tuning and stochastic normalization techniques. Furthermore, data augmentation in both time domain and time-frequency domain is used to account for the class imbalance of the ICBHI and our multi-channel lung sound dataset. Additionally, we apply spectrum correction to consider the variations of the recording device properties on the ICBHI dataset. Empirically, our proposed systems mostly outperform all state-of-the-art lung sound classification systems for the adventitious lung sounds and respiratory diseases of both datasets.
ASApr 30, 2021
Crackle Detection In Lung Sounds Using Transfer Learning And Multi-Input Convolitional Neural NetworksTruc Nguyen, Franz Pernkopf
Large annotated lung sound databases are publicly available and might be used to train algorithms for diagnosis systems. However, it might be a challenge to develop a well-performing algorithm for small non-public data, which have only a few subjects and show differences in recording devices and setup. In this paper, we use transfer learning to tackle the mismatch of the recording setup. This allows us to transfer knowledge from one dataset to another dataset for crackle detection in lung sounds. In particular, a single input convolutional neural network (CNN) model is pre-trained on a source domain using ICBHI 2017, the largest publicly available database of lung sounds. We use log-mel spectrogram features of respiratory cycles of lung sounds. The pre-trained network is used to build a multi-input CNN model, which shares the same network architecture for respiratory cycles and their corresponding respiratory phases. The multi-input model is then fine-tuned on the target domain of our self-collected lung sound database for classifying crackles and normal lung sounds. Our experimental results show significant performance improvements of 9.84% (absolute) in F-score on the target domain using the multi-input CNN model based on transfer learning for crackle detection in adventitious lung sound classification task.
CRJul 16, 2020
Denial-of-Service Vulnerability of Hash-based Transaction Sharding: Attack and CountermeasureTruc Nguyen, My T. Thai
Since 2016, sharding has become an auspicious solution to tackle the scalability issue in legacy blockchain systems. Despite its potential to strongly boost the blockchain throughput, sharding comes with its own security issues. To ease the process of deciding which shard to place transactions, existing sharding protocols use a hash-based transaction sharding in which the hash value of a transaction determines its output shard. Unfortunately, we show that this mechanism opens up a loophole that could be exploited to conduct a single-shard flooding attack, a type of Denial-of-Service (DoS) attack, to overwhelm a single shard that ends up reducing the performance of the system as a whole. To counter the single-shard flooding attack, we propose a countermeasure that essentially eliminates the loophole by rejecting the use of hash-based transaction sharding. The countermeasure leverages the Trusted Execution Environment (TEE) to let blockchain's validators securely execute a transaction sharding algorithm with a negligible overhead. We provide a formal specification for the countermeasure and analyze its security properties in the Universal Composability (UC) framework. Finally, a proof-of-concept is developed to demonstrate the feasibility and practicality of our solution.
CRJul 16, 2020
OptChain: Optimal Transactions Placement for Scalable Blockchain ShardingLan N. Nguyen, Truc Nguyen, Thang N. Dinh et al.
A major challenge in blockchain sharding protocols is that more than 95% transactions are cross-shard. Not only those cross-shard transactions degrade the system throughput but also double the confirmation time, and exhaust an already scarce network bandwidth. Are cross-shard transactions imminent for sharding schemes? In this paper, we propose a new sharding paradigm, called OptChain, in which cross-shard transactions are minimized, resulting in almost twice faster confirmation time and throughput. By treating transactions as a stream of nodes in an online graph, OptChain utilizes a lightweight and on-the-fly transaction placement method to group both related and soon-related transactions into the same shards. At the same time, OptChain maintains a temporal balance among shards to guarantee the high parallelism. Our comprehensive and large-scale simulation using Oversim P2P library confirms a significant boost in performance with up to 10 folds reduction in cross-shard transactions, more than twice reduction in confirmation time, and 50% increase in throughput. When combined with Omniledger sharding protocol, OptChain delivers a 6000 transactions per second throughput with 10.5s confirmation time.
SDFeb 11, 2020
Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder FrameworkLam Pham, Huy Phan, Truc Nguyen et al.
This article proposes an encoder-decoder network model for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. We make use of multiple low-level spectrogram features at the front-end, transformed into higher level features through a well-trained CNN-DNN front-end encoder. The high level features and their combination (via a trained feature combiner) are then fed into different decoder models comprising random forest regression, DNNs and a mixture of experts, for back-end classification. We report extensive experiments to evaluate the accuracy of this framework for various ASC datasets, including LITIS Rouen and IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 1, 2017 Task 1, 2018 Tasks 1A & 1B and 2019 Tasks 1A & 1B. The experimental results highlight two main contributions; the first is an effective method for high-level feature extraction from multi-spectrogram input via the novel C-DNN architecture encoder network, and the second is the proposed decoder which enables the framework to achieve competitive results on various datasets. The fact that a single framework is highly competitive for several different challenges is an indicator of its robustness for performing general ASC tasks.
SDJan 21, 2020
Robust Deep Learning Framework For Predicting Respiratory Anomalies and DiseasesLam Pham, Ian McLoughlin, Huy Phan et al.
This paper presents a robust deep learning framework developed to detect respiratory diseases from recordings of respiratory sounds. The complete detection process firstly involves front end feature extraction where recordings are transformed into spectrograms that convey both spectral and temporal information. Then a back-end deep learning model classifies the features into classes of respiratory disease or anomaly. Experiments, conducted over the ICBHI benchmark dataset of respiratory sounds, evaluate the ability of the framework to classify sounds. Two main contributions are made in this paper. Firstly, we provide an extensive analysis of how factors such as respiratory cycle length, time resolution, and network architecture, affect final prediction accuracy. Secondly, a novel deep learning based framework is proposed for detection of respiratory diseases and shown to perform extremely well compared to state of the art methods.