Morris Chang

LG
h-index: 2
Papers: 5
Citations: 51
Novelty: 51%
AI Score: 28

5 Papers

LG · Jan 5, 2024
Federated Learning for distribution skewed data using sample weights

Hung Nguyen, Peiyuan Wu, Morris Chang

One of the most challenging issues in federated learning is that data are often not independent and identically distributed (non-IID). Ideally, clients contribute the same type of data, drawn from one global distribution. In practice, however, data are often collected in different ways and from different sources, so the data distributions of individual clients may differ from the underlying global distribution. This causes weight divergence and degrades federated learning performance. This work focuses on improving federated learning performance when the data distribution is skewed across clients. The main idea is to use sample weights to adjust each client's distribution closer to the global distribution, so that the machine learning model converges faster and reaches higher accuracy. We start from the fundamental concept of empirical risk minimization and theoretically derive a solution for adjusting distribution skewness using sample weights. To determine the sample weights, we exchange density information implicitly by leveraging a neural network-based density estimation model, MADE. The clients' data distributions can then be adjusted without exposing their raw data. Experimental results on three real-world datasets show that the proposed method not only improves federated learning accuracy but also significantly reduces communication costs compared with the other methods evaluated.
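The sample-weight idea can be illustrated with the standard importance-weighting identity: if client k draws data from p_k(x) but the target is the global distribution p(x), then E_{x~p}[ℓ(x)] = E_{x~p_k}[(p(x)/p_k(x)) ℓ(x)], so weighting each client sample by w(x) = p(x)/p_k(x) turns the client's empirical risk into an estimate of the global risk. The Python sketch below assumes both densities are known one-dimensional Gaussians (hypothetical stand-ins for the MADE density estimates) and uses a toy loss ℓ(x) = x². It is not the authors' implementation, and the paper's exact derivation may differ.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of the 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sample_weights(xs, global_pdf, client_pdf):
    """Importance weights w(x) = p_global(x) / p_client(x) for each client sample."""
    return [global_pdf(x) / client_pdf(x) for x in xs]

def weighted_risk(losses, weights):
    """Importance-weighted empirical risk: (1/n) * sum_i w_i * loss_i."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

random.seed(0)
n = 200_000
# Hypothetical skewed client: samples from N(1, 1) instead of the global N(0, 1).
client_xs = [random.gauss(1.0, 1.0) for _ in range(n)]
losses = [x * x for x in client_xs]  # toy per-sample loss l(x) = x^2

weights = sample_weights(
    client_xs,
    global_pdf=lambda x: gaussian_pdf(x, 0.0, 1.0),
    client_pdf=lambda x: gaussian_pdf(x, 1.0, 1.0),
)
unweighted = sum(losses) / n
weighted = weighted_risk(losses, weights)
print(f"unweighted risk: {unweighted:.3f}, weighted risk: {weighted:.3f}")
```

Because this client is shifted toward larger x, the unweighted average should land near 2 (the client's own risk), while the weighted average should land near 1, the risk under the global N(0, 1) distribution.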

LG · Jan 5, 2024
Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced Data

Hung Nguyen, Morris Chang

This study examines the impact of class-imbalanced data on deep learning models and proposes a data-balancing technique that generates synthetic data for the minority class. Unlike random oversampling, our method prioritizes balancing the informative regions by identifying high-entropy samples. Well-placed synthetic data can improve the accuracy and efficiency of machine learning algorithms, whereas poorly placed synthetic data may increase misclassification rates. We introduce an algorithm that maximizes the probability of generating a synthetic sample in the correct region of its class by optimizing the class posterior ratio. Additionally, to preserve the data topology, synthetic data are generated within each minority sample's neighborhood. Experimental results on forty-one datasets demonstrate the superior performance of our technique in improving deep learning models.
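Two of the ingredients named above, prioritizing high-entropy minority samples and generating synthetic points inside each sample's neighborhood, can be sketched in Python as follows. This is a SMOTE-style interpolation between a minority sample and its nearest minority neighbor, not the posterior-ratio optimization the abstract describes. The function names, the `top_k` parameter, and the assumption that class probabilities come from some already-trained model are all hypothetical.

```python
import math
import random

def entropy(probs):
    """Shannon entropy (nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def nearest_neighbor(i, points):
    """Index of the closest other point to points[i] (Euclidean distance)."""
    best, best_d = None, float("inf")
    for j, q in enumerate(points):
        if j != i:
            d = math.dist(points[i], q)
            if d < best_d:
                best, best_d = j, d
    return best

def oversample_minority(minority, probs, n_new, top_k, rng):
    """Generate n_new synthetic points near the top_k highest-entropy minority samples.

    minority: list of feature tuples for the minority class
    probs:    predicted class-probability vectors, aligned with `minority`
    """
    ranked = sorted(range(len(minority)), key=lambda i: entropy(probs[i]), reverse=True)
    seeds = ranked[:top_k]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        j = nearest_neighbor(i, minority)
        lam = rng.random()  # interpolation factor in [0, 1)
        # Place the new point on the segment between the seed and its neighbor.
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic

rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0)]
probs = [(0.5, 0.5), (0.6, 0.4), (0.05, 0.95), (0.1, 0.9)]  # hypothetical model outputs
new_points = oversample_minority(minority, probs, n_new=10, top_k=2, rng=rng)
print(new_points[:3])
```

In this example the two minority points with near-uniform predicted probabilities become the seeds, so every synthetic point lands on the segment between (0, 0) and (1, 0), away from the confidently classified pair near (5, 5).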

LG · Apr 8, 2025
Exploiting Meta-Learning-based Poisoning Attacks for Graph Link Prediction

Mingchen Li, Di Zhuang, Keyu Chen et al.

Link prediction in graph data uses various algorithms and Graph Neural Network (GNN) models to predict potential relationships between graph nodes. These techniques are widely used in many real-world applications, including recommendation systems, community/social networks, and biological structures. However, recent research has highlighted the vulnerability of GNN models to adversarial attacks, such as poisoning and evasion attacks. Addressing this vulnerability is crucial for stable and robust performance in GNN applications. Although many works have focused on improving the robustness of node classification in GNN models, the robustness of link prediction has received less attention. To bridge this gap, this article introduces an unweighted graph poisoning attack that leverages meta-learning with weighted scheme strategies to degrade the link prediction performance of GNNs. We conducted comprehensive experiments on diverse datasets across multiple link prediction applications to evaluate the proposed method and its parameters, comparing it with existing approaches under similar conditions. Our results demonstrate that our approach significantly reduces link prediction performance and consistently outperforms other state-of-the-art baselines.

LG · Nov 9, 2021
Complementary Ensemble Learning

Hung Nguyen, Morris Chang

To achieve high performance on a machine learning (ML) task, a deep learning-based model must implicitly capture the entire data distribution. It therefore requires a large number of training samples, and the data are expected to fully represent the real distribution, especially for high-dimensional data such as images and videos. In practice, however, data are usually collected in a variety of styles, and some of these styles have too few representatives. This can lead to uncertainty in the model's predictions and significantly reduce ML task performance. In this paper, we provide a comprehensive study of this problem through the lens of model uncertainty. From this study, we derive a simple but efficient technique to improve the performance of state-of-the-art deep learning models. Specifically, we train auxiliary models that complement the uncertainty of a state-of-the-art model. By combining these models into an ensemble, we can significantly improve ML task performance on the types of data described above. While our method only slightly improves classification accuracy on benchmark datasets (e.g., by 0.2% on MNIST), it significantly improves accuracy on limited data (e.g., by 1.3% on Eardrum and 3.5% on ChestXray).
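One simple way to combine a main model with auxiliary models is to average their predicted class probabilities. The Python sketch below shows such an averaging ensemble on hand-written probability vectors. The abstract does not say how the auxiliary models are trained to complement the main model's uncertainty or how the models are weighted, so the equal default `weights` and the example numbers are assumptions, not the paper's method.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def average_ensemble(prob_vectors, weights=None):
    """Weighted average of per-model class-probability vectors.

    prob_vectors: one probability vector per model, all the same length
    weights:      optional per-model weights (hypothetical; equal by default)
    """
    if weights is None:
        weights = [1.0] * len(prob_vectors)
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]

def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda c: probs[c])

# Hypothetical outputs for one test image.
main = [0.40, 0.45, 0.15]       # main model: uncertain, leans to class 1
auxiliary = [0.80, 0.10, 0.10]  # auxiliary model: confident about class 0
combined = average_ensemble([main, auxiliary])
print(predict(main), predict(combined))
print(round(entropy(main), 3), round(entropy(combined), 3))
```

Here the main model is unsure between classes 0 and 1 and leans toward 1, while the auxiliary model is confident about class 0. The averaged vector is [0.6, 0.275, 0.125], so the ensemble predicts class 0 with lower predictive entropy than the main model alone.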

CR · Feb 27, 2019
AutoGAN-based Dimension Reduction for Privacy Preservation

Hung Nguyen, Di Zhuang, Pei-Yuan Wu et al.

Protecting sensitive information against data exploitation attacks is an emerging research area in data mining. Several methods have been introduced in the past to protect individual privacy from such attacks while maximizing the data utility of the application. However, these existing techniques are not sufficient to effectively protect data owners' privacy, especially in scenarios that use visualizable data (e.g., images, videos) or in applications that require heavy computation. To address these problems, we propose a new dimension reduction-based method for privacy preservation. Our method generates dimension-reduced data for performing machine learning tasks while preventing a strong adversary from reconstructing the original data. We first introduce a theoretical approach to evaluate dimension reduction-based privacy-preserving mechanisms, then propose a non-linear dimension reduction framework, motivated by state-of-the-art neural network structures, for privacy preservation. We conducted experiments on three face image datasets (AT&T, YaleB, and CelebA); the results show that when the number of dimensions is reduced to seven, we achieve accuracies of 79%, 80%, and 73%, respectively, and the reconstructed images are not recognizable to the naked eye.