Huiwen Wu

LG
h-index5
8papers
246citations
Novelty49%
AI Score39

8 Papers

NAJun 8, 2018
A Preconditioner based on Non-uniform Row Sampling for Linear Least Squares Problems

Long Chen, Huiwen Wu

Least squares method is one of the simplest and most popular techniques applied in data fitting, imaging processing and high dimension data analysis. The classic methods like QR and SVD decomposition for solving least squares problems has a large computational cost. Iterative methods such as CG and Kaczmarz can reduce the complexity if the matrix is well conditioned but failed for the ill conditioned cases. Preconditioner based on randomized row sampling algorithms have been developed but destroy the sparsity. In this paper, a new preconditioner is constructed by non-uniform row sampling with a probability proportional to the squared norm of rows. Then Gauss Seidel iterations are applied to the normal equation of the sampled matrix which aims to grab the high frequency component of solution. After all, PCG is used to solve the normal equation with this preconditioner. Our preconditioner can keep the sparsity and improve the poor conditioning for highly overdetermined matrix. Experimental studies are presented on several different simulations including dense Gaussian matrix, `semi Gaussian' matrix, sparse random matrix, `UDV' matrix, and random graph Laplacian matrix to show the effectiveness of the proposed least square solver.

CLOct 16, 2024
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning

Huiwen Wu, Xiaohan Li, Xiaogang Xu et al.

The development of Large Language Models (LLMs) has significantly advanced various AI applications in commercial and scientific research fields, such as scientific literature summarization, writing assistance, and knowledge graph construction. However, a significant challenge is the high risk of hallucination during LLM inference, which can lead to security concerns like factual inaccuracies, inconsistent information, and fabricated content. To tackle this issue, it is essential to develop effective methods for reducing hallucination while maintaining the original capabilities of the LLM. This paper introduces a novel approach called Iterative Model-level Contrastive Learning (Iter-AHMCL) to address hallucination. This method modifies the representation layers of pre-trained LLMs by using contrastive `positive' and `negative' models, trained on data with and without hallucinations. By leveraging the differences between these two models, we create a more straightforward pathway to eliminate hallucinations, and the iterative nature of contrastive learning further enhances performance. Experimental validation on four pre-trained foundation LLMs (LLaMA2, Alpaca, LLaMA3, and Qwen) finetuning with a specially designed dataset shows that our approach achieves an average improvement of 10.1 points on the TruthfulQA benchmark. Comprehensive experiments demonstrate the effectiveness of Iter-AHMCL in reducing hallucination while maintaining the general capabilities of LLMs.

LGMay 22, 2024
CG-FedLLM: How to Compress Gradients in Federated Fune-tuning for Large Language Models

Huiwen Wu, Xiaogang Xu, Deyi Zhang et al.

The success of current Large-Language Models (LLMs) hinges on extensive training data that is collected and stored centrally, called Centralized Learning (CL). However, such a collection manner poses a privacy threat, and one potential solution is Federated Learning (FL), which transfers gradients, not raw data, among clients. Unlike traditional networks, FL for LLMs incurs significant communication costs due to their tremendous parameters. This study introduces an innovative approach to compress gradients to improve communication efficiency during LLM FL, formulating the new FL pipeline named CG-FedLLM. This approach integrates an encoder on the client side to acquire the compressed gradient features and a decoder on the server side to reconstruct the gradients. We also developed a novel training strategy that comprises Temporal-ensemble Gradient-Aware Pre-training (TGAP) to identify characteristic gradients of the target model and Federated AutoEncoder-Involved Fine-tuning (FAF) to compress gradients adaptively. Extensive experiments confirm that our approach reduces communication costs and improves performance (e.g., average 3 points increment compared with traditional CL- and FL-based fine-tuning with LlaMA on a well-recognized benchmark, C-Eval). This improvement is because our encoder-decoder, trained via TGAP and FAF, can filter gradients while selectively preserving critical features. Furthermore, we present a series of experimental analyses focusing on the signal-to-noise ratio, compression rate, and robustness within this privacy-centric framework, providing insight into developing more efficient and secure LLMs.

LGDec 22, 2024
DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately

Huiwen Wu, Deyi Zhang, Xiaohan Li et al.

The emergence of the Large Language Model (LLM) has shown their superiority in a wide range of disciplines, including language understanding and translation, relational logic reasoning, and even partial differential equations solving. The transformer is the pervasive backbone architecture for the foundation model construction. It is vital to research how to adjust the Transformer architecture to achieve an end-to-end privacy guarantee in LLM fine-tuning. In this paper, we investigate three potential information leakage during a federated fine-tuning procedure for LLM (FedLLM). Based on the potential information leakage, we provide an end-to-end privacy guarantee solution for FedLLM by inserting two-stage randomness. The first stage is to train a gradient auto-encoder with a Gaussian random prior based on the statistical information of the gradients generated by local clients. The second stage is to fine-tune the overall LLM with a differential privacy guarantee by adopting appropriate Gaussian noises. We show the efficiency and accuracy gains of our proposed method with several foundation models and two popular evaluation benchmarks. Furthermore, we present a comprehensive privacy analysis with Gaussian Differential Privacy (GDP) and Renyi Differential Privacy (RDP).

CVNov 17, 2025
Synergizing Multigrid Algorithms with Vision Transformer: A Novel Approach to Enhance the Seismic Foundation Model

Huiwen Wu, Shuo Zhang, Yi Liu et al.

Due to the emergency and homogenization of Artificial Intelligence (AI) technology development, transformer-based foundation models have revolutionized scientific applications, such as drug discovery, materials research, and astronomy. However, seismic data presents unique characteristics that require specialized processing techniques for pretraining foundation models in seismic contexts with high- and low-frequency features playing crucial roles. Existing vision transformers (ViTs) with sequential tokenization ignore the intrinsic pattern and fail to grasp both the high- and low-frequency seismic information efficiently and effectively. This work introduces a novel adaptive two-grid foundation model training strategy (ADATG) with Hilbert encoding specifically tailored for seismogram data, leveraging the hierarchical structures inherent in seismic data. Specifically, our approach employs spectrum decomposition to separate high- and low-frequency components and utilizes hierarchical Hilbert encoding to represent the data effectively. Moreover, observing the frequency principle observed in ViTs, we propose an adaptive training strategy that initially emphasizes coarse-level information and then progressively refines the model's focus on fine-level features. Our extensive experiments demonstrate the effectiveness and efficiency of our training methods. This research highlights the importance of data encoding and training strategies informed by the distinct characteristics of high- and low-frequency features in seismic images, ultimately contributing to the enhancement of visual seismic foundation models pretraining.

LGFeb 10, 2022
Differential Private Knowledge Transfer for Privacy-Preserving Cross-Domain Recommendation

Chaochao Chen, Huiwen Wu, Jiajie Su et al.

Cross Domain Recommendation (CDR) has been popularly studied to alleviate the cold-start and data sparsity problem commonly existed in recommender systems. CDR models can improve the recommendation performance of a target domain by leveraging the data of other source domains. However, most existing CDR models assume information can directly 'transfer across the bridge', ignoring the privacy issues. To solve the privacy concern in CDR, in this paper, we propose a novel two stage based privacy-preserving CDR framework (PriCDR). In the first stage, we propose two methods, i.e., Johnson-Lindenstrauss Transform (JLT) based and Sparse-awareJLT (SJLT) based, to publish the rating matrix of the source domain using differential privacy. We theoretically analyze the privacy and utility of our proposed differential privacy based rating publishing methods. In the second stage, we propose a novel heterogeneous CDR model (HeteroCDR), which uses deep auto-encoder and deep neural network to model the published source rating matrix and target rating matrix respectively. To this end, PriCDR can not only protect the data privacy of the source domain, but also alleviate the data sparsity of the source domain. We conduct experiments on two benchmark datasets and the results demonstrate the effectiveness of our proposed PriCDR and HeteroCDR.

LGNov 14, 2020
A Theoretical Perspective on Differentially Private Federated Multi-task Learning

Huiwen Wu, Cen Chen, Li Wang

In the era of big data, the need to expand the amount of data through data sharing to improve model performance has become increasingly compelling. As a result, effective collaborative learning models need to be developed with respect to both privacy and utility concerns. In this work, we propose a new federated multi-task learning method for effective parameter transfer with differential privacy to protect gradients at the client level. Specifically, the lower layers of the networks are shared across all clients to capture transferable feature representation, while top layers of the network are task-specific for on-client personalization. Our proposed algorithm naturally resolves the statistical heterogeneity problem in federated networks. We are, to the best of knowledge, the first to provide both privacy and utility guarantees for such a proposed federated algorithm. The convergences are proved for the cases with Lipschitz smooth objective functions under the non-convex, convex, and strongly convex settings. Empirical experiment results on different datasets have been conducted to demonstrate the effectiveness of the proposed algorithm and verify the implications of the theoretical findings.

LGMay 25, 2020
Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

Chaochao Chen, Jun Zhou, Longfei Zheng et al.

Recently, Graph Neural Network (GNN) has achieved remarkable progresses in various real-world tasks on graph data, consisting of node features and the adjacent information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in graph. However, such information could possibly be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose VFGNN, a federated GNN learning paradigm for privacy-preserving node classification task under data vertically partitioned setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private data (i.e., features, edges, and labels) related computations on data holders, and delegate the rest of computations to a semi-honest server. We also propose to apply differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks and the results demonstrate the effectiveness of VFGNN.