Binyan Zhang

2.7 · LG · Jan 14

$D^2Prune$: Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness

Lang Xiong, Ning Liu, Ao Ren et al.

Large language models (LLMs) face significant deployment challenges due to their massive computational demands. While pruning offers a promising compression solution, existing methods suffer from two critical limitations: (1) they neglect activation distribution shifts between calibration data and test data, resulting in inaccurate error estimates; (2) they overlook the long-tail distribution of activations in the attention module. To address these limitations, this paper proposes a novel pruning method, $D^2Prune$. First, we propose a dual Taylor expansion-based method that jointly models weight and activation perturbations for precise error estimation, which in turn yields more accurate pruning mask selection and weight updates and reduces the error introduced by pruning. Second, we propose an attention-aware dynamic update strategy that preserves the long-tail attention pattern by jointly minimizing the KL divergence of attention distributions and the reconstruction error. Extensive experiments show that $D^2Prune$ consistently outperforms state-of-the-art (SOTA) methods across various LLMs (e.g., OPT-125M, LLaMA2/3, and Qwen3). Moreover, the dynamic attention update mechanism generalizes well to ViT-based vision models such as DeiT, achieving superior accuracy on ImageNet-1K.
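The joint objective named in the abstract (KL divergence between attention distributions plus a reconstruction error) could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the weighting factor `lam`, the KL direction (dense relative to pruned), and the squared-error reconstruction term are all hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def combined_loss(dense_logits, pruned_logits, dense_out, pruned_out, lam=1.0):
    """Squared reconstruction error plus lam * KL between attention rows.

    `lam` and the KL direction are assumptions made for this sketch.
    """
    p = softmax(dense_logits)   # attention row of the unpruned model
    q = softmax(pruned_logits)  # attention row after pruning
    recon = sum((a - b) ** 2 for a, b in zip(dense_out, pruned_out))
    return recon + lam * kl_divergence(p, q)
```

When the pruned logits and outputs match the dense ones, both terms vanish; any shift in the attention row adds a positive KL penalty even if the output error happens to be small.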

4.1 · LG · Apr 25, 2025

A Generative Graph Contrastive Learning Model with Global Signal

Xiaofan Wei, Binyan Zhang

Graph contrastive learning (GCL) has garnered significant attention recently because it learns complex structural information from graphs in a self-supervised manner. However, prevalent GCL models may suffer from performance degradation due to inappropriate contrastive signals. Concretely, they commonly generate augmented views by random perturbation, which introduces noise and distorts the essential structures of the graph. In addition, they assign equal weight to hard and easy sample pairs, thereby ignoring differences in the importance of those pairs. To address these issues, this study proposes a novel Contrastive Signal Generative Framework for Accurate Graph Learning (CSG2L) built on two ideas: a) a singular value decomposition (SVD)-directed augmentation module (SVD-aug) that captures global interactions while avoiding random noise perturbation; b) a local-global dependency learning module (LGDL) with an adaptive reweighting strategy that differentiates the effects of hard and easy sample pairs. Extensive experiments on benchmark datasets demonstrate that the proposed CSG2L outperforms state-of-the-art baselines. Moreover, CSG2L is compatible with a variety of GNNs.
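To make the idea of treating hard and easy sample pairs differently concrete, here is a minimal sketch of a generic hardness-aware reweighting applied to an InfoNCE-style loss. It is not CSG2L's LGDL module: the function name, the temperature `tau`, the sharpness parameter `beta`, and the softmax-based weighting are all assumptions chosen for illustration.

```python
import math

def reweighted_info_nce(pos_sim, neg_sims, tau=0.5, beta=1.0):
    """InfoNCE-style loss where harder negatives receive larger weights.

    Weights are a softmax over beta * similarity, rescaled so they
    average to 1. With beta = 0 every weight is 1 and the loss reduces
    to the standard, equally weighted form.
    """
    m = max(neg_sims)
    raw = [math.exp(beta * (s - m)) for s in neg_sims]
    total = sum(raw)
    n = len(neg_sims)
    weights = [n * r / total for r in raw]  # mean weight is 1

    pos = math.exp(pos_sim / tau)
    neg = sum(w * math.exp(s / tau) for w, s in zip(weights, neg_sims))
    return -math.log(pos / (pos + neg))
```

Raising `beta` shifts weight toward negatives that are most similar to the anchor (the hard pairs), so they contribute more to the loss than easy pairs do.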
