Miao Zhang

LG
h-index23
3papers
12citations
Novelty65%
AI Score44

3 Papers

15.7LGFeb 18, 2025Code
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

Jiaqi Zhao, Miao Zhang, Ming Wang et al.

Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub 2-bit) quantization. Several existing sub 2-bit post-training quantization (PTQ) methods utilize a mix-precision scheme by leveraging an unstructured fine-grained mask to explicitly distinguish salient weights, while which introduces an extra 1-bit or more per weight. To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time. Specifically, we first introduce a one-dimensional structured mask with negligibly additional 0.0002-bit per weight based on input activations from the perspective of reducing the upper bound of quantization error to allocate corresponding salient weight channels to 4-bit. For non-salient channels binarization, an efficient block-wise scaling factors optimization framework is then presented to take implicit row-wise correlations and angular biases into account. Different from prior works that concentrate on adjusting quantization methodologies, we further propose a novel paradigm called quantization preprocessing, where we argue that transforming the weight distribution of the pretrained model before quantization can alleviate the difficulty in per-channel extremely low-bit PTQ. Extensive experiments indicate our PTQ1.61 achieves state-of-the-art performance in extremely low-bit quantization. Codes are available at https://github.com/zjq0455/PTQ1.61.

11.4LGMay 28, 2025
SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting

Haomiao Qiu, Miao Zhang, Ziyue Qiao et al.

Continual Learning requires a model to learn multiple tasks in sequence while maintaining both stability:preserving knowledge from previously learned tasks, and plasticity:effectively learning new tasks. Gradient projection has emerged as an effective and popular paradigm in CL, where it partitions the gradient space of previously learned tasks into two orthogonal subspaces: a primary subspace and a minor subspace. New tasks are learned effectively within the minor subspace, thereby reducing interference with previously acquired knowledge. However, existing Gradient Projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to appropriately partition the gradient space. In this work, we consider a continual learning paradigm based on Low-Rank Adaptation, which has gained considerable attention due to its efficiency and wide applicability, and propose a novel approach for continual learning, called SplitLoRA. We first provide a theoretical analysis of how subspace partitioning affects model stability and plasticity. Informed by this analysis, we then introduce an effective method that derives the optimal partition of the gradient space for previously learned tasks. This approach effectively balances stability and plasticity in continual learning. Experimental results on multiple datasets demonstrate that the proposed method achieves state-of-the-art performance.

1.0LGJan 2, 2019Code
Multi-level CNN for lung nodule classification with Gaussian Process assisted hyperparameter optimization

Miao Zhang, Huiqi Li, Juan Lyu et al.

This paper investigates lung nodule classification by using deep neural networks (DNNs). Hyperparameter optimization in DNNs is a computationally expensive problem, where evaluating a hyperparameter configuration may take several hours or even days. Bayesian optimization has been recently introduced for the automatically searching of optimal hyperparameter configurations of DNNs. It applies probabilistic surrogate models to approximate the validation error function of hyperparameter configurations, such as Gaussian processes, and reduce the computational complexity to a large extent. However, most existing surrogate models adopt stationary covariance functions to measure the difference between hyperparameter points based on spatial distance without considering its spatial locations. This distance-based assumption together with the condition of constant smoothness throughout the whole hyperparameter search space clearly violates the property that the points far away from optimal points usually get similarly poor performance even though each two of them have huge spatial distance between them. In this paper, a non-stationary kernel is proposed which allows the surrogate model to adapt to functions whose smoothness varies with the spatial location of inputs, and a multi-level convolutional neural network (ML-CNN) is built for lung nodule classification whose hyperparameter configuration is optimized by using the proposed non-stationary kernel based Gaussian surrogate model. Our algorithm searches the surrogate for optimal setting via hyperparameter importance based evolutionary strategy, and the experiments demonstrate our algorithm outperforms manual tuning and well-established hyperparameter optimization methods such as Random search, Gaussian processes with stationary kernels, and recently proposed Hyperparameter Optimization via RBF and Dynamic coordinate search.