Zhongcheng Li

2papers

2 Papers

CVFeb 13Code
QuEPT: Quantized Elastic Precision Transformers with One-Shot Calibration for Multi-Bit Switching

Ke Xu, Yixin Wang, Zhongcheng Li et al.

Elastic precision quantization enables multi-bit deployment via a single optimization pass, fitting diverse quantization scenarios.Yet, the high storage and optimization costs associated with the Transformer architecture, research on elastic quantization remains limited, particularly for large language models.This paper proposes QuEPT, an efficient post-training scheme that reconstructs block-wise multi-bit errors with one-shot calibration on a small data slice. It can dynamically adapt to various predefined bit-widths by cascading different low-rank adapters, and supports real-time switching between uniform quantization and mixed precision quantization without repeated optimization. To enhance accuracy and robustness, we introduce Multi-Bit Token Merging (MB-ToMe) to dynamically fuse token features across different bit-widths, improving robustness during bit-width switching. Additionally, we propose Multi-Bit Cascaded Low-Rank adapters (MB-CLoRA) to strengthen correlations between bit-width groups, further improve the overall performance of QuEPT. Extensive experiments demonstrate that QuEPT achieves comparable or better performance to existing state-of-the-art post-training quantization methods.Our code is available at https://github.com/xuke225/QuEPT

CRNov 10, 2015
ELDA: Towards Efficient and Lightweight Detection of Cache Pollution Attacks in NDN

Zhiwei Xu, Bo Chen, Ninghan Wang et al.

As a promising architectural design for future Internet, named data networking (NDN) relies on in-network caching to efficiently deliver name-based content. However, the in-network caching is vulnerable to cache pollution attacks (CPA), which can reduce cache hits by violating cache locality and significantly degrade the overall performance of NDN. To defend against CPA attacks, the most effective way is to first detect the attacks and then throttle them. Since the CPA attack itself has already imposed a huge burden on victims, to avoid exhausting the remaining resources on the victims for detection purpose, we expect a lightweight detection solution. We thus propose ELDA, an Efficient and Lightweight Detection scheme against cache pollution Attacks, in which we design a Lightweight Flajolet-Martin (LFM) sketch to monitor the interest traffic. Our analysis and simulations demonstrate that, by consuming a few computation and memory resources, ELDA can effectively and efficiently detect CPA attacks.