LG · CV · Jul 13, 2025

MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression

arXiv:2507.09616v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses efficient transformer deployment for edge computing. Its compression method is an incremental advance: it combines two existing techniques, low-rank approximation and mixed-precision quantization.

The paper tackles the challenge of deploying transformer-based neural networks on resource-constrained edge devices. It introduces MLoRQ, a method that integrates low-rank approximation with mixed-precision quantization, and reports up to 15% performance improvement on Vision Transformers for image classification, object detection, and instance segmentation.

Deploying transformer-based neural networks on resource-constrained edge devices presents a significant challenge. This challenge is often addressed through various techniques, such as low-rank approximation and mixed-precision quantization. In this work, we introduce Mixed Low-Rank and Quantization (MLoRQ), a novel method that integrates both techniques. MLoRQ employs a two-stage optimization process to determine optimal bit-width and rank assignments for each layer, adhering to predefined memory constraints. This process includes: (i) an intra-layer optimization that identifies potentially optimal compression solutions out of all low-rank and quantization combinations; (ii) an inter-layer optimization that assigns bit-width precision and rank to each layer while ensuring the memory constraint is met. An optional final step applies a sequential optimization process using a modified adaptive rounding technique to mitigate compression-induced errors in joint low-rank approximation and quantization. The method is compatible with most existing quantization algorithms and can be seamlessly integrated with them. Evaluated on Vision Transformers for image classification, object detection, and instance segmentation tasks, MLoRQ shows state-of-the-art results with up to 15% performance improvement.
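
The two-stage search described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes weight-space squared error as the compression error, symmetric round-to-nearest quantization, an SVD-based low-rank factorization, and an exhaustive search for the inter-layer step; the abstract specifies none of these. All function names (`quantize`, `candidates`, `pareto_front`, `assign`) are hypothetical.

```python
import itertools
import numpy as np


def quantize(w, bits):
    """Symmetric uniform per-tensor quantization (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    if scale == 0:
        return w.copy()
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def candidates(w, ranks, bit_widths):
    """List (memory_bits, error, rank, bits) for one layer's weight matrix.

    rank=None means the full matrix is quantized without factorization.
    Error is the squared Frobenius distance to w (an assumed proxy).
    """
    m, n = w.shape
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    out = []
    for bits in bit_widths:
        err = float(np.sum((w - quantize(w, bits)) ** 2))
        out.append((m * n * bits, err, None, bits))
        for r in ranks:
            # Split singular values evenly between the two factors.
            a = u[:, :r] * np.sqrt(s[:r])
            b = np.sqrt(s[:r])[:, None] * vt[:r]
            w_hat = quantize(a, bits) @ quantize(b, bits)
            err = float(np.sum((w - w_hat) ** 2))
            out.append((r * (m + n) * bits, err, r, bits))
    return out


def pareto_front(cands):
    """Intra-layer step: keep candidates not dominated in (memory, error)."""
    front, best_err = [], float("inf")
    for c in sorted(cands, key=lambda c: (c[0], c[1])):
        if c[1] < best_err:
            front.append(c)
            best_err = c[1]
    return front


def assign(fronts, budget_bits):
    """Inter-layer step: one candidate per layer, minimum total error
    subject to total memory <= budget_bits. Exhaustive search, so only
    practical for a handful of layers. Returns None if infeasible."""
    best = None
    for combo in itertools.product(*fronts):
        mem = sum(c[0] for c in combo)
        err = sum(c[1] for c in combo)
        if mem <= budget_bits and (best is None or err < best[1]):
            best = (mem, err, combo)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.standard_normal((16, 16)) for _ in range(3)]
    fronts = [pareto_front(candidates(w, [2, 4, 8], [2, 4, 8])) for w in layers]
    mem, err, combo = assign(fronts, budget_bits=3 * 16 * 16 * 4)
    for i, (_, _, rank, bits) in enumerate(combo):
        print(f"layer {i}: rank={rank}, bits={bits}")
    print(f"total memory: {mem} bits, total squared error: {err:.3f}")
```

Pruning each layer to its Pareto front first keeps the inter-layer search small: a dominated (rank, bit-width) pair can never appear in an optimal assignment, because swapping it for the candidate that dominates it lowers error without increasing memory.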

