LG · AI · Nov 10, 2025

QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations

arXiv:2511.06767v1 · 6 citations · h-index: 14 · 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient hardware acceleration for Transformer-based models in computer vision (CV) and natural language processing (NLP), offering a domain-specific improvement.

The paper tackles the high inference latency caused by nonlinear operations in Transformer models. It proposes QUARK, a quantization-enabled FPGA acceleration framework that exploits common patterns in these operations for circuit sharing. QUARK achieves up to a 1.96x end-to-end speedup over GPU implementations and reduces the hardware overhead of nonlinear modules by more than 50% while maintaining accuracy, and it improves accuracy under ultra-low-bit quantization.

Transformer-based models have revolutionized computer vision (CV) and natural language processing (NLP) by achieving state-of-the-art performance across a range of benchmarks. However, nonlinear operations in these models contribute significantly to inference latency, presenting unique challenges for efficient hardware acceleration. To this end, we propose QUARK, a quantization-enabled FPGA acceleration framework that leverages common patterns in nonlinear operations to enable efficient circuit sharing, thereby reducing hardware resource requirements. QUARK targets all nonlinear operations within Transformer-based models, achieving high-performance approximation through a novel circuit-sharing design tailored to accelerate these operations. Our evaluation demonstrates that QUARK significantly reduces the computational overhead of nonlinear operators in mainstream Transformer architectures, achieving up to a 1.96x end-to-end speedup over GPU implementations. Moreover, QUARK lowers the hardware overhead of nonlinear modules by more than 50% compared to prior approaches, all while maintaining high model accuracy, and even substantially boosting accuracy under ultra-low-bit quantization.
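The abstract does not say which patterns QUARK shares or how its circuits are built. As a purely illustrative sketch of the general idea, the Python below routes two common Transformer nonlinearities through one exponential routine: softmax, and GELU in its well-known sigmoid form GELU(x) ≈ x · sigmoid(1.702x). The names `shared_exp`, `softmax`, and `gelu_sigmoid`, the 8-bit fractional grid, and the input clamp are all assumptions made for this example. They are not taken from the paper.

```python
import math


def shared_exp(x: float, frac_bits: int = 8, max_abs: float = 32.0) -> float:
    """Hypothetical shared unit: exp(x) evaluated on a fixed-point input grid.

    The input is clamped to [-max_abs, max_abs] and rounded to `frac_bits`
    fractional bits, standing in for a quantized datapath (an assumption).
    """
    scale = 1 << frac_bits
    x = max(-max_abs, min(max_abs, x))
    xq = round(x * scale) / scale
    return math.exp(xq)


def softmax(xs):
    """Softmax built on the shared exponential unit."""
    m = max(xs)  # subtract the max so every exponent is <= 0
    es = [shared_exp(v - m) for v in xs]
    total = sum(es)
    return [e / total for e in es]


def gelu_sigmoid(x: float) -> float:
    """Sigmoid-form GELU approximation, reusing the same exponential unit."""
    return x / (1.0 + shared_exp(-1.702 * x))


if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(gelu_sigmoid(1.0))
```

Because both functions call the same routine, a hardware analogue would need only one exponential block instead of one per operator. Whether QUARK shares an exponential, or some other sub-circuit, is not stated in this summary.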
