LG · HEP-EX · Mar 27

PQuantML: A Tool for End-to-End Hardware-aware Model Compression

arXiv:2603.26595 · 60.2 · h-index: 6 · Has Code
Predicted impact top 37% in LG · last 90 days
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient model deployment in real-time edge computing, such as LHC data processing, but it is incremental: it builds on existing pruning and quantization methods.

The paper tackles the problem of deploying performant neural networks in latency-constrained environments by introducing PQuantML, an open-source, hardware-aware compression library that achieves substantial parameter and bit-width reductions while maintaining accuracy on tasks like jet tagging.

PQuantML is a new open-source, hardware-aware neural network model compression library tailored to end-to-end workflows. Motivated by the need to deploy performant models to environments with strict latency constraints, PQuantML simplifies training of compressed models by providing a unified interface to apply pruning and quantization, either jointly or individually. The library implements multiple pruning methods with different granularities, as well as fixed-point quantization with support for High-Granularity Quantization. We evaluate PQuantML on representative tasks such as jet substructure classification (so-called jet tagging), an on-edge problem related to real-time LHC data processing. Using various pruning methods with fixed-point quantization, PQuantML achieves substantial parameter and bit-width reductions while maintaining accuracy. The resulting compression is further compared against existing tools, such as QKeras and HGQ.
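
To make the two techniques named in the abstract concrete, here is a minimal NumPy sketch of unstructured magnitude pruning followed by signed fixed-point quantization. It is not PQuantML's API and not the paper's implementation: the function names `magnitude_prune` and `fixed_point_quantize`, their parameters, and the choice of round-to-nearest with saturation are all assumptions made for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of entries with the smallest magnitude.

    Hypothetical helper: unstructured (per-weight) pruning only.
    """
    k = int(round(sparsity * weights.size))
    pruned = weights.copy()
    if k == 0:
        return pruned
    # Flat indices of the k smallest-magnitude weights.
    idx = np.argsort(np.abs(weights), axis=None)[:k]
    pruned.flat[idx] = 0.0
    return pruned


def fixed_point_quantize(x, total_bits, frac_bits):
    """Map values onto a signed fixed-point grid.

    Hypothetical helper: `total_bits` includes the sign bit, `frac_bits`
    is the number of fractional bits. Values are rounded to the nearest
    grid point and saturated at the representable range.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale


# Toy weight matrix (illustrative values, not from the paper).
w = np.array([[0.9, -0.05], [0.3, -1.7]])
w_pruned = magnitude_prune(w, sparsity=0.5)
w_compressed = fixed_point_quantize(w_pruned, total_bits=4, frac_bits=2)
```

With 4 total bits and 2 fractional bits the grid step is 0.25 and the representable range is [-2.0, 1.75], so the two surviving weights land on 1.0 and -1.75 while the pruned entries stay exactly zero. In a real workflow the two steps would typically be applied during training rather than once after it, which this sketch does not attempt to show.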
