LG | AI | Oct 17, 2025

Optimization of the quantization of dense neural networks from an exact QUBO formulation

Sergio Muñiz Subiñas, Manuel L. González, Jorge Ruiz Gómez, Alejandro Mata Ali, Jorge Martínez Martín, Miguel Franco Hernando, Ángel Miguel García-Vico

arXiv:2510.16075v17.11 citations | h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently compressing neural networks for deployment, though it appears incremental as it builds on existing quantization techniques with a new optimization approach.

The paper tackles the problem of post-training quantization for dense neural networks by introducing a novel ADAROUND-based QUBO formulation, achieving improved accuracy compared to traditional round-to-nearest methods across datasets like MNIST and CIFAR-10 at integer precisions from int8 to int1.

This work introduces a post-training quantization (PTQ) method for dense neural networks via a novel ADAROUND-based QUBO formulation. Taking as the objective the Frobenius distance between the theoretical output and the dequantized output (before the activation function), the authors obtain an explicit QUBO whose binary variables represent the rounding choice for each weight and bias. Additionally, by exploiting the structure of the QUBO coefficient matrix, the global problem can be exactly decomposed into $n$ independent subproblems of size $f+1$, which can be solved efficiently with heuristics such as simulated annealing. The approach is evaluated on MNIST, Fashion-MNIST, EMNIST, and CIFAR-10 across integer precisions from int8 to int1 and compared with traditional round-to-nearest quantization.
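To make the rounding-choice QUBO concrete, here is a minimal sketch for a single output neuron. It is not the authors' implementation and rests on several assumptions that the abstract does not state: $n$ is read as the number of output neurons and $f$ as the number of inputs per neuron, weights and bias share one hypothetical quantization scale `scale`, clipping to the integer range is ignored, and an exhaustive search stands in for simulated annealing because the toy problem is tiny. Writing each dequantized parameter as `scale * (floor(theta / scale) + x)` with binary `x`, the squared pre-activation error becomes $\|Ax - r\|^2 = x^\top Q x + \text{const}$, where the linear term folds into the diagonal of $Q$ because $x_i^2 = x_i$. The function names `neuron_qubo` and `solve_brute_force` are illustrative.

```python
import itertools
import numpy as np

def neuron_qubo(X, w, b, scale):
    """Build Q, const, floor_vals so that the squared pre-activation error
    of one neuron equals x^T Q x + const for a binary rounding vector x
    (x_i = 1 rounds parameter i up, x_i = 0 rounds it down)."""
    theta = np.append(w, b)                          # f weights followed by the bias
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])    # column of ones carries the bias
    floor_vals = np.floor(theta / scale)
    resid = theta - scale * floor_vals               # distance above the lower grid point
    A = scale * X1                                   # output change when a parameter rounds up
    r = X1 @ resid                                   # output error if everything rounds down
    Q = A.T @ A
    Q[np.diag_indices_from(Q)] -= 2.0 * (A.T @ r)    # linear term moves to the diagonal
    return Q, float(r @ r), floor_vals

def solve_brute_force(Q):
    """Exhaustive minimiser of x^T Q x; a stand-in for a heuristic solver."""
    best_x, best_e = None, np.inf
    for bits in itertools.product((0, 1), repeat=Q.shape[0]):
        x = np.array(bits, dtype=float)
        e = x @ Q @ x
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Toy data: 32 samples, f = 3 inputs, one neuron (all values are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
w = rng.normal(size=3)
b = 0.1
scale = 0.25

Q, const, floor_vals = neuron_qubo(X, w, b, scale)
x_opt, e_opt = solve_brute_force(Q)

# Round-to-nearest expressed in the same binary encoding, for comparison.
theta = np.append(w, b)
x_rtn = (theta / scale - floor_vals >= 0.5).astype(float)

err_opt = e_opt + const
err_rtn = x_rtn @ Q @ x_rtn + const
print("QUBO-optimal error:", err_opt)
print("round-to-nearest error:", err_rtn)
```

Under this reading, the decomposition mentioned above follows because the squared Frobenius norm of the output error is a sum over output columns, and each column depends only on that neuron's $f$ weights and one bias. Each neuron therefore yields its own $(f+1) \times (f+1)$ matrix that can be minimised independently.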
