OC LGDec 7, 2022

Multi-Objective Linear Ensembles for Robust and Sparse Training of Few-Bit Neural Networks

Ambrogio Maria Bernardelli, Stefano Gualandi, Hoong Chuin Lau, Simone Milanesi, Neil Yorke-Smith

arXiv:2212.03659v24.02 citationsh-index: 37Has Code

Originality Incremental advance

AI Analysis

This work addresses the challenge of lightweight and efficient neural network training for low-power devices, offering an incremental improvement over existing solver-based methods.

The paper tackled the problem of training few-bit neural networks (BNNs and INNs) in low-data settings by proposing a multi-objective ensemble approach (BeMi) that trains robust and sparsified networks, achieving average accuracies of 68.4% with 10 images per class and 81.8% with 40 images per class on MNIST, while removing up to 75.3% of network links.

Training neural networks (NNs) using combinatorial optimization solvers has gained attention in recent years. In low-data settings, state-of-the-art mixed integer linear programming solvers can train exactly a NN, avoiding intensive GPU-based training and hyper-parameter tuning and simultaneously training and sparsifying the network. We study the case of few-bit discrete-valued neural networks, both Binarized Neural Networks (BNNs), whose values are restricted to +-1, and Integer Neural Networks (INNs), whose values lie in a range {-P, ..., P}. Few-bit NNs receive increasing recognition due to their lightweight architecture and ability to run on low-power devices. This paper proposes new methods to improve the training of BNNs and INNs. Our contribution is a multi-objective ensemble approach based on training a single NN for each possible pair of classes and applying a majority voting scheme to predict the final output. Our approach results in training robust sparsified networks whose output is not affected by small perturbations on the input and whose number of active weights is as small as possible. We compare this BeMi approach to the current state-of-the-art in solver-based NN training and gradient-based training, focusing on BNN learning in few-shot contexts. We compare the benefits and drawbacks of INNs versus BNNs, bringing new light to the distribution of weights over the {-P, ..., P} interval. Finally, we compare multi-objective versus single-objective training of INNs, showing that robustness and network simplicity can be acquired simultaneously, thus obtaining better test performances. While the previous state-of-the-art approaches achieve an average accuracy of 51.1% on the MNIST dataset, the BeMi ensemble approach achieves an average accuracy of 68.4% when trained with 10 images per class and 81.8% when trained with 40 images per class, having up to 75.3% NN links removed.

View on arXiv PDF Code

Similar