LG · Oct 2, 2023

RRR-Net: Reusing, Reducing, and Recycling a Deep Backbone Network

Haozhe Sun, Isabelle Guyon, Felix Mohr, Hedi Tabia

arXiv:2310.01157v12.03 citations · h-index: 70

Originality: Incremental advance

AI Analysis

This work addresses the need for compact and efficient models in computer vision and offers a practical solution for deployment in resource-constrained environments. It is an incremental advance, however, because it builds on existing backbone reuse techniques.

The paper tackles the problem of reducing the size and inference latency of large pre-trained backbone networks such as ResNet152 while maintaining performance. It first reduces the network from 51 to 5 blocks, cutting parameters and FLOPs by more than a factor of 6 without significant degradation. It then splits the reduced network into branches to create an ensemble that, in the reported experiments, matches or exceeds classical fine-tuning on 40 image classification datasets.

It has become mainstream in computer vision and other machine learning domains to reuse backbone networks pre-trained on large datasets as preprocessors. Typically, the last layer is replaced by a shallow learning machine of sorts; the newly added classification head and (optionally) deeper layers are fine-tuned on a new task. Due to its strong performance and simplicity, a common pre-trained backbone network is ResNet152. However, ResNet152 is relatively large and induces inference latency. In many cases, a compact and efficient backbone with similar performance would be preferable over a larger, slower one. This paper investigates techniques to reuse a pre-trained backbone with the objective of creating a smaller and faster model. Starting from a large ResNet152 backbone pre-trained on ImageNet, we first reduce it from 51 blocks to 5 blocks, reducing its number of parameters and FLOPs by more than 6 times, without significant performance degradation. Then, we split the model after 3 blocks into several branches, while preserving the same number of parameters and FLOPs, to create an ensemble of sub-networks to improve performance. Our experiments on a large benchmark of 40 image classification datasets from various domains suggest that our techniques match, if not exceed, the performance of "classical backbone fine-tuning" while achieving a smaller model size and faster inference speed.
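The branching step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes the branches are combined by averaging their softmax probabilities (the abstract does not state the aggregation rule). The toy `linear` layers, the weights, and the names `trunk`, `branches`, and `ensemble_predict` are hypothetical stand-ins for the shared first blocks and the sub-networks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """Return a toy bias-free linear map standing in for a stack of network blocks."""
    def apply(x: Sequence[float]) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return apply


def ensemble_predict(trunk, branches, x) -> Vector:
    """Run the shared trunk once, then average the branches' class probabilities."""
    features = trunk(x)  # shared computation, performed a single time
    per_branch = [softmax(branch(features)) for branch in branches]
    n = len(per_branch)
    # Assumption: simple probability averaging; the paper may aggregate differently.
    return [sum(p[c] for p in per_branch) / n for c in range(len(per_branch[0]))]


# Hypothetical toy weights, chosen only for illustration.
trunk = linear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 2 inputs -> 3 shared features
branches = [
    linear([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),  # branch A: 3 features -> 2 classes
    linear([[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]),  # branch B: 3 features -> 2 classes
]

probs = ensemble_predict(trunk, branches, [2.0, 1.0])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The point of the structure is that the trunk output is computed once and reused by every branch, so adding branches only adds the cost of the branches themselves.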
