LG, AI, CV | Jun 28, 2022

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network

arXiv:2206.14098v2 | 15 citations | h-index: 25
Originality: Highly original
AI Analysis

This work addresses the memory constraints of training large-scale computer vision networks, enabling more efficient model scaling.

This paper tackles the high memory requirements of training bidirectional multi-scale feature fusion networks by introducing RevSilo, a reversible module that eliminates the need to store hidden activations, and RevBiFPN, a fully reversible network. The result is competitive performance with up to 19.8x lower training memory for image classification and up to a 2.5% AP boost on MS COCO with reduced memory and MACs.

This work introduces RevSilo, the first reversible bidirectional multi-scale feature fusion module. Like other reversible methods, RevSilo eliminates the need to store hidden activations by recomputing them. However, existing reversible methods do not apply to multi-scale feature fusion and are, therefore, not applicable to a large class of networks. Bidirectional multi-scale feature fusion promotes local and global coherence and has become a de facto design principle for networks targeting spatially sensitive tasks, e.g., HRNet (Sun et al., 2019a) and EfficientDet (Tan et al., 2020). These networks achieve state-of-the-art results across various computer vision tasks when paired with high-resolution inputs. However, training them requires substantial accelerator memory for saving large, multi-resolution activations. These memory requirements inherently cap the size of neural networks, limiting improvements that come from scale. Operating across resolution scales, RevSilo alleviates these issues. Stacking RevSilos, we create RevBiFPN, a fully reversible bidirectional feature pyramid network. RevBiFPN is competitive with networks such as EfficientNet while using up to 19.8x less training memory for image classification. When fine-tuned on MS COCO, RevBiFPN provides up to a 2.5% boost in AP over HRNet while using fewer MACs and 2.4x less training-time memory.
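
To make the idea of "recomputing hidden activations" concrete, here is a minimal sketch of a generic additive coupling between two feature streams at different resolutions. It is an illustration only and not the paper's RevSilo implementation: the names `downsample`, `upsample`, `forward`, and `inverse` are hypothetical, and fixed average pooling and nearest-neighbour repetition stand in for whatever learned operations a real network would use. The point it shows is that when each stream is updated by adding a function of the other stream, the inputs can be recovered exactly from the outputs, so they need not be kept in memory for the backward pass.

```python
import math

def downsample(xs):
    # Average adjacent pairs: length 2n -> n.
    # Hypothetical stand-in for a learned high-to-low resolution transform.
    return [(xs[i] + xs[i + 1]) / 2.0 for i in range(0, len(xs), 2)]

def upsample(xs):
    # Repeat each value twice: length n -> 2n.
    # Hypothetical stand-in for a learned low-to-high resolution transform.
    return [v for v in xs for _ in range(2)]

def forward(high, low):
    # Top-down step: the low-resolution stream absorbs information from the high one.
    low_out = [l + d for l, d in zip(low, downsample(high))]
    # Bottom-up step: the high-resolution stream absorbs the updated low one.
    high_out = [h + u for h, u in zip(high, upsample(low_out))]
    return high_out, low_out

def inverse(high_out, low_out):
    # Undo the two steps in reverse order, using only the outputs.
    high = [h - u for h, u in zip(high_out, upsample(low_out))]
    low = [l - d for l, d in zip(low_out, downsample(high))]
    return high, low

high = [1.0, 2.0, 3.0, 4.0]   # toy "high-resolution" features
low = [0.5, -0.5]             # toy "low-resolution" features

high_out, low_out = forward(high, low)
rec_high, rec_low = inverse(high_out, low_out)

# The inputs are recovered from the outputs alone (up to floating-point error).
assert all(math.isclose(a, b) for a, b in zip(rec_high, high))
assert all(math.isclose(a, b) for a, b in zip(rec_low, low))
```

In a training setting, the same inversion would be run layer by layer during the backward pass, trading extra computation for not storing each layer's inputs. How RevSilo structures its transforms across more than two scales is described in the paper itself and is not reproduced here.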

Code Implementations: 1 repo
