CVLGMar 11, 2021

Fast and Accurate Model Scaling

arXiv:2103.06877v1111 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficiently scaling neural networks for practitioners and researchers, offering a framework to optimize performance under computational constraints, though it is incremental in refining existing scaling methods.

The paper tackles the problem of scaling convolutional neural networks by analyzing how different scaling strategies affect model parameters, activations, and runtime, and proposes a fast compound scaling strategy that reduces activation growth to O(√s) while maintaining accuracy, leading to speedups on memory-limited hardware.

In this work we analyze strategies for convolutional neural network scaling; that is, the process of scaling a base convolutional network to endow it with greater computational complexity and consequently representational power. Example scaling strategies may include increasing model width, depth, resolution, etc. While various scaling strategies exist, their tradeoffs are not fully understood. Existing analysis typically focuses on the interplay of accuracy and flops (floating point operations). Yet, as we demonstrate, various scaling strategies affect model parameters, activations, and consequently actual runtime quite differently. In our experiments we show the surprising result that numerous scaling strategies yield networks with similar accuracy but with widely varying properties. This leads us to propose a simple fast compound scaling strategy that encourages primarily scaling model width, while scaling depth and resolution to a lesser extent. Unlike currently popular scaling strategies, which result in about $O(s)$ increase in model activation w.r.t. scaling flops by a factor of $s$, the proposed fast compound scaling results in close to $O(\sqrt{s})$ increase in activations, while achieving excellent accuracy. This leads to comparable speedups on modern memory-limited hardware (e.g., GPU, TPU). More generally, we hope this work provides a framework for analyzing and selecting scaling strategies under various computational constraints.

Code Implementations6 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes