CL · LG · Feb 17, 2025

How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines

arXiv:2502.12051v3 · 6 citations · h-index: 7 · Trans. Mach. Learn. Res.
Originality: Synthesis-oriented
AI Analysis

The paper addresses a problem that matters to AI researchers and practitioners: how to apply scaling laws effectively. Its contribution is incremental, however, because it reviews existing work without introducing new methods.

This survey synthesizes insights from over 50 studies on neural scaling laws, highlighting their limitations across architectures, modalities, and domains, and advocates for adaptive scaling strategies tailored to real-world applications.

Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. However, recent studies highlighted their limitations across architectures, modalities, and deployment contexts. Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Moreover, scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches. In this survey, we synthesize insights from over 50 studies, examining the theoretical foundations, empirical findings, and practical implications of scaling laws. We also explore key challenges, including data efficiency, inference scaling, and architecture-specific constraints, advocating for adaptive scaling strategies tailored to real-world applications. We suggest that while scaling laws provide a useful guide, they do not always generalize across all architectures and training strategies.
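
As a rough illustration of the power-law relationships the abstract refers to, the sketch below fits a curve of the form loss ≈ a · N^(-alpha) to loss measurements at several model sizes, using a linear fit in log-log space. The functional form without an irreducible-loss term, the function names `fit_power_law` and `predict_loss`, and the synthetic data points are all assumptions made for illustration; none of them come from the survey.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-alpha) by least squares in log-log space.

    Assumes a pure power law with no irreducible-loss offset, which is a
    simplification chosen for this sketch.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # log(loss) = log(a) - alpha * log(size), so the slope is -alpha.
    slope, intercept = np.polyfit(log_n, log_l, 1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(a, alpha, size):
    """Extrapolate the fitted power law to a new model size."""
    return a * size ** (-alpha)


# Synthetic, noise-free points generated from a = 10, alpha = 0.1
# (illustrative values only, not results from any study).
sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = 10.0 * sizes ** (-0.1)

a, alpha = fit_power_law(sizes, losses)
extrapolated = predict_loss(a, alpha, 1e10)
```

Extrapolating such a fit beyond the measured range is precisely where the survey urges caution: the abstract notes that sparse, mixture-of-experts, retrieval-augmented, and multimodal models often deviate from traditional scaling patterns.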

