LG · Feb 23, 2024

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

arXiv:2402.15175v2 · 26 citations · h-index: 41
Originality: Incremental advance
AI Analysis

This work helps machine learning researchers understand complex phenomena in neural models. It is an incremental advance, since it builds on existing circuit-based explanations.

The paper explains grokking, double descent, and emergent abilities in deep learning through a unified framework based on competition between memorization and generalization circuits. It experimentally validates two predictions about double descent and extends the framework to multi-task learning to account for emergent abilities.

Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models. In this paper, we present a comprehensive framework that provides a unified view of these three phenomena, focusing on the competition between memorization and generalization circuits. This approach, initially employed to explain grokking, is extended in our work to encompass a wider range of model sizes and training data volumes. Our framework delineates four distinct training dynamics, each depending on a different combination of model size and training data quantity. Using this framework, we provide a detailed analysis of the double descent phenomenon and propose two verifiable predictions regarding its occurrence, both substantiated by our experimental results. Moreover, we extend our framework to the multi-task learning paradigm, demonstrating how algorithmic tasks can be turned into emergent abilities. This offers a novel perspective for understanding emergent abilities in large language models.

