LG, ST | Dec 25, 2024

Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories

arXiv:2412.18756v1
h-index: 49
Originality: Highly original
AI Analysis

This work addresses a foundational problem in machine learning theory and is aimed at researchers who want to understand how neural networks learn features and generalize in ways that existing kernel-based theories do not capture.

The paper tackles the challenge of theoretically analyzing feature learning and generalization in neural networks. It proposes a new paradigm that moves beyond fixed-kernel theories such as the neural tangent kernel (NTK), and it introduces an over-parameterized Gaussian sequence model as a prototype for studying these characteristics.
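To make "fixed kernel" concrete, here is a minimal NumPy sketch, not taken from the paper, of the empirical NTK of a two-layer ReLU network f(x) = a · relu(W x) / √m. The parameterization (1/√m output scaling, standard Gaussian initialization, gradients taken with respect to both layers) and the helper name `empirical_ntk` are illustrative assumptions. As the width m grows, kernels computed from two independent random initializations approach the same deterministic limit, which is the sense in which NTK theory treats a wide network as a fixed kernel model. The sketch only checks this concentration at initialization; the other half of NTK theory, that the kernel stays close to its limit during training, is not demonstrated here.

```python
import numpy as np


def empirical_ntk(X, W, a):
    """Empirical NTK Gram matrix of f(x) = a . relu(W x) / sqrt(m).

    X: (n, d) inputs; W: (m, d) first-layer weights; a: (m,) output weights.
    Returns K with K[i, j] = <grad f(x_i), grad f(x_j)>, where the gradient
    is taken with respect to all parameters (W and a).
    """
    m = W.shape[0]
    pre = X @ W.T                    # (n, m) pre-activations w_r . x_i
    act = np.maximum(pre, 0.0)       # relu(w_r . x_i)
    gate = (pre > 0).astype(float)   # relu'(w_r . x_i)
    # Part from gradients w.r.t. a_r: relu(w_r . x_i) * relu(w_r . x_j) / m
    k_a = act @ act.T / m
    # Part from gradients w.r.t. w_r: a_r^2 * gate_ir * gate_jr * <x_i, x_j> / m
    k_w = ((gate * a**2) @ gate.T) * (X @ X.T) / m
    return k_a + k_w


rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # inputs on the unit sphere

rel_diff = {}
for m in (10, 1_000, 100_000):
    # Two independent random initializations of the same width m
    K1 = empirical_ntk(X, rng.standard_normal((m, d)), rng.standard_normal(m))
    K2 = empirical_ntk(X, rng.standard_normal((m, d)), rng.standard_normal(m))
    rel_diff[m] = np.linalg.norm(K1 - K2) / np.linalg.norm(K1)
    print(f"width {m:>6}: relative difference between initializations = {rel_diff[m]:.4f}")
```

The relative difference should shrink roughly like 1/√m, since each kernel entry is an average of m independent per-neuron terms.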

A primary advantage of neural networks lies in their feature learning characteristics, which are challenging to analyze theoretically due to the complexity of their training dynamics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability. After reviewing the neural tangent kernel (NTK) theory and recent results in kernel regression, which address the generalization of sufficiently wide neural networks, we examine the limitations and implications of fixed kernel theories (such as the NTK theory) and review recent theoretical advances in feature learning. Moving beyond the fixed kernel/feature theory, we consider neural networks as adaptive feature models. Finally, we propose an over-parameterized Gaussian sequence model as a prototype model for studying the feature learning characteristics of neural networks.
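The contrast between fixed-kernel and adaptive feature models can be illustrated in the classical Gaussian sequence model y_j = θ*_j + ε z_j. The sketch below is a hypothetical toy and not the paper's over-parameterized model, whose exact parameterization is not stated here. The fixed estimator shrinks each coordinate by λ_j / (λ_j + reg), which is kernel ridge regression written in the kernel's eigenbasis, with an assumed eigenvalue decay λ_j = j^{-2} and an oracle-tuned `reg`. The adaptive estimator uses the simple two-factor parameterization θ_j = a_j b_j, trained by gradient descent from a small initialization, a common toy model of over-parameterization. The signal locations, noise level, step size and number of steps are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian sequence model: y_j = theta*_j + eps * z_j, z_j ~ N(0, 1), j = 1..N
N, eps = 1000, 0.05
j = np.arange(1, N + 1)
theta_star = np.zeros(N)
# Hypothetical signal on coordinates j = 500, 600, ..., 900 (0-based indices below)
theta_star[[499, 599, 699, 799, 899]] = 1.0
y = theta_star + eps * rng.standard_normal(N)

# Fixed-kernel estimator: shrinkage in a fixed eigenbasis whose eigenvalues
# lambda_j = j^(-2) are chosen before seeing the data (assumed decay).
lam_j = j ** -2.0


def fixed_kernel_estimate(y, reg):
    return lam_j / (lam_j + reg) * y


# Give the fixed estimator an oracle choice of the regularization strength.
regs = np.logspace(-10, 0, 101)
fixed_errors = [np.sum((fixed_kernel_estimate(y, r) - theta_star) ** 2) for r in regs]
best_fixed_err = min(fixed_errors)


# Adaptive (over-parameterized) estimator: theta_j = a_j * b_j, trained by
# gradient descent on 0.5 * sum_j (a_j * b_j - y_j)^2 from a small initialization.
def overparam_gd(y, alpha=1e-3, lr=0.1, steps=150):
    a = np.full_like(y, alpha)
    b = np.full_like(y, alpha)
    for _ in range(steps):
        resid = a * b - y
        # Simultaneous gradient step on both factors (right side uses old a, b)
        a, b = a - lr * resid * b, b - lr * resid * a
    return a * b


adaptive_err = np.sum((overparam_gd(y) - theta_star) ** 2)
print(f"best fixed-kernel error (oracle reg): {best_fixed_err:.3f}")
print(f"over-parameterized GD error:          {adaptive_err:.3f}")
```

In this setup the fixed estimator's shrinkage factors are set by λ_j before any data are seen, so it cannot keep the high-index signal coordinates without also passing the noise in all the coordinates before them. The factored parameterization instead grows each coordinate at a rate roughly proportional to y_j, so large coordinates are fitted early while noise coordinates stay near their small initialization. The finite step count acts as early stopping: run long enough, coordinates with positive noise would also be fitted.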
