LGDec 6, 2024

Learning High-Degree Parities: The Crucial Role of the Initialization

arXiv:2412.04910v36 citationsh-index: 5
Originality Incremental advance
AI Analysis

It addresses a theoretical gap in understanding how initialization affects learning high-degree parities, which is incremental but relevant for neural network theory.

This paper investigates the learnability of almost-full parities using gradient descent on neural networks, showing that success depends on the initialization: Rademacher initialization enables efficient learning, while Gaussian perturbation with large standard deviation prevents it, with a positive result up to σ=O(d^{-1}).

Parities have become a standard benchmark for evaluating learning algorithms. Recent works show that regular neural networks trained by gradient descent can efficiently learn degree $k$ parities on uniform inputs for constant $k$, but fail to do so when $k$ and $d-k$ grow with $d$ (here $d$ is the ambient dimension). However, the case where $k=d-O_d(1)$ (almost-full parities), including the degree $d$ parity (the full parity), has remained unsettled. This paper shows that for gradient descent on regular neural networks, learnability depends on the initial weight distribution. On one hand, the discrete Rademacher initialization enables efficient learning of almost-full parities, while on the other hand, its Gaussian perturbation with large enough constant standard deviation $σ$ prevents it. The positive result for almost-full parities is shown to hold up to $σ=O(d^{-1})$, pointing to questions about a sharper threshold phenomenon. Unlike statistical query (SQ) learning, where a singleton function class like the full parity is trivially learnable, our negative result applies to a fixed function and relies on an initial gradient alignment measure of potential broader relevance to neural networks learning.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes