\emph{Lifted} RDT based capacity analysis of the 1-hidden layer treelike \emph{sign} perceptrons neural networks

arXiv:2312.08257v12 citationsh-index: 22
Originality Incremental advance
AI Analysis

This provides incremental progress in theoretical machine learning by offering improved rigorous capacity bounds for specific neural network architectures, relevant to researchers in computational learning theory.

The paper tackles the problem of characterizing the memorization capacity of sign perceptrons neural networks, specifically for treelike committee machines, and shows that a partially lifted Random Duality Theory variant yields closed-form analytical bounds that universally improve over the best known scaling for any number of hidden neurons.

We consider the memorization capabilities of multilayered \emph{sign} perceptrons neural networks (SPNNs). A recent rigorous upper-bounding capacity characterization, obtained in \cite{Stojnictcmspnncaprdt23} utilizing the Random Duality Theory (RDT), demonstrated that adding neurons in a network configuration may indeed be very beneficial. Moreover, for particular \emph{treelike committee machines} (TCM) architectures with $d\leq 5$ neurons in the hidden layer, \cite{Stojnictcmspnncaprdt23} made a very first mathematically rigorous progress in over 30 years by lowering the previously best known capacity bounds of \cite{MitchDurb89}. Here, we first establish that the RDT bounds from \cite{Stojnictcmspnncaprdt23} scale as $\sim \sqrt{d}$ and can not on their own \emph{universally} (over the entire range of $d$) beat the best known $\sim \log(d)$ scaling of the bounds from \cite{MitchDurb89}. After recognizing that the progress from \cite{Stojnictcmspnncaprdt23} is therefore promising, but yet without a complete concretization, we then proceed by considering the recently developed fully lifted RDT (fl RDT) as an alternative. While the fl RDT is indeed a powerful juggernaut, it typically relies on heavy numerical evaluations. To avoid such heavy numerics, we here focus on a simplified, \emph{partially lifted}, variant and show that it allows for very neat, closed form, analytical capacity characterizations. Moreover, we obtain the concrete capacity bounds that \emph{universally} improve for \emph{any} $d$ over the best known ones of \cite{MitchDurb89}.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes