Joseph Najnudel

MATH-PH

h-index12

3papers

47citations

Novelty40%

AI Score28

Ranked #149,737 of 194,257 authors (top 77%)#5 in MATH-PH (top 38%)

3 Papers

5.1MATH-PHMay 17, 2022

Universal characteristics of deep neural network loss surfaces from random matrix theory

Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri et al.

This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks based on a realistic model of their Hessians. In particular we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning gradient descent algorithms. We also present insights into deep neural network loss surfaces from quite general arguments based on tools from statistical physics and random matrix theory.

5.1MATH-PHJan 7, 2021Code

A spin-glass model for the loss surfaces of generative adversarial networks

Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri et al.

We present a novel mathematical model that seeks to capture the key design feature of generative adversarial networks (GANs). Our model consists of two interacting spin glasses, and we conduct an extensive theoretical analysis of the complexity of the model's critical points using techniques from Random Matrix Theory. The result is insights into the loss surfaces of large GANs that build upon prior insights for simpler networks, but also reveal new structure unique to this setting.

8.0PRApr 8, 2020Code

The Loss Surfaces of Neural Networks with General Activation Functions

Nicholas P. Baskerville, Jonathan P. Keating, Francesco Mezzadri et al.

The loss surfaces of deep neural networks have been the subject of several studies, theoretical and experimental, over the last few years. One strand of work considers the complexity, in the sense of local optima, of high dimensional random functions with the aim of informing how local optimisation methods may perform in such complicated settings. Prior work of Choromanska et al (2015) established a direct link between the training loss surfaces of deep multi-layer perceptron networks and spherical multi-spin glass models under some very strong assumptions on the network and its data. In this work, we test the validity of this approach by removing the undesirable restriction to ReLU activation functions. In doing so, we chart a new path through the spin glass complexity calculations using supersymmetric methods in Random Matrix Theory which may prove useful in other contexts. Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.