Leonid Berlyand

h-index: 21

Papers: 7

Citations: 24

Novelty: 46%

AI Score: 32

Ranked #126,976 of 194,257 authors (top 65%); #27,959 in LG (top 70%)

7 Papers

8.8 | LG | Oct 4, 2023

Enhancing Accuracy in Deep Learning Using Random Matrix Theory

Leonid Berlyand, Etienne Sandier, Yitzchak Shmalo et al.

We explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning, that is, reducing the number of DNN parameters (weights). Our numerical results show that this pruning drastically reduces the number of parameters without reducing the accuracy of DNNs and CNNs. Moreover, pruning fully connected DNNs actually increases the accuracy and decreases the variance across random initializations. Our numerics indicate that this gain in accuracy is due to the simplification of the loss landscape. We then provide a rigorous mathematical underpinning of these numerical results by proving the RMT-based Pruning Theorem. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.

1.2 | MATH-PH | Jul 16, 2025

Asymptotic behavior of eigenvalues of large rank perturbations of large random matrices

Ievgenii Afanasiev, Leonid Berlyand, Mariia Kiyashko

The paper is concerned with deformed Wigner random matrices. These matrices are closely connected with deep neural networks (DNNs): weight matrices of trained DNNs can be represented in the form $R + S$, where $R$ is random and $S$ is highly correlated. The spectrum of such matrices plays a key role in the rigorous underpinning of a novel pruning technique based on random matrix theory. So far, the mathematical analysis has been carried out only for a finite-rank matrix $S$. In practice, however, the rank may grow. In this paper we develop an asymptotic analysis for the case of growing rank.
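For orientation, the finite-rank setting has a classical benchmark, stated here as background and not as a result of this paper. Assume (our notation, not the authors') that $R = W_N$ is an $N \times N$ Wigner matrix whose off-diagonal entries have variance $\sigma^2/N$, and that $S = \theta\, v v^{T}$ is rank one with a unit vector $v$ and fixed $\theta > 0$. Under suitable moment conditions, as $N \to \infty$,

$$
\lambda_{\max}(W_N + S) \;\longrightarrow\;
\begin{cases}
\theta + \dfrac{\sigma^{2}}{\theta}, & \theta > \sigma,\\[6pt]
2\sigma, & \theta \le \sigma.
\end{cases}
$$

That is, a sufficiently strong perturbation pushes one eigenvalue out of the semicircle bulk $[-2\sigma, 2\sigma]$, while a weak one leaves the spectral edge unchanged. The growing-rank regime studied in the paper is not covered by this statement.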

9.4 | LG | Mar 2, 2025

Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization

Leonid Berlyand, Theo Bourdais, Houman Owhadi et al.

Deep neural networks (DNNs) have brought significant advancements in various applications in recent years, such as image recognition, speech recognition, and natural language processing. In particular, Vision Transformers (ViTs) have emerged as a powerful class of models in the field of deep learning for image classification. In this work, we propose a novel Random Matrix Theory (RMT)-based method for pruning pre-trained DNNs, based on the sparsification of weights and singular vectors, and apply it to ViTs. RMT provides a robust framework to analyze the statistical properties of large matrices, which has been shown to be crucial for understanding and optimizing the performance of DNNs. We demonstrate that our RMT-based pruning can be used to reduce the number of parameters of ViT models (trained on ImageNet) by 30-50% with less than 1% loss in accuracy. To our knowledge, this represents the state of the art in pruning for these ViT models. Furthermore, we provide a rigorous mathematical underpinning of the above numerical studies: we prove a theorem for fully connected DNNs, and other more general DNN structures, describing how the randomness in the weight matrices of a DNN decreases as the weights approach a local or global minimum during training. We verify this theorem through numerical experiments on fully connected DNNs, providing empirical support for our theoretical findings. Moreover, we prove a theorem that describes how the DNN loss decreases as we remove randomness in the weight layers, and show a monotone dependence of the decrease in loss on the amount of randomness that we remove. Our results also provide significant RMT-based insights into the role of regularization during training and pruning.
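The general idea of separating "random" from "informative" parts of a weight matrix can be illustrated with a minimal sketch. It is not the authors' algorithm (which also involves sparsification and regularization). It assumes a weight matrix $W = S + R$ whose noise part $R$ has i.i.d. entries with a known standard deviation `sigma`, and it discards singular values at or below the Marchenko-Pastur bulk edge $\sigma(\sqrt{n} + \sqrt{m})$. The function name `mp_prune`, the safety factor `margin`, and the synthetic data are all hypothetical.

```python
import numpy as np

def mp_prune(W, sigma, margin=1.05):
    """Zero out singular values of W at or below the Marchenko-Pastur edge.

    Assumption: W = S + R, where R has i.i.d. entries with known std `sigma`.
    `margin` is a hypothetical safety factor for finite-size fluctuations.
    """
    n, m = W.shape
    # Asymptotic largest singular value of an n x m pure-noise matrix.
    edge = sigma * (np.sqrt(n) + np.sqrt(m))
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    keep = s > margin * edge
    # Rebuild W from the retained singular triplets only.
    W_pruned = (U[:, keep] * s[keep]) @ Vt[keep, :]
    return W_pruned, int(keep.sum())

rng = np.random.default_rng(0)
n, m, sigma = 200, 100, 0.1
# Synthetic rank-2 "signal" plus Gaussian noise (illustrative data only).
S = 5.0 * (rng.standard_normal((n, 2)) @ rng.standard_normal((2, m)))
R = sigma * rng.standard_normal((n, m))

W_pruned, rank_kept = mp_prune(S + R, sigma)
rel_err = np.linalg.norm(W_pruned - S) / np.linalg.norm(S)
print(rank_kept, rel_err)
```

In this toy setting, storing the retained factors takes `rank_kept * (n + m + 1)` numbers instead of `n * m`, which is where the parameter reduction comes from. In practice `sigma` is not known and would have to be estimated, for example by fitting the bulk of the empirical spectrum.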

4.1 | LG | Jan 7, 2025

Random weights of DNNs and emergence of fixed points

L. Berlyand, O. Krupchytskyi, V. Slavin

This paper is concerned with a special class of deep neural networks (DNNs) where the input and the output vectors have the same dimension. Such DNNs are widely used in applications, e.g., autoencoders. The training of such networks can be characterized by their fixed points (FPs). We are concerned with the dependence of the number of FPs and their stability on the distribution of the randomly initialized weight matrices of the DNN. Specifically, we consider i.i.d. random weights with heavy-tailed and light-tailed distributions. Our objectives are twofold. First, we study the dependence of the number and stability of FPs on the type of distribution tail. Second, we study the dependence of the number of FPs on the DNN architecture. We perform extensive simulations and show that for light tails (e.g., Gaussian), which are typically used for initialization, a single stable FP exists for broad types of architectures. In contrast, for heavy-tailed distributions (e.g., Cauchy), which typically appear in trained DNNs, a number of FPs emerge. We further observe that these FPs are stable attractors and that their basins of attraction partition the domain of input vectors. Finally, we observe an intriguing non-monotone dependence of the number of fixed points $Q(L)$ on the DNN depth $L$. The above results were first obtained for untrained DNNs with two types of distributions at initialization and then verified by considering DNNs in which the heavy-tailed distributions arise in training.
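The notion of a stable fixed point of a network with equal input and output dimension can be made concrete with a small sketch. Everything here is an assumption for illustration and not the paper's experimental setup: `tanh` activations, Gaussian biases, a tiny width `n = 8`, and weights with standard deviation `sigma / sqrt(n)` chosen small enough that each layer is a contraction. In that regime a unique attracting fixed point is guaranteed by the Banach fixed-point theorem, so the sketch only covers the simplest light-tailed case.

```python
import math
import random

def random_layers(n, depth, sigma, rng):
    """Square weight matrices with i.i.d. Gaussian entries, std sigma / sqrt(n)."""
    scale = sigma / math.sqrt(n)
    return [[[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(n)]
            for _ in range(depth)]

def forward(layers, biases, x):
    """Apply x -> tanh(W x + b) layer by layer (input and output have size n)."""
    for W, b in zip(layers, biases):
        x = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

def iterate(layers, biases, x, steps=200):
    """Feed the output back in as input `steps` times and return the result."""
    for _ in range(steps):
        x = forward(layers, biases, x)
    return x

rng = random.Random(0)
n, depth = 8, 3
layers = random_layers(n, depth, sigma=0.2, rng=rng)
biases = [[rng.gauss(0.0, 0.5) for _ in range(n)] for _ in range(depth)]

# Two different random starting points.
x_a = iterate(layers, biases, [rng.uniform(-1, 1) for _ in range(n)])
x_b = iterate(layers, biases, [rng.uniform(-1, 1) for _ in range(n)])

# residual: how far x_a is from satisfying f(x) = x.
residual = max(abs(u - v) for u, v in zip(forward(layers, biases, x_a), x_a))
# gap: distance between the limits reached from the two starts.
gap = max(abs(u - v) for u, v in zip(x_a, x_b))
print(residual, gap)
```

Both starting points end at the same point, which the network maps to itself. To probe the heavy-tailed case described in the abstract, one could replace `rng.gauss` with a Cauchy sampler and count the distinct limits reached from many starting points. The sketch makes no claim about what that experiment would show.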

1.2 | NA | Jun 4, 2021

A novel multi-scale loss function for classification problems in machine learning

Leonid Berlyand, Robert Creese, Pierre-Emmanuel Jabin

We introduce two-scale loss functions for use in various gradient descent algorithms applied to classification problems via deep neural networks. This new method is generic in the sense that it can be applied to a wide range of machine learning architectures, from deep neural networks to support vector machines, for example. These two-scale loss functions make it possible to focus the training on objects in the training set that are not well classified. This leads to an increase in several measures of performance for appropriately defined two-scale loss functions, compared with the more classical cross-entropy, when tested on traditional deep neural networks on the MNIST, CIFAR10, and CIFAR100 datasets.

5.1 | AP | Feb 10, 2020

Stability for the Training of Deep Neural Networks and Other Classifiers

Leonid Berlyand, Pierre-Emmanuel Jabin, C. Alex Safsten

We examine the stability of loss-minimizing training processes that are used for deep neural networks (DNNs) and other classifiers. While a classifier is optimized during training through a so-called loss function, the performance of classifiers is usually evaluated by some measure of accuracy, such as the overall accuracy, which quantifies the proportion of objects that are well classified. This leads to the guiding question of stability: does decreasing loss through training always result in increased accuracy? We formalize the notion of stability and provide examples of instability. Our main result consists of two novel conditions on the classifier which, if either is satisfied, ensure stability of training; that is, we derive tight bounds on accuracy as loss decreases. We also derive a sufficient condition for stability on the training set alone, identifying flat portions of the data manifold as potential sources of instability. The latter condition is explicitly verifiable on the training dataset. Our results do not depend on the algorithm used for training, as long as loss decreases with training.
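The guiding question can be illustrated with a toy calculation, which is our own and not an example taken from the paper. Assume a binary classifier, four objects, cross-entropy loss, and the convention that an object counts as well classified when the probability assigned to its true class exceeds 1/2. The probability values below are made up.

```python
import math

def cross_entropy(p_correct):
    """Mean cross-entropy given the probability assigned to each true class."""
    return -sum(math.log(p) for p in p_correct) / len(p_correct)

def accuracy(p_correct):
    """Fraction of objects whose true class receives probability above 1/2."""
    return sum(p > 0.5 for p in p_correct) / len(p_correct)

# Hypothetical classifier states before and after some training steps.
before = [0.51, 0.51, 0.51, 0.01]  # three barely right, one badly wrong
after = [0.49, 0.49, 0.49, 0.49]   # all four barely wrong

loss_before, loss_after = cross_entropy(before), cross_entropy(after)
acc_before, acc_after = accuracy(before), accuracy(after)
print(loss_before, loss_after)  # the loss goes down
print(acc_before, acc_after)    # the accuracy goes down as well
```

The loss drops because the one badly misclassified object dominates the average, while accuracy falls because three objects sitting just above the decision threshold are pushed just below it. Objects clustered near the threshold are exactly the kind of configuration that stability conditions have to rule out.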

1.2 | COMP-PH | Nov 23, 2014

Complexity reduction in many particles systems with random initial data

Leonid Berlyand, Pierre-Emmanuel Jabin, Mykhailo Potomkin

We consider the motion of interacting particles governed by a coupled system of ODEs with random initial conditions. Direct computations for such systems are prohibitively expensive because of the very large number of particles, the randomness of their initial locations, which requires many realizations, and the presence of strong interactions. While there are several approaches that address the above difficulties, none addresses all three simultaneously. Our goal is to develop such a computational approach in order to capture the experimentally observed emergence of correlations in the collective state (patterns due to strong interactions). Our approach is based on a truncation of the BBGKY hierarchy that allows one to go beyond the classical mean-field limit and capture correlations while drastically reducing the computational complexity. Finally, we provide an example showing a numerical solution of this nonlinear and non-local system.