LG AI IT DS | Dec 31, 2020

Loss Barcode: A Topological Measure of Escapability in Loss Landscapes

Serguei Barannikov, Daria Voronkova, Alexander Mironenko, Ilya Trofimov, Alexander Korotin, Grigorii Sotnikov, Evgeny Burnaev

arXiv:2012.15834v35.87 citations

Originality: Highly original

AI Analysis

This work provides new topological insights into the learning process and generalization properties of deep neural networks, which is important for researchers studying optimization and generalization in deep learning.

This paper introduces the Topological Obstructions score (TO-score), derived from loss barcodes, to quantify the escapability of local minima in neural network loss landscapes. The authors observe that the TO-score decreases with increasing network depth and width, indicating fewer topological obstructions to learning, and find that, in certain situations, the length of minima segments in the loss barcode is connected to the minima's generalization errors.

Neural network training is commonly based on SGD. However, the understanding of SGD's ability to converge to good local minima, given the non-convex nature of loss functions and the intricate geometric characteristics of loss landscapes, remains limited. In this paper, we apply topological data analysis methods to loss landscapes to gain insights into the learning process and generalization properties of deep neural networks. We use the loss function topology to relate the local behavior of gradient descent trajectories with the global properties of the loss surface. For this purpose, we define the neural network's Topological Obstructions score ("TO-score") with the help of robust topological invariants, barcodes of the loss function, which quantify the escapability of local minima for gradient-based optimization. Our two principal observations are: 1) the loss barcode of the neural network decreases with increasing depth and width, therefore the topological obstructions to learning diminish; 2) in certain situations there is a connection between the length of minima segments in the loss barcode and the minima's generalization errors. Our statements are based on extensive experiments with fully connected, convolutional, and transformer architectures and several datasets including MNIST, FMNIST, CIFAR10, CIFAR100, SVHN, and the multilingual OSCAR text dataset.
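To make the idea of "minima segments" concrete, the following is a minimal sketch of a barcode of minima for a function sampled on a 1D grid, computed as standard 0-dimensional sublevel-set persistence with a union-find structure. It is a toy illustration under stated assumptions, not the authors' algorithm for high-dimensional loss surfaces: the function name `minima_barcode` and the 1D grid setting are hypothetical choices made here. Each local minimum is born at its own value and dies at the lowest level at which its basin merges with the basin of a deeper minimum.

```python
import math


def minima_barcode(values):
    """Toy 0-dimensional sublevel-set persistence on a 1D grid (illustrative only).

    Returns sorted (birth, death) pairs, one per local minimum. The global
    minimum never merges into a deeper basin, so its death is infinity.
    """
    n = len(values)
    parent = [None] * n   # None means the point is not yet in the sublevel set
    birth = {}            # component root -> value at that component's minimum

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bars = []
    # Sweep the level upward: add grid points in order of increasing value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the higher minimum dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    # Skip zero-length bars created by non-minimum points.
                    bars.append((birth[young], values[i]))
                parent[young] = old
    bars.append((birth[find(0)], math.inf))
    return sorted(bars)


if __name__ == "__main__":
    samples = [3, 1, 4, 0, 2, 5, 2.5, 6]
    for b, d in minima_barcode(samples):
        print(f"minimum at value {b}: segment [{b}, {d}]")
```

For a finite segment `(birth, death)`, the length `death - birth` is the height of the barrier that a path must climb to reach a lower minimum, which is the sense in which shorter segments indicate minima that are easier to escape.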
