B. Mehlig

LG
h-index: 5
Papers: 6
Citations: 73
Novelty: 31%
AI Score: 31

6 Papers

DIS-NN | Jun 21, 2023
Finite-time Lyapunov exponents of deep neural networks

L. Storm, H. Linander, J. Bec et al.

We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep networks and dynamical systems, where the growth or decay of local perturbations is characterised by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualise the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.
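As a rough illustration of the quantity discussed in this abstract, the following sketch computes a maximal finite-time Lyapunov exponent for a small bias-free tanh network. It assumes the exponent is defined as the logarithm of the largest singular value of the input-output Jacobian, divided by the number of layers; the function name `max_ftle`, the network shape, and the weight scaling are illustrative choices, not taken from the paper.

```python
import numpy as np

def max_ftle(weights, x):
    """Maximal finite-time Lyapunov exponent of a bias-free tanh network at input x.

    Assumed definition: log(largest singular value of the input-output
    Jacobian) divided by the number of layers.
    """
    h = np.asarray(x, dtype=float)
    jac = np.eye(h.size)
    for W in weights:
        h = np.tanh(W @ h)
        # Jacobian of one layer is diag(tanh'(W h_prev)) @ W, with tanh' = 1 - tanh^2.
        jac = ((1.0 - h**2)[:, None] * W) @ jac
    sigma_max = np.linalg.svd(jac, compute_uv=False)[0]
    return np.log(sigma_max) / len(weights)

# Usage: a random 4-layer network of width 10 (hypothetical sizes).
rng = np.random.default_rng(0)
n, depth = 10, 4
weights = [rng.normal(scale=1.5 / np.sqrt(n), size=(n, n)) for _ in range(depth)]
print(max_ftle(weights, rng.normal(size=n)))
```

Evaluating `max_ftle` on a grid of inputs would give the kind of exponent field in which ridges could be looked for; for large networks, automatic differentiation would be a more practical way to obtain the Jacobian.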

LG | Jun 3, 2022
Constraints on parameter choices for successful reservoir computing

L. Storm, K. Gustavsson, B. Mehlig

Echo-state networks are simple models of discrete dynamical systems driven by a time series. By selecting network parameters such that the dynamics of the network is contractive, characterized by a negative maximal Lyapunov exponent, the network may synchronize with the driving signal. Exploiting this synchronization, the echo-state network may be trained to autonomously reproduce the input dynamics, enabling time-series prediction. However, while synchronization is a necessary condition for prediction, it is not sufficient. Here, we study what other conditions are necessary for successful time-series prediction. We identify two key parameters for prediction performance, and conduct a parameter sweep to find regions where prediction is successful. These regions differ significantly depending on whether full or partial phase space information about the input is provided to the network during training. We explain how these regions emerge.
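The synchronization property mentioned in this abstract can be sketched numerically. The example below assumes a standard tanh reservoir update, r(t+1) = tanh(W r(t) + w_in u(t)), and enforces contraction by scaling W to spectral norm 0.5. This is a sufficient condition that is stronger than a negative maximal Lyapunov exponent, chosen only to keep the sketch simple. All sizes, the driving signal, and the function name `run_reservoir` are hypothetical.

```python
import numpy as np

def run_reservoir(W, w_in, u, r0):
    """Drive the reservoir with the scalar series u and return the final state."""
    r = r0.copy()
    for u_t in u:
        r = np.tanh(W @ r + w_in * u_t)
    return r

rng = np.random.default_rng(0)
n = 50                                # reservoir size (illustrative)
W = rng.normal(size=(n, n))
W *= 0.5 / np.linalg.norm(W, 2)       # spectral norm 0.5, so the update is a contraction
w_in = rng.normal(size=n)
u = np.sin(0.1 * np.arange(200))      # toy driving signal

# Two different initial states driven by the same input.
r_a = run_reservoir(W, w_in, u, rng.normal(size=n))
r_b = run_reservoir(W, w_in, u, rng.normal(size=n))
gap = np.linalg.norm(r_a - r_b)
print(gap)
```

Because tanh is 1-Lipschitz, the distance between the two trajectories shrinks by at least a factor of 0.5 per step, so both states forget their initial conditions and follow the driving signal. As the abstract stresses, this alone does not guarantee good prediction.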

LG | Nov 26, 2022
Looking at the posterior: accuracy and uncertainty of neural-network predictions

H. Linander, O. Balabanov, H. Yang et al.

Bayesian inference can quantify uncertainty in the predictions of neural networks using posterior distributions for model parameters and network output. By looking at these posterior distributions, one can separate the origin of uncertainty into aleatoric and epistemic contributions. One goal of uncertainty quantification is to inform on prediction accuracy. Here we show that prediction accuracy depends on both epistemic and aleatoric uncertainty in an intricate fashion that cannot be understood in terms of marginalized uncertainty distributions alone. How the accuracy relates to epistemic and aleatoric uncertainties depends not only on the model architecture, but also on the properties of the dataset. We discuss the significance of these results for active learning and introduce a novel acquisition function that outperforms common uncertainty-based methods. To arrive at our results, we approximated the posteriors using deep ensembles, for fully-connected, convolutional and attention-based neural networks.
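One common way to split ensemble uncertainty into aleatoric and epistemic parts is the entropy-based decomposition sketched below. It is shown here as an assumption, since the abstract does not state which measures the authors use: total uncertainty is the entropy of the ensemble-averaged prediction, aleatoric uncertainty is the average entropy of the members, and epistemic uncertainty is their difference (a mutual information). The function names are illustrative.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; clipping avoids log(0)."""
    return -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=axis)

def decompose_uncertainty(probs):
    """Split predictive uncertainty of an ensemble.

    probs has shape (members, samples, classes), each row a probability vector.
    Returns (total, aleatoric, epistemic), each of shape (samples,).
    """
    total = entropy(probs.mean(axis=0))       # entropy of the averaged prediction
    aleatoric = entropy(probs).mean(axis=0)   # average entropy of individual members
    epistemic = total - aleatoric             # disagreement between members
    return total, aleatoric, epistemic

# Usage: two members, two samples, two classes.
probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],   # member 1
    [[0.5, 0.5], [0.0, 1.0]],   # member 2
])
total, aleatoric, epistemic = decompose_uncertainty(probs)
print(total, aleatoric, epistemic)
```

In the usage example, the first sample has agreeing but uncertain members (purely aleatoric), while the second has confident but contradictory members (purely epistemic), even though both have the same total entropy.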

FLU-DYN | Oct 10, 2025
Smart navigation of a gravity-driven glider with adjustable centre-of-mass

X. Jiang, J. Qiu, K. Gustavsson et al.

Artificial gliders are designed to disperse as they settle through a fluid, requiring precise navigation to reach target locations. We show that a compact glider settling in a viscous fluid can navigate by dynamically adjusting its centre-of-mass. Using fully resolved direct numerical simulations (DNS) and reinforcement learning, we find two optimal navigation strategies that allow the glider to reach its target location accurately. These strategies depend sensitively on how the glider interacts with the surrounding fluid. The nature of this interaction changes as the particle Reynolds number Re$_p$ changes. Our results explain how the optimal strategy depends on Re$_p$. At large Re$_p$, the glider learns to tumble rapidly by moving its centre-of-mass as its orientation changes. This generates a large horizontal inertial lift force, which allows the glider to travel far. At small Re$_p$, by contrast, high viscosity hinders tumbling. In this case, the glider learns to adjust its centre-of-mass so that it settles with a steady, inclined orientation that results in a horizontal viscous force. The horizontal range is much smaller than for large Re$_p$, because this viscous force is much smaller than the inertial lift force at large Re$_p$. *These authors contributed equally.

LG | Nov 29, 2021
Improving traffic sign recognition by active search

S. Jaghouar, H. Gustafsson, B. Mehlig et al.

We describe an iterative active-learning algorithm to recognise rare traffic signs. A standard ResNet is trained on a training set containing only a single sample of the rare class. We demonstrate that by sorting the samples of a large, unlabeled set by the estimated probability of belonging to the rare class, we can efficiently identify samples from the rare class. This works despite the fact that this estimated probability is usually quite low. A reliable active-learning loop is obtained by labeling these candidate samples, including them in the training set, and iterating the procedure. Further, we show that we get similar results starting from a single synthetic sample. Our results are important as they indicate a straightforward way of improving traffic-sign recognition for automated driving systems. In addition, they show that we can make use of the information hidden in low confidence outputs, which is usually ignored.
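The selection step of the loop described in this abstract can be sketched as a ranking of the unlabeled pool by estimated rare-class probability. The function name `select_candidates`, the batch size `k`, and the toy probabilities are hypothetical; training the classifier and labeling the candidates are outside the sketch.

```python
import numpy as np

def select_candidates(rare_prob, labeled_mask, k):
    """Return indices of the k unlabeled samples with the highest
    estimated probability of belonging to the rare class."""
    # Already-labeled samples get score -inf so they sort last.
    scores = np.where(labeled_mask, -np.inf, rare_prob)
    order = np.argsort(-scores, kind="stable")
    n_unlabeled = int((~labeled_mask).sum())
    return order[:min(k, n_unlabeled)]

# Usage: probabilities are low in absolute terms, but their ranking is still informative.
rare_prob = np.array([0.01, 0.20, 0.05, 0.03])
labeled_mask = np.array([False, True, False, False])
picked = select_candidates(rare_prob, labeled_mask, k=2)
print(picked)
```

One iteration of the loop would then label the picked samples, add them to the training set, retrain, and recompute `rare_prob` on the remaining pool.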

LG | Jan 17, 2019
Machine learning with neural networks

B. Mehlig

These are lecture notes for a course on machine learning with neural networks for scientists and engineers that I have given at Gothenburg University and Chalmers Technical University in Gothenburg, Sweden. The material is organised into three parts: Hopfield networks, supervised learning of labeled data, and learning algorithms for unlabeled data sets. Part I introduces stochastic recurrent networks: Hopfield networks and Boltzmann machines. The analysis of their learning rules sets the scene for the later parts. Part II describes supervised learning with multilayer perceptrons and convolutional neural networks. This part starts with a simple geometrical interpretation of the learning rule and leads to the recent successes of convolutional networks in object recognition, recurrent networks in language processing, and reservoir computers in time-series analysis. Part III explains what neural networks can learn about data that is not labeled. This part begins with a description of unsupervised learning techniques for clustering of data, non-linear projections, and embeddings. A section on autoencoders explains how to learn without labels using convolutional networks, and the last chapter is dedicated to reinforcement learning. The overall goal of the course is to explain the fundamental principles that allow neural networks to learn, emphasising ideas and concepts that are common to all three parts. The present version does not contain exercises (copyright owned by Cambridge University Press). The complete book is available at https://www.cambridge.org/gb/academic/subjects/physics/statistical-physics/machine-learning-neural-networks-introduction-scientists-and-engineers?format=HB.