LGFeb 17, 2023
Identifying Equivalent Training DynamicsWilliam T. Redman, Juan M. Bello-Rivas, Maria Fonoberova et al.
Study of the nonlinear evolution deep neural network (DNN) parameters undergo during training has uncovered regimes of distinct dynamical behavior. While a detailed understanding of these phenomena has the potential to advance improvements in training efficiency and robustness, the lack of methods for identifying when DNN models have equivalent dynamics limits the insight that can be gained from prior work. Topological conjugacy, a notion from dynamical systems theory, provides a precise definition of dynamical equivalence, offering a possible route to address this need. However, topological conjugacies have historically been challenging to compute. By leveraging advances in Koopman operator theory, we develop a framework for identifying conjugate and non-conjugate training dynamics. To validate our approach, we demonstrate that comparing Koopman eigenvalues can correctly identify a known equivalence between online mirror descent and online gradient descent. We then utilize our approach to: (a) identify non-conjugate training dynamics between shallow and wide fully connected neural networks; (b) characterize the early phase of training dynamics in convolutional neural networks; (c) uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking. Our results, across a range of DNN architectures, illustrate the flexibility of our framework and highlight its potential for shedding new light on training dynamics.
DSNov 21, 2023
Koopman Learning with Episodic MemoryWilliam T. Redman, Dean Huang, Maria Fonoberova et al.
Koopman operator theory has found significant success in learning models of complex, real-world dynamical systems, enabling prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman learning with the ability to leverage its own failures. To address this, we equip Koopman methods -- developed for predicting non-autonomous time-series -- with an episodic memory mechanism, enabling global recall of (or attention to) periods in time where similar dynamics previously occurred. We find that a basic implementation of Koopman learning with episodic memory leads to significant improvements in prediction on synthetic and real-world data. Our framework has considerable potential for expansion, allowing for future advances, and opens exciting new directions for Koopman learning.
ROSep 25, 2024
Koopman-driven grip force prediction through EMG sensingTomislav Bazina, Ervin Kamenar, Maria Fonoberova et al.
Loss of hand function due to conditions like stroke or multiple sclerosis significantly impacts daily activities. Robotic rehabilitation provides tools to restore hand function, while novel methods based on surface electromyography (sEMG) enable the adaptation of the device's force output according to the user's condition, thereby improving rehabilitation outcomes. This study aims to achieve accurate force estimations during medium wrap grasps using a single sEMG sensor pair, thereby addressing the challenge of escalating sensor requirements for precise predictions. We conducted sEMG measurements on 13 subjects at two forearm positions, validating results with a hand dynamometer. We established flexible signal-processing steps, yielding high peak cross-correlations between the processed sEMG signal (representing meaningful muscle activity) and grip force. Influential parameters were subsequently identified through sensitivity analysis. Leveraging a novel data-driven Koopman operator theory-based approach and problem-specific data lifting techniques, we devised a methodology for the estimation and short-term prediction of grip force from processed sEMG signals. A weighted mean absolute percentage error (wMAPE) of approx. 5.5% was achieved for the estimated grip force, whereas predictions with a 0.5-second prediction horizon resulted in a wMAPE of approx. 17.9%. The methodology proved robust regarding precise electrode positioning, as the effect of sensing position on error metrics was non-significant. The algorithm executes exceptionally fast, processing, estimating, and predicting a 0.5-second sEMG signal batch in just approx. 30 ms, facilitating real-time implementation.
LGOct 28, 2021
An Operator Theoretic View on Pruning Deep Neural NetworksWilliam T. Redman, Maria Fonoberova, Ryan Mohr et al.
The discovery of sparse subnetworks that are able to perform as well as full models has found broad applied and theoretical interest. While many pruning methods have been developed to this end, the naïve approach of removing parameters based on their magnitude has been found to be as robust as more complex, state-of-the-art algorithms. The lack of theory behind magnitude pruning's success, especially pre-convergence, and its relation to other pruning methods, such as gradient based pruning, are outstanding open questions in the field that are in need of being addressed. We make use of recent advances in dynamical systems theory, namely Koopman operator theory, to define a new class of theoretically motivated pruning algorithms. We show that these algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods, and find that they can be used to shed light on magnitude pruning's performance during the early part of training.
LGDec 21, 2020
Predicting the Critical Number of Layers for Hierarchical Support Vector RegressionRyan Mohr, Maria Fonoberova, Zlatko Drmač et al.
Hierarchical support vector regression (HSVR) models a function from data as a linear combination of SVR models at a range of scales, starting at a coarse scale and moving to finer scales as the hierarchy continues. In the original formulation of HSVR, there were no rules for choosing the depth of the model. In this paper, we observe in a number of models a phase transition in the training error -- the error remains relatively constant as layers are added, until a critical scale is passed, at which point the training error drops close to zero and remains nearly constant for added layers. We introduce a method to predict this critical scale a priori with the prediction based on the support of either a Fourier transform of the data or the Dynamic Mode Decomposition (DMD) spectrum. This allows us to determine the required number of layers prior to training any models.
LGJun 21, 2020
Applications of Koopman Mode Analysis to Neural NetworksIva Manojlović, Maria Fonoberova, Ryan Mohr et al.
We consider the training process of a neural network as a dynamical system acting on the high-dimensional weight space. Each epoch is an application of the map induced by the optimization algorithm and the loss function. Using this induced map, we can apply observables on the weight space and measure their evolution. The evolution of the observables are given by the Koopman operator associated with the induced dynamical system. We use the spectrum and modes of the Koopman operator to realize the above objectives. Our methods can help to, a priori, determine the network depth; determine if we have a bad initialization of the network weights, allowing a restart before training too long; speeding up the training time. Additionally, our methods help enable noise rejection and improve robustness. We show how the Koopman spectrum can be used to determine the number of layers required for the architecture. Additionally, we show how we can elucidate the convergence versus non-convergence of the training process by monitoring the spectrum, in particular, how the existence of eigenvalues clustering around 1 determines when to terminate the learning process. We also show how using Koopman modes we can selectively prune the network to speed up the training procedure. Finally, we show that incorporating loss functions based on negative Sobolev norms can allow for the reconstruction of a multi-scale signal polluted by very large amounts of noise.