William T. Redman

h-index8

8papers

112citations

Novelty59%

AI Score33

Ranked #116,605 of 194,257 authors (top 60%)#25,628 in LG (top 64%)

8 Papers

9.8LGFeb 17, 2023Code

Identifying Equivalent Training Dynamics

William T. Redman, Juan M. Bello-Rivas, Maria Fonoberova et al.

Study of the nonlinear evolution deep neural network (DNN) parameters undergo during training has uncovered regimes of distinct dynamical behavior. While a detailed understanding of these phenomena has the potential to advance improvements in training efficiency and robustness, the lack of methods for identifying when DNN models have equivalent dynamics limits the insight that can be gained from prior work. Topological conjugacy, a notion from dynamical systems theory, provides a precise definition of dynamical equivalence, offering a possible route to address this need. However, topological conjugacies have historically been challenging to compute. By leveraging advances in Koopman operator theory, we develop a framework for identifying conjugate and non-conjugate training dynamics. To validate our approach, we demonstrate that comparing Koopman eigenvalues can correctly identify a known equivalence between online mirror descent and online gradient descent. We then utilize our approach to: (a) identify non-conjugate training dynamics between shallow and wide fully connected neural networks; (b) characterize the early phase of training dynamics in convolutional neural networks; (c) uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking. Our results, across a range of DNN architectures, illustrate the flexibility of our framework and highlight its potential for shedding new light on training dynamics.

2.3DSNov 21, 2023

Koopman Learning with Episodic Memory

William T. Redman, Dean Huang, Maria Fonoberova et al.

Koopman operator theory has found significant success in learning models of complex, real-world dynamical systems, enabling prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman learning with the ability to leverage its own failures. To address this, we equip Koopman methods -- developed for predicting non-autonomous time-series -- with an episodic memory mechanism, enabling global recall of (or attention to) periods in time where similar dynamics previously occurred. We find that a basic implementation of Koopman learning with episodic memory leads to significant improvements in prediction on synthetic and real-world data. Our framework has considerable potential for expansion, allowing for future advances, and opens exciting new directions for Koopman learning.

4.6LGDec 9, 2024

On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks

William T. Redman, Zhangyang Wang, Alessandro Ingrosso et al.

Since its use in the Lottery Ticket Hypothesis, iterative magnitude pruning (IMP) has become a popular method for extracting sparse subnetworks that can be trained to high performance. Despite its success, the mechanism that drives the success of IMP remains unclear. One possibility is that IMP is capable of extracting subnetworks with good inductive biases that facilitate performance. Supporting this idea, recent work showed that applying IMP to fully connected neural networks (FCNs) leads to the emergence of local receptive fields (RFs), a feature of mammalian visual cortex and convolutional neural networks that facilitates image processing. However, it remains unclear why IMP would uncover localized features in the first place. Inspired by results showing that training on synthetic images with highly non-Gaussian statistics (e.g., sharp edges) is sufficient to drive the emergence of local RFs in FCNs, we hypothesize that IMP iteratively increases the non-Gaussian statistics of FCN representations, creating a feedback loop that enhances localization. Here, we demonstrate first that non-Gaussian input statistics are indeed necessary for IMP to discover localized RFs. We then develop a new method for measuring the effect of individual weights on the statistics of the FCN representations ("cavity method"), which allows us to show that IMP systematically increases the non-Gaussianity of pre-activations, leading to the formation of localized RFs. Our work, which is the first to study the effect of IMP on the statistics of the representations of neural networks, sheds parsimonious light on one way in which IMP can drive the formation of strong inductive biases.

3.8LGMay 15, 2023

Learning Linear Embeddings for Non-Linear Network Dynamics with Koopman Message Passing

King Fai Yeh, Paris Flood, William Redman et al.

Recently, Koopman operator theory has become a powerful tool for developing linear representations of non-linear dynamical systems. However, existing data-driven applications of Koopman operator theory, including both traditional and deep learning approaches, perform poorly on non-linear network dynamics problems as they do not address the underlying geometric structure. In this paper we present a novel approach based on Koopman operator theory and message passing networks that finds a linear representation for the dynamical system which is globally valid at any time step. The linearisations found by our method produce predictions on a suite of network dynamics problems that are several orders of magnitude better than current state-of-the-art techniques. We also apply our approach to the highly non-linear training dynamics of neural network architectures, and obtain linear representations which can generate network parameters with comparable performance to networks trained by classical optimisers.

13.1LGOct 28, 2021

An Operator Theoretic View on Pruning Deep Neural Networks

William T. Redman, Maria Fonoberova, Ryan Mohr et al.

The discovery of sparse subnetworks that are able to perform as well as full models has found broad applied and theoretical interest. While many pruning methods have been developed to this end, the naïve approach of removing parameters based on their magnitude has been found to be as robust as more complex, state-of-the-art algorithms. The lack of theory behind magnitude pruning's success, especially pre-convergence, and its relation to other pruning methods, such as gradient based pruning, are outstanding open questions in the field that are in need of being addressed. We make use of recent advances in dynamical systems theory, namely Koopman operator theory, to define a new class of theoretically motivated pruning algorithms. We show that these algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods, and find that they can be used to shed light on magnitude pruning's performance during the early part of training.

9.9LGOct 7, 2021

Universality of Winning Tickets: A Renormalization Group Perspective

William T. Redman, Tianlong Chen, Zhangyang Wang et al.

Foundational work on the Lottery Ticket Hypothesis has suggested an exciting corollary: winning tickets found in the context of one task can be transferred to similar tasks, possibly even across different architectures. This has generated broad interest, but methods to study this universality are lacking. We make use of renormalization group theory, a powerful tool from theoretical physics, to address this need. We find that iterative magnitude pruning, the principal algorithm used for discovering winning tickets, is a renormalization group scheme, and can be viewed as inducing a flow in parameter space. We demonstrate that ResNet-50 models with transferable winning tickets have flows with common properties, as would be expected from the theory. Similar observations are made for BERT models, with evidence that their flows are near fixed points. Additionally, we leverage our framework to study winning tickets transferred across ResNet architectures, observing that smaller models have flows with more uniform properties than larger models, complicating transfer between them.

5.0LGAug 24, 2020

Local error quantification for Neural Network Differential Equation solvers

Akshunna S. Dogra, William T Redman

Neural networks have been identified as powerful tools for the study of complex systems. A noteworthy example is the neural network differential equation (NN DE) solver, which can provide functional approximations to the solutions of a wide variety of differential equations. Such solvers produce robust functional expressions, are well suited for further manipulations on the quantities of interest (for example, taking derivatives), and capable of leveraging the modern advances in parallelization and computing power. However, there is a lack of work on the role precise error quantification can play in their predictions: usually, the focus is on ambiguous and/or global measures of performance like the loss function and/or obtaining global bounds on the errors associated with the predictions. Precise, local error quantification is seldom possible without external means or outright knowledge of the true solution. We address these concerns in the context of dynamical system NN DE solvers, leveraging learnt information within the NN DE solvers to develop methods that allow them to be more accurate and efficient, while still pursuing an unsupervised approach that does not rely on external tools or data. We achieve this via methods that can precisely estimate NN DE solver prediction errors point-wise, thus allowing the user the capacity for efficient and targeted error correction. We exemplify the utility of our methods by testing them on a nonlinear and a chaotic system each.

24.3NEJun 3, 2020Code

Optimizing Neural Networks via Koopman Operator Theory

Akshunna S. Dogra, William T Redman

Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection. As Koopman operator theory is a linear theory, a successful implementation of it in evolving network weights and biases offers the promise of accelerated training, especially in the context of deep networks, where optimization is inherently a non-convex problem. We show that Koopman operator theoretic methods allow for accurate predictions of weights and biases of feedforward, fully connected deep networks over a non-trivial range of training time. During this window, we find that our approach is >10x faster than various gradient descent based methods (e.g. Adam, Adadelta, Adagrad), in line with our complexity analysis. We end by highlighting open questions in this exciting intersection between dynamical systems and neural network theory. We highlight additional methods by which our results could be expanded to broader classes of networks and larger training intervals, which shall be the focus of future work.