LG · Jun 1, 2022
Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime
Daniel Goldfarb, Paul Hand
Overparameterization is known to permit strong generalization performance in neural networks. In this work, we provide an initial theoretical analysis of its effect on catastrophic forgetting in a continual learning setup. We show experimentally that in permuted MNIST image classification tasks, the generalization performance of multilayer perceptrons trained by vanilla stochastic gradient descent can be improved by overparameterization, and the extent of the performance increase achieved by overparameterization is comparable to that of state-of-the-art continual learning algorithms. We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem, where each task is related by a random orthogonal transformation. We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small if the model is sufficiently overparameterized.
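The two-task linear regression setting described in this abstract can be illustrated with a small NumPy sketch. This is not the authors' code. It assumes isotropic Gaussian features, noiseless labels shared by both tasks, and second-task inputs obtained by applying a Haar-random orthogonal matrix to the first-task inputs. It also uses the standard fact that gradient descent on an overparameterized least-squares problem converges to the interpolating solution closest to its initialization, so both training stages are computed in closed form with a pseudoinverse. The function names and the forgetting measure (task-1 training error after training on task 2, relative to the label energy) are choices made here for illustration.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Draw a Haar-distributed orthogonal matrix via QR with a sign fix."""
    G = rng.standard_normal((d, d))
    Q, R = np.linalg.qr(G)
    return Q * np.sign(np.diag(R))  # rescale column j by the sign of R[j, j]

def forgetting(n, d, seed=0):
    """Relative task-1 training error after sequential training on task 2.

    Assumptions (illustrative, not taken from the paper): Gaussian features,
    noiseless labels, same labels for both tasks, d > n.
    """
    rng = np.random.default_rng(seed)
    X1 = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    y = X1 @ w_star

    # Task 2: same labels, inputs rotated by a random orthogonal matrix.
    O = random_orthogonal(d, rng)
    X2 = X1 @ O.T

    # Stage 1: gradient descent from zero converges to the min-norm interpolator.
    w1 = np.linalg.pinv(X1) @ y
    # Stage 2: gradient descent from w1 converges to the task-2 interpolator
    # closest to w1 (no regularization, no replay).
    w2 = w1 + np.linalg.pinv(X2) @ (y - X2 @ w1)

    # w1 fits task 1 exactly, so any remaining error is due to forgetting.
    return np.mean((X1 @ w2 - y) ** 2) / np.mean(y ** 2)

if __name__ == "__main__":
    for d in (100, 300, 1000):
        print(d, forgetting(n=50, d=d))
```

With the number of samples held fixed, increasing `d` raises the overparameterization ratio `d / n`, and the printed values give a rough feel for how forgetting behaves in this toy setup.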
LG · Jan 23, 2024
The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical Model
Daniel Goldfarb, Itay Evron, Nir Weinberger et al.
In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks. Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization. In contrast, our paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model. Specifically, we focus on two-task continual linear regression, where the second task is a random orthogonal transformation of an arbitrary first task (an abstraction of random permutation tasks). We derive an exact analytical expression for the expected forgetting and uncover a nuanced pattern. In highly overparameterized models, intermediate task similarity causes the most forgetting. However, near the interpolation threshold, forgetting decreases monotonically with the expected task similarity. We validate our findings with linear regression on synthetic data, and with neural networks on established permutation task benchmarks.
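One way to experiment with task similarity in such a model is to let the random orthogonal transformation act on only part of the feature space. The sketch below is a hypothetical construction, not necessarily the one analyzed in the paper: the matrix rotates the first `m` of `d` coordinates and leaves the rest untouched, so `m = 0` gives identical tasks and `m = d` gives a fully random rotation. Gaussian features, noiseless shared labels, and closed-form pseudoinverse training (the limit of unregularized gradient descent in the overparameterized case) are further assumptions, and all names are illustrative.

```python
import numpy as np

def partial_rotation(d, m, rng):
    """Orthogonal matrix that randomly rotates the first m coordinates only."""
    O = np.eye(d)
    if m > 0:
        G = rng.standard_normal((m, m))
        Q, R = np.linalg.qr(G)
        O[:m, :m] = Q * np.sign(np.diag(R))  # Haar-distributed m x m block
    return O

def forgetting_vs_similarity(n, d, m, seed=0):
    """Relative task-1 training error after training on a partially rotated task 2."""
    rng = np.random.default_rng(seed)
    X1 = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d)
    y = X1 @ w_star

    X2 = X1 @ partial_rotation(d, m, rng).T

    w1 = np.linalg.pinv(X1) @ y                    # fit task 1 from zero init
    w2 = w1 + np.linalg.pinv(X2) @ (y - X2 @ w1)   # then fit task 2 from w1
    return np.mean((X1 @ w2 - y) ** 2) / np.mean(y ** 2)

if __name__ == "__main__":
    d, n = 300, 30
    for m in (0, 75, 150, 225, 300):
        print(m, forgetting_vs_similarity(n, d, m))
```

Sweeping `m` for different ratios `d / n` gives a simple empirical handle on how similarity and overparameterization interact in this toy setup; averaging over several seeds would be needed before drawing any conclusion from the numbers.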
LG · Feb 11, 2025
Analysis of Overparameterization in Continual Learning under a Linear Model
Daniel Goldfarb, Paul Hand
Autonomous machine learning systems that learn many tasks in sequence are prone to the catastrophic forgetting problem. Mathematical theory is needed in order to understand the extent of forgetting during continual learning. As a foundational step towards this goal, we study continual learning and catastrophic forgetting from a theoretical perspective in the simple setting of gradient descent with no explicit algorithmic mechanism to prevent forgetting. In this setting, we analytically demonstrate that overparameterization alone can mitigate forgetting in the context of a linear regression model. We consider a two-task setting motivated by permutation tasks, and show that as the overparameterization ratio becomes sufficiently high, a model trained on both tasks in sequence results in a low-risk estimator for the first task. As part of this work, we establish a non-asymptotic bound of the risk of a single linear regression task, which may be of independent interest to the field of double descent theory.
LG · Oct 31, 2019
Causal Inference via Conditional Kolmogorov Complexity using MDL Binning
Daniel Goldfarb, Scott Evans
Recent developments have linked causal inference with Algorithmic Information Theory, and methods have been developed that utilize Conditional Kolmogorov Complexity to determine causation between two random variables. We present a method for inferring causal direction between continuous variables by using an MDL Binning technique for data discretization and complexity calculation. Our method captures the shape of the data and uses it to determine which variable has more information about the other. Its high predictive performance and robustness are shown on several real-world use cases.
LG · Oct 31, 2018
Understanding Deep Neural Networks Using Topological Data Analysis
Daniel Goldfarb
Deep neural networks (DNNs) are black-box algorithms. They are trained by gradient descent with backpropagation, which adjusts the weights in each layer with the sole goal of minimizing training error. Hence, the resulting weights cannot be directly explained. Using Topological Data Analysis (TDA), we can gain insight into how the neural network is thinking, specifically by analyzing the activation values of validation images as they pass through each layer.