N. Benjamin Erichson

h-index17

34papers

1,445citations

Novelty54%

AI Score47

Ranked #32,990 of 194,257 authors (top 17%)#7,784 in LG (top 19%)

34 Papers

1.2NAMar 13, 2020

Randomized CP Tensor Decomposition

N. Benjamin Erichson, Krithika Manohar, Steven L. Brunton et al.

The CANDECOMP/PARAFAC (CP) tensor decomposition is a popular dimensionality-reduction method for multiway data. Dimensionality reduction is often sought after since many high-dimensional tensors have low intrinsic rank relative to the dimension of the ambient measurement space. However, the emergence of `big data' poses significant computational challenges for computing this fundamental tensor decomposition. By leveraging modern randomized algorithms, we demonstrate that coherent structures can be learned from a smaller representation of the tensor in a fraction of the time. Thus, this simple but powerful algorithm enables one to compute the approximate CP decomposition even for massive tensors. The approximation error can thereby be controlled via oversampling and the computation of power iterations. In addition to theoretical results, several empirical results demonstrate the performance of the proposed algorithm.

15.3CVJun 24, 2023Code

SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning

Pu Ren, N. Benjamin Erichson, Junyi Guo et al.

Super-resolution (SR) techniques aim to enhance data resolution, enabling the retrieval of finer details, and improving the overall quality and fidelity of the data representation. There is growing interest in applying SR methods to complex spatiotemporal systems within the Scientific Machine Learning (SciML) community, with the hope of accelerating numerical simulations and/or improving forecasts in weather, climate, and related areas. However, the lack of standardized benchmark datasets for comparing and validating SR methods hinders progress and adoption in SciML. To address this, we introduce SuperBench, the first benchmark dataset featuring high-resolution datasets, including data from fluid flows, cosmology, and weather. Here, we focus on validating spatial SR performance from data-centric and physics-preserved perspectives, as well as assessing robustness to data degradation tasks. While deep learning-based SR methods (developed in the computer vision community) excel on certain tasks, despite relatively limited prior physics information, we identify limitations of these methods in accurately capturing intricate fine-scale features and preserving fundamental physical properties and constraints in scientific data. These shortcomings highlight the importance and subtlety of incorporating domain knowledge into ML models. We anticipate that SuperBench will help to advance SR methods for science.

22.6LGOct 4, 2023Code

Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs

Ilan Naiman, N. Benjamin Erichson, Pu Ren et al.

Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to the these issues, they are (surprisingly) less considered for time series generation. In this work, we introduce Koopman VAE (KoVAE), a new generative framework that is based on a novel design for the model prior, and that can be optimized for either regular and irregular training data. Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map. Our approach enhances generative modeling with two desired features: (i) incorporating domain knowledge can be achieved by leveraging spectral tools that prescribe constraints on the eigenvalues of the linear map; and (ii) studying the qualitative behavior and stability of the system can be performed using tools from dynamical systems theory. Our results show that KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks. Whether trained on regular or irregular data, KoVAE generates time series that improve both discriminative and predictive metrics. We also present visual evidence suggesting that KoVAE learns probability density functions that better approximate the empirical ground truth distribution.

5.8LGDec 1, 2022

Gated Recurrent Neural Networks with Weighted Time-Delay Feedback

N. Benjamin Erichson, Soon Hoe Lim, Michael W. Mahoney

In this paper, we present a novel approach to modeling long-term dependencies in sequential data by introducing a gated recurrent unit (GRU) with a weighted time-delay feedback mechanism. Our proposed model, named $τ$-GRU, is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs). We prove the existence and uniqueness of solutions for the continuous-time model and show that the proposed feedback mechanism can significantly improve the modeling of long-term dependencies. Our empirical results indicate that $τ$-GRU outperforms state-of-the-art recurrent units and gated recurrent architectures on a range of tasks, achieving faster convergence and better generalization.

16.5LGOct 2, 2023

Robustifying State-space Models for Long Sequences via Approximate Diagonalization

Annan Yu, Arnur Nigmetov, Dmitriy Morozov et al.

State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. However, the complicated structure of the S4 layer poses challenges; and, in an effort to address these challenges, models such as S4D and S5 have considered a purely diagonal structure. This choice simplifies the implementation, improves computational efficiency, and allows channel communication. However, diagonalizing the HiPPO framework is itself an ill-posed problem. In this paper, we propose a general solution for this and related ill-posed diagonalization problems in machine learning. We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology, which is based on the pseudospectral theory of non-normal operators, and which may be interpreted as the approximate diagonalization of the non-normal matrices defining SSMs. Based on this, we introduce the S4-PTD and S5-PTD models. Through theoretical analysis of the transfer functions of different initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization strongly converges to the HiPPO framework, while the S4D/S5 initialization only achieves weak convergences. As a result, our new models show resilience to Fourier-mode noise-perturbed inputs, a crucial property not achieved by the S4D/S5 models. In addition to improved robustness, our S5-PTD model averages 87.6% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD methodology helps to improve the accuracy of deep learning models.

5.9GEO-PHJul 21, 2024Code

Learning Physics for Unveiling Hidden Earthquake Ground Motions via Conditional Generative Modeling

Pu Ren, Rie Nakata, Maxime Lacour et al.

Predicting high-fidelity ground motions for future earthquakes is crucial for seismic hazard assessment and infrastructure resilience. Conventional empirical simulations suffer from sparse sensor distribution and geographically localized earthquake locations, while physics-based methods are computationally intensive and require accurate representations of Earth structures and earthquake sources. We propose a novel artificial intelligence (AI) simulator, Conditional Generative Modeling for Ground Motion (CGM-GM), to synthesize high-frequency and spatially continuous earthquake ground motion waveforms. CGM-GM leverages earthquake magnitudes and geographic coordinates of earthquakes and sensors as inputs, learning complex wave physics and Earth heterogeneities, without explicit physics constraints. This is achieved through a probabilistic autoencoder that captures latent distributions in the time-frequency domain and variational sequential models for prior and posterior distributions. We evaluate the performance of CGM-GM using small-magnitude earthquake records from the San Francisco Bay Area, a region with high seismic risks. CGM-GM demonstrates a strong potential for outperforming a state-of-the-art non-ergodic empirical ground motion model and shows great promise in seismology and beyond.

7.4MLFeb 22, 2023Code

Error Estimation for Random Fourier Features

Junwen Yao, N. Benjamin Erichson, Miles E. Lopes

Random Fourier Features (RFF) is among the most popular and broadly applicable approaches for scaling up kernel methods. In essence, RFF allows the user to avoid costly computations on a large kernel matrix via a fast randomized approximation. However, a pervasive difficulty in applying RFF is that the user does not know the actual error of the approximation, or how this error will propagate into downstream learning tasks. Up to now, the RFF literature has primarily dealt with these uncertainties using theoretical error bounds, but from a user's standpoint, such results are typically impractical -- either because they are highly conservative or involve unknown quantities. To tackle these general issues in a data-driven way, this paper develops a bootstrap approach to numerically estimate the errors of RFF approximations. Three key advantages of this approach are: (1) The error estimates are specific to the problem at hand, avoiding the pessimism of worst-case bounds. (2) The approach is flexible with respect to different uses of RFF, and can even estimate errors in downstream learning tasks. (3) The approach enables adaptive computation, so that the user can quickly inspect the error of a rough initial kernel approximation and then predict how much extra work is needed. Lastly, in exchange for all of these benefits, the error estimates can be obtained at a modest computational cost.

14.2LGMay 22, 2024Code

HOPE for a Robust Parameterization of Long-memory State Space Models

Annan Yu, Michael W. Mahoney, N. Benjamin Erichson

State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences. To achieve state-of-the-art performance, an SSM often needs a specifically designed initialization, and the training of state matrices is on a logarithmic scale with a very small learning rate. To understand these choices from a unified perspective, we view SSMs through the lens of Hankel operator theory. Building upon it, we develop a new parameterization scheme, called HOPE, for LTI systems that utilizes Markov parameters within Hankel operators. Our approach helps improve the initialization and training stability, leading to a more robust parameterization. We efficiently implement these innovations by nonuniformly sampling the transfer functions of LTI systems, and they require fewer parameters compared to canonical SSMs. When benchmarked against HiPPO-initialized models such as S4 and S4D, an SSM parameterized by Hankel operators demonstrates improved performance on Long-Range Arena (LRA) tasks. Moreover, our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.

16.9LGMay 13, 2025

Block-Biased Mamba for Long-Range Sequence Processing

Annan Yu, N. Benjamin Erichson

Mamba extends earlier state space models (SSMs) by introducing input-dependent dynamics, and has demonstrated strong empirical performance across a range of domains, including language modeling, computer vision, and foundation models. However, a surprising weakness remains: despite being built on architectures designed for long-range dependencies, Mamba performs poorly on long-range sequential tasks. Understanding and addressing this gap is important for improving Mamba's universality and versatility. In this work, we analyze Mamba's limitations through three perspectives: expressiveness, inductive bias, and training stability. Our theoretical results show how Mamba falls short in each of these aspects compared to earlier SSMs such as S4D. To address these issues, we propose $\text{B}_2\text{S}_6$, a simple extension of Mamba's S6 unit that combines block-wise selective dynamics with a channel-specific bias. We prove that these changes equip the model with a better-suited inductive bias and improve its expressiveness and stability. Empirically, $\text{B}_2\text{S}_6$ outperforms S4 and S4D on Long-Range Arena (LRA) tasks while maintaining Mamba's performance on language modeling benchmarks.

11.4LGJan 24, 2025Code

A Deep State Space Model for Rainfall-Runoff Simulations

Yihan Wang, Lujun Zhang, Annan Yu et al.

The classical way of studying the rainfall-runoff processes in the water cycle relies on conceptual or physically-based hydrologic models. Deep learning (DL) has recently emerged as an alternative and blossomed in hydrology community for rainfall-runoff simulations. However, the decades-old Long Short-Term Memory (LSTM) network remains the benchmark for this task, outperforming newer architectures like Transformers. In this work, we propose a State Space Model (SSM), specifically the Frequency Tuned Diagonal State Space Sequence (S4D-FT) model, for rainfall-runoff simulations. The proposed S4D-FT is benchmarked against the established LSTM and a physically-based Sacramento Soil Moisture Accounting model across 531 watersheds in the contiguous United States (CONUS). Results show that S4D-FT is able to outperform the LSTM model across diverse regions. Our pioneering introduction of the S4D-FT for rainfall-runoff simulations challenges the dominance of LSTM in the hydrology community and expands the arsenal of DL tools available for hydrological modeling.

7.1LGDec 13, 2025

HydroDiffusion: Diffusion-Based Probabilistic Streamflow Forecasting with a State Space Backbone

Yihan Wang, Annan Yu, Lujun Zhang et al.

Recent advances have introduced diffusion models for probabilistic streamflow forecasting, demonstrating strong early flood-warning skill. However, current implementations rely on recurrent Long Short-Term Memory (LSTM) backbones and single-step training objectives, which limit their ability to capture long-range dependencies and produce coherent forecast trajectories across lead times. To address these limitations, we developed HydroDiffusion, a diffusion-based probabilistic forecasting framework with a decoder-only state space model backbone. The proposed framework jointly denoises full multi-day trajectories in a single pass, ensuring temporal coherence and mitigating error accumulation common in autoregressive prediction. HydroDiffusion is evaluated across 531 watersheds in the contiguous United States (CONUS) in the CAMELS dataset. We benchmark HydroDiffusion against two diffusion baselines with LSTM backbones, as well as the recently proposed Diffusion-based Runoff Model (DRUM). Results show that HydroDiffusion achieves strong nowcast accuracy when driven by observed meteorological forcings, and maintains consistent performance across the full simulation horizon. Moreover, HydroDiffusion delivers stronger deterministic and probabilistic forecast skill than DRUM in operational forecasting. These results establish HydroDiffusion as a robust generative modeling framework for medium-range streamflow forecasting, providing both a new modeling benchmark and a foundation for future research on probabilistic hydrologic prediction at continental scales.

7.1LGFeb 10, 2025Code

Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting

Kareem Hegazy, Michael W. Mahoney, N. Benjamin Erichson

Transformers have recently shown strong performance in time-series forecasting, but their all-to-all attention mechanism overlooks the (temporal) causal and often (temporally) local nature of data. We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. This simple yet effective modification endows the model with an inductive bias favoring temporally local dependencies, while still allowing sufficient flexibility to learn the unique correlation structure of each dataset. Our empirical results demonstrate that Powerformer not only achieves state-of-the-art accuracy on public time-series benchmarks, but also that it offers improved interpretability of attention patterns. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention. These findings highlight the importance of domain-specific modifications to the Transformer architecture for time-series forecasting, and they establish Powerformer as a strong, efficient, and principled baseline for future research and real-world applications.

13.6LGFeb 17, 2022Code

Learning continuous models for continuous physics

Aditi S. Krishnapriyan, Alejandro F. Queiruga, N. Benjamin Erichson et al.

Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties. This results in models that often do not capture any underlying continuous dynamics -- either of the system of interest, or indeed of any related system. To address this challenge, we develop a convergence test based on numerical analysis theory. Our test verifies whether a model has learned a function that accurately approximates an underlying continuous dynamics. Models that fail this test fail to capture relevant dynamics, rendering them of limited utility for many scientific prediction tasks; while models that pass this test enable both better interpolation and better extrapolation in multiple ways. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.

13.6LGFeb 2, 2022

NoisyMix: Boosting Model Robustness to Common Corruptions

N. Benjamin Erichson, Soon Hoe Lim, Winnie Xu et al.

For many real-world applications, obtaining stable and robust statistical performance is more important than simply achieving state-of-the-art predictive test accuracy, and thus robustness of neural networks is an increasingly important topic. Relatedly, data augmentation schemes have been shown to improve robustness with respect to input perturbations and domain shifts. Motivated by this, we introduce NoisyMix, a novel training scheme that promotes stability as well as leverages noisy augmentations in input and feature space to improve both model robustness and in-domain accuracy. NoisyMix produces models that are consistently more robust and that provide well-calibrated estimates of class membership probabilities. We demonstrate the benefits of NoisyMix on a range of benchmark datasets, including ImageNet-C, ImageNet-R, and ImageNet-P. Moreover, we provide theory to understand implicit regularization and robustness of NoisyMix.

3.1LGOct 26, 2021

Cluster-and-Conquer: A Framework For Time-Series Forecasting

Reese Pathak, Rajat Sen, Nikhil Rao et al.

We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time series in the same cluster, accounting for intra-cluster similarity while minimizing potential noise in predictions by ignoring inter-cluster effects. Our framework -- which we refer to as "cluster-and-conquer" -- is highly general, allowing for any time-series forecasting and clustering method to be used in each step. It is computationally efficient and embarrassingly parallel. We motivate our framework with a theoretical analysis in an idealized mixed linear regression setting, where we provide guarantees on the quality of the estimates. We accompany these guarantees with experimental results that demonstrate the advantages of our framework: when instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets, sometimes outperforming deep-learning-based approaches.

21.3LGOct 10, 2021Code

Long Expressive Memory for Sequence Modeling

T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson et al.

We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, it can efficiently process sequential tasks with very long-term dependencies, and it is sufficiently expressive to be able to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.

16.0LGOct 5, 2021Code

Noisy Feature Mixup

Soon Hoe Lim, N. Benjamin Erichson, Francisco Utrera et al.

We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation that combines the best of interpolation based training and noise injection schemes. Rather than training with convex combinations of pairs of examples and their labels, we use noise-perturbed convex combinations of pairs of data points in both input and feature space. This method includes mixup and manifold mixup as special cases, but it has additional advantages, including better smoothing of decision boundaries and enabling improved model robustness. We provide theory to understand this as well as the implicit regularization effects of NFM. Our theory is supported by empirical results, demonstrating the advantage of NFM, as compared to mixup and manifold mixup. We show that residual networks and vision transformers trained with NFM have favorable trade-offs between predictive accuracy on clean data and robustness with respect to various types of data perturbation across a range of computer vision benchmark datasets.

11.9LGJun 21, 2021Code

Stateful ODE-Nets using Basis Function Expansions

Alejandro Queiruga, N. Benjamin Erichson, Liam Hodgkinson et al.

The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-in-depth functions using linear combinations of basis functions which enables us to leverage parameter transformations such as function projections. In turn, this view allows us to formulate a novel stateful ODE-Block that handles stateful layers. The benefits of this new ODE-Block are twofold: first, it enables incorporating meaningful continuous-in-depth batch normalization layers to achieve state-of-the-art performance; second, it enables compressing the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance and reducing both inference time and memory footprint. Performance is demonstrated by applying our stateful ODE-Block to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.

4.4LGFeb 18, 2021

A Differential Geometry Perspective on Orthogonal Recurrent Models

Omri Azencot, N. Benjamin Erichson, Mirela Ben-Chen et al.

Recently, orthogonal recurrent neural networks (RNNs) have emerged as state-of-the-art models for learning long-term dependencies. This class of models mitigates the exploding and vanishing gradients problem by design. In this work, we employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs. We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields. Specifically, based on a well-known result in differential geometry that relates vector fields and linear operators, we prove that every divergence-free vector field is related to a skew-symmetric matrix. Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields. Our method parameterizes vector fields via the directional derivatives of scalar functions. This requires the construction of latent inner product, gradient, and divergence operators. In comparison to state-of-the-art orthogonal RNNs, our approach achieves comparable or better results on a variety of benchmark tasks.

23.4MLFeb 9, 2021Code

Noisy Recurrent Neural Networks

Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson et al.

We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states. Specifically, we consider RNNs that can be viewed as discretizations of stochastic differential equations driven by input data. This framework allows us to study the implicit regularization effect of general noise injection schemes by deriving an approximate explicit regularizer in the small noise regime. We find that, under reasonable assumptions, this implicit regularization promotes flatter minima; it biases towards models with more stable dynamics; and, in classification tasks, it favors models with larger classification margin. Sufficient conditions for global stability are obtained, highlighting the phenomenon of stochastic stabilization, where noise injection can improve stability during training. Our theory is supported by empirical results which demonstrate that the RNNs have improved robustness with respect to various input perturbations.

19.7LGAug 5, 2020Code

Continuous-in-Depth Neural Networks

Alejandro F. Queiruga, N. Benjamin Erichson, Dane Taylor et al.

Recent work has attempted to interpret residual networks (ResNets) as one step of a forward Euler discretization of an ordinary differential equation, focusing mainly on syntactic algebraic similarities between the two systems. Discrete dynamical integrators of continuous dynamical systems, however, have a much richer structure. We first show that ResNets fail to be meaningful dynamical integrators in this richer sense. We then demonstrate that neural network models can learn to represent continuous dynamical systems, with this richer structure and properties, by embedding them into higher-order numerical integration schemes, such as the Runge Kutta schemes. Based on these insights, we introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures. ContinuousNets exhibit an invariance to the particular computational graph manifestation. That is, the continuous-in-depth model can be evaluated with different discrete time step sizes, which changes the number of layers, and different numerical integration schemes, which changes the graph connectivity. We show that this can be used to develop an incremental-in-depth training scheme that improves model quality, while significantly decreasing training time. We also show that, once trained, the number of units in the computational graph can even be decreased, for faster inference with little-to-no accuracy drop.

11.5LGJul 31, 2020

Noise-Response Analysis of Deep Neural Networks Quantifies Robustness and Fingerprints Structural Malware

N. Benjamin Erichson, Dane Taylor, Qixuan Wu et al.

The ubiquity of deep neural networks (DNNs), cloud-based training, and transfer learning is giving rise to a new cybersecurity frontier in which unsecure DNNs have `structural malware' (i.e., compromised weights and activation pathways). In particular, DNNs can be designed to have backdoors that allow an adversary to easily and reliably fool an image classifier by adding a pattern of pixels called a trigger. It is generally difficult to detect backdoors, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data). Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, `fingerprints' its nonlinearity, and allows us to detect backdoors (if present). Our approach involves studying how a DNN responds to noise-infused images with varying noise intensity, which we summarize with titration curves. We find that DNNs with backdoors are more sensitive to input noise and respond in a characteristic way that reveals the backdoor and where it leads (its `target'). Our empirical results demonstrate that we can accurately detect backdoors with high confidence orders-of-magnitude faster than existing approaches (seconds versus hours).

21.9LGJul 11, 2020Code

Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification

Francisco Utrera, Evan Kravitz, N. Benjamin Erichson et al.

Transfer learning has emerged as a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains. This process consists of taking a neural network pre-trained on a large feature-rich source dataset, freezing the early layers that encode essential generic image properties, and then fine-tuning the last few layers in order to capture specific information related to the target situation. This approach is particularly useful when only limited or weakly labeled data are available for the new task. In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models, especially if only limited data are available for the new domain task. Further, we observe that adversarial training biases the learnt representations to retaining shapes, as opposed to textures, which impacts the transferability of the source models. Finally, through the lens of influence functions, we discover that transferred adversarially-trained models contain more human-identifiable semantic information, which explains -- at least partly -- why adversarially-trained models transfer better.

25.6LGJun 22, 2020Code

Lipschitz Recurrent Neural Networks

N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga et al.

Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this enables architectural design decisions before experimentation. Sufficient conditions for global stability of the recurrent unit are obtained, motivating a novel scheme for constructing hidden-to-hidden matrices. Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks, including computer vision, language modeling and speech prediction tasks. Finally, through Hessian-based analysis we demonstrate that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.

4.9MLMar 10, 2020

Error Estimation for Sketched SVD via the Bootstrap

Miles E. Lopes, N. Benjamin Erichson, Michael W. Mahoney

In order to compute fast approximations to the singular value decompositions (SVD) of very large matrices, randomized sketching algorithms have become a leading approach. However, a key practical difficulty of sketching an SVD is that the user does not know how far the sketched singular vectors/values are from the exact ones. Indeed, the user may be forced to rely on analytical worst-case error bounds, which do not account for the unique structure of a given problem. As a result, the lack of tools for error estimation often leads to much more computation than is really necessary. To overcome these challenges, this paper develops a fully data-driven bootstrap method that numerically estimates the actual error of sketched singular vectors/values. In particular, this allows the user to inspect the quality of a rough initial sketched SVD, and then adaptively predict how much extra work is needed to reach a given error tolerance. Furthermore, the method is computationally inexpensive, because it operates only on sketched objects, and it requires no passes over the full matrix being factored. Lastly, the method is supported by theoretical guarantees and a very encouraging set of experimental results.

27.6COMP-PHMar 4, 2020Code

Forecasting Sequential Data using Consistent Koopman Autoencoders

Omri Azencot, N. Benjamin Erichson, Vanessa Lin et al.

Recurrent neural networks are widely used on time series data, yet such models often ignore the underlying physical structures in such sequences. A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems. In this work, we propose a novel Consistent Koopman Autoencoder model which, unlike the majority of existing work, leverages the forward and backward dynamics. Key to our approach is a new analysis which explores the interplay between consistent dynamics and their associated Koopman operators. Our network is directly related to the derived analysis, and its computational requirements are comparable to other baselines. We evaluate our method on a wide range of high-dimensional and short-term dependent problems, and it achieves accurate estimates for significant prediction horizons, while also being robust to noise.

26.6COMP-PHMay 26, 2019

Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction

N. Benjamin Erichson, Michael Muehlebach, Michael W. Mahoney

In addition to providing high-profile successes in computer vision and natural language processing, neural networks also provide an emerging set of techniques for scientific problems. Such data-driven models, however, typically ignore physical insights from the scientific system under consideration. Among other things, a physics-informed model formulation should encode some degree of stability or robustness or well-conditioning (in that a small change of the input will not lead to drastic changes in the output), characteristic of the underlying scientific problem. We investigate whether it is possible to include physics-informed prior knowledge for improving the model quality (e.g., generalization performance, sensitivity to parameter tuning, or robustness in the presence of noisy data). To that extent, we focus on the stability of an equilibrium, one of the most basic properties a dynamic system can have, via the lens of Lyapunov analysis. For the prototypical problem of fluid flow prediction, we show that models preserving Lyapunov stability improve the generalization error and reduce the prediction uncertainty.

20.9CRApr 7, 2019Code

JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks

N. Benjamin Erichson, Zhewei Yao, Michael W. Mahoney

It has been demonstrated that very simple attacks can fool highly-sophisticated neural network architectures. In particular, so-called adversarial examples, constructed from perturbations of input data that are small or imperceptible to humans but lead to different predictions, may lead to an enormous risk in certain critical applications. In light of this, there has been a great deal of work on developing adversarial training strategies to improve model robustness. These training strategies are very expensive, in both human and computational time. To complement these approaches, we propose a very simple and inexpensive strategy which can be used to ``retrofit'' a previously-trained network to improve its resilience to adversarial attacks. More concretely, we propose a new activation function---the JumpReLU---which, when used in place of a ReLU in an already-trained model, leads to a trade-off between predictive accuracy and robustness. This trade-off is controlled by the jump size, a hyper-parameter which can be tuned during the validation stage. Our empirical results demonstrate that this increases model robustness, protecting against adversarial attacks with substantially increased levels of perturbations. This is accomplished simply by retrofitting existing networks with our JumpReLU activation function, without the need for retraining the model. Additionally, we demonstrate that adversarially trained (robust) models can greatly benefit from retrofitting.

23.5COMP-PHFeb 20, 2019

Shallow Neural Networks for Fluid Flow Reconstruction with Limited Sensors

N. Benjamin Erichson, Lionel Mathelin, Zhewei Yao et al.

In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data. In this work, we propose a shallow neural network-based learning methodology for such fluid flow reconstruction. Our approach learns an end-to-end mapping between the sensor measurements and the high-dimensional fluid flow field, without any heavy preprocessing on the raw data. No prior knowledge is assumed to be available, and the estimation method is purely data-driven. We demonstrate the performance on three examples in fluid mechanics and oceanography, showing that this modern data-driven approach outperforms traditional modal approximation techniques which are commonly used for flow reconstruction. Not only does the proposed method show superior performance characteristics, it can also produce a comparable level of performance with traditional methods in the area, using significantly fewer sensors. Thus, the mathematical architecture is ideal for emerging global monitoring technologies where measurement data are often limited.

2.0IVNov 28, 2018

RetinaMatch: Efficient Template Matching of Retina Images for Teleophthalmology

Chen Gong, N. Benjamin Erichson, John P. Kelly et al.

Retinal template matching and registration is an important challenge in teleophthalmology with low-cost imaging devices. However, the images from such devices generally have a small field of view (FOV) and image quality degradations, making matching difficult. In this work, we develop an efficient and accurate retinal matching technique that combines dimension reduction and mutual information (MI), called RetinaMatch. The dimension reduction initializes the MI optimization as a coarse localization process, which narrows the optimization domain and avoids local optima. The effectiveness of RetinaMatch is demonstrated on the open fundus image database STARE with simulated reduced FOV and anticipated degradations, and on retinal images acquired by adapter-based optics attached to a smartphone. RetinaMatch achieves a success rate over 94\% on human retinal images with the matched target registration errors below 2 pixels on average, excluding the observer variability. It outperforms the standard template matching solutions. In the application of measuring vessel diameter repeatedly, single pixel errors are expected. In addition, our method can be used in the process of image mosaicking with area-based registration, providing a robust approach when the feature based methods fail. To the best of our knowledge, this is the first template matching algorithm for retina images with small template images from unconstrained retinal areas. In the context of the emerging mixed reality market, we envision automated retinal image matching and registration methods as transformative for advanced teleophthalmology and long-term retinal monitoring.

12.3MLApr 1, 2018

Sparse Principal Component Analysis via Variable Projection

N. Benjamin Erichson, Peng Zheng, Krithika Manohar et al.

Sparse principal component analysis (SPCA) has emerged as a powerful technique for modern data analysis, providing improved interpretation of low-rank structures by identifying localized spatial structures in the data and disambiguating between distinct time scales. We demonstrate a robust and scalable SPCA algorithm by formulating it as a value-function optimization problem. This viewpoint leads to a flexible and computationally efficient algorithm. Further, we can leverage randomized methods from linear algebra to extend the approach to the large-scale (big data) setting. Our proposed innovation also allows for a robust SPCA formulation which obtains meaningful sparse principal components in spite of grossly corrupted input data. The proposed algorithms are demonstrated using both synthetic and real world data, and show exceptional computational efficiency and diagnostic performance.

4.9MLFeb 23, 2018

Diffusion Maps meet Nyström

N. Benjamin Erichson, Lionel Mathelin, Steven L. Brunton et al.

Diffusion maps are an emerging data-driven technique for non-linear dimensionality reduction, which are especially useful for the analysis of coherent structures and nonlinear embeddings of dynamical systems. However, the computational complexity of the diffusion maps algorithm scales with the number of observations. Thus, long time-series data presents a significant challenge for fast and efficient embedding. We propose integrating the Nyström method with diffusion maps in order to ease the computational demand. We achieve a speedup of roughly two to four times when approximating the dominant diffusion map components.

6.0MLNov 6, 2017Code

Randomized Nonnegative Matrix Factorization

N. Benjamin Erichson, Ariana Mendible, Sophie Wihlborn et al.

Nonnegative matrix factorization (NMF) is a powerful tool for data mining. However, the emergence of `big data' has severely challenged our ability to compute this fundamental decomposition using deterministic algorithms. This paper presents a randomized hierarchical alternating least squares (HALS) algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative input data, a more efficient nonnegative decomposition can be computed. Our algorithm scales to big data applications while attaining a near-optimal factorization. The proposed algorithm is evaluated using synthetic and real world data and shows substantial speedups compared to deterministic HALS.

11.8CVDec 11, 2015

Randomized Low-Rank Dynamic Mode Decomposition for Motion Detection

N. Benjamin Erichson, Carl Donovan

This paper introduces a fast algorithm for randomized computation of a low-rank Dynamic Mode Decomposition (DMD) of a matrix. Here we consider this matrix to represent the development of a spatial grid through time e.g. data from a static video source. DMD was originally introduced in the fluid mechanics community, but is also suitable for motion detection in video streams and its use for background subtraction has received little previous investigation. In this study we present a comprehensive evaluation of background subtraction, using the randomized DMD and compare the results with leading robust principal component analysis algorithms. The results are convincing and show the random DMD is an efficient and powerful approach for background modeling, allowing processing of high resolution videos in real-time. Supplementary materials include implementations of the algorithms in Python.