LG, Jan 16, 2023
Causal Recurrent Variational Autoencoder for Medical Time Series Generation
Hongming Li, Shujian Yu, Jose Principe
We propose the causal recurrent variational autoencoder (CR-VAE), a novel generative model that learns a Granger causal graph from a multivariate time series $\mathbf{x}$ and incorporates the underlying causal mechanism into its data generation process. Unlike classical recurrent VAEs, CR-VAE uses a multi-head decoder, in which the $p$-th head is responsible for generating the $p$-th dimension of $\mathbf{x}$ (i.e., $\mathbf{x}^p$). By imposing a sparsity-inducing penalty on the decoder weights and encouraging specific sets of weights to be zero, CR-VAE learns a sparse adjacency matrix that encodes causal relations between all pairs of variables. Thanks to this causal matrix, the decoder strictly obeys the underlying principles of Granger causality, thereby making the data generating process transparent. We develop a two-stage approach to train the overall objective. Empirically, we evaluate the behavior of our model on synthetic data and two real-world human brain datasets involving, respectively, electroencephalography (EEG) signals and functional magnetic resonance imaging (fMRI) data. Our model consistently outperforms state-of-the-art time series generative models both qualitatively and quantitatively. Moreover, it discovers a faithful causal graph with similar or improved accuracy over existing Granger causality-based causal inference methods. Code of CR-VAE is publicly available at https://github.com/hongmingli1995/CR-VAE.
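As a rough illustration of how a sparsity penalty on decoder weights can yield an adjacency matrix, the sketch below assumes a group-lasso-style penalty over the input columns of each head's first layer. The weight layout, the function names, and the threshold are hypothetical choices for illustration and are not taken from the CR-VAE code.

```python
import math

def column_norms(head_weights):
    """L2 norm of each input column of one decoder head.

    head_weights[h][q] is the (assumed) first-layer weight from input
    series q to hidden unit h of that head.
    """
    num_inputs = len(head_weights[0])
    return [math.sqrt(sum(row[q] ** 2 for row in head_weights))
            for q in range(num_inputs)]

def group_sparsity_penalty(all_heads):
    """Sum of column norms over all heads (a group-lasso-style penalty)."""
    return sum(sum(column_norms(weights)) for weights in all_heads)

def granger_adjacency(all_heads, threshold=1e-8):
    """adjacency[p][q] == 1 if series q still feeds head p after training."""
    return [[1 if norm > threshold else 0 for norm in column_norms(weights)]
            for weights in all_heads]

# Toy example with two series and two hidden units per head.
heads = [
    [[0.5, 0.0], [-0.3, 0.0]],   # head 0 ignores series 1 (zero column)
    [[0.2, 0.7], [0.1, -0.4]],   # head 1 uses both series
]
adjacency = granger_adjacency(heads)
print(adjacency)  # [[1, 0], [1, 1]]
```

Under this assumption, a column driven entirely to zero means the corresponding series has no path into that head, which is read as the absence of a Granger-causal link.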
LG, Jul 28, 2023
Universal Recurrent Event Memories for Streaming Data
Ran Dou, Jose Principe
In this paper, we propose a new event memory architecture (MemNet) for recurrent neural networks, which is universal for different types of time series data such as scalar, multivariate or symbolic. Unlike other external neural memory architectures, it stores key-value pairs, which separate the information for addressing and for content to improve the representation, as in the digital archetype. Moreover, the key-value pairs also avoid the compromise between memory depth and resolution that applies to memories constructed by the model state. One of MemNet's key characteristics is that it requires only linear adaptive mapping functions while implementing a nonlinear operation on the input data. The MemNet architecture can be applied without modification to scalar time series, logic operators on strings, and natural language processing, providing state-of-the-art results in all application domains, such as chaotic time series, symbolic operation tasks, and question-answering tasks (bAbI). Finally, controlled by five linear layers, MemNet requires far fewer trainable parameters than other external memory networks as well as the transformer network. The space complexity of MemNet equals that of a single self-attention layer. It greatly improves the efficiency of the attention mechanism and opens the door for IoT applications.
LG, Jul 28, 2023
Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks
Ran Dou, Jose Principe
In recurrent neural networks, learning long-term dependencies is the main difficulty because of the vanishing and exploding gradient problem. Many researchers have worked on this issue and proposed many algorithms. Although these algorithms have achieved great success, understanding how the information decays remains an open problem. In this paper, we study the dynamics of the hidden state in recurrent neural networks. We propose a new perspective for analyzing the hidden state space based on an eigen decomposition of the weight matrix. We start the analysis with a linear state space model and explain the role of activation functions in preserving information. We provide an explanation for long-term dependency based on the eigen analysis. We also point out the different behavior of eigenvalues for regression tasks and classification tasks. Based on observations of well-trained recurrent neural networks, we propose a new initialization method for recurrent neural networks, which consistently improves performance. It can be applied to vanilla RNN, LSTM, and GRU. We test on many datasets, such as Tomita Grammars, pixel-by-pixel MNIST datasets, and machine translation datasets (Multi30k). It outperforms the Xavier and Kaiming initializers as well as other RNN-only initializers like IRNN and sp-RNN in several tasks.
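To make the link between eigenvalues and information decay concrete, here is a minimal sketch for a linear recurrence $h_t = W h_{t-1}$ with a 2x2 weight matrix. It is a generic textbook illustration, not the initializer proposed in the paper; the function names and the tolerance are assumptions.

```python
import cmath
import math

def eigenvalues_2x2(W):
    """Closed-form eigenvalues of a 2x2 matrix via trace and determinant."""
    (a, b), (c, d) = W
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

def steps_until_decay(eigenvalue, tolerance=1e-3):
    """Steps until the mode |eigenvalue|**t drops below the tolerance.

    A mode with magnitude >= 1 never decays in the linear model.
    """
    magnitude = abs(eigenvalue)
    if magnitude >= 1.0:
        return math.inf
    if magnitude == 0.0:
        return 1
    return math.ceil(math.log(tolerance) / math.log(magnitude))

# A slowly decaying mode (0.9) and a quickly decaying mode (0.5).
W = [[0.9, 0.0], [0.0, 0.5]]
for lam in eigenvalues_2x2(W):
    print(round(abs(lam), 3), steps_until_decay(lam))
```

In this linear setting, eigenvalue magnitudes close to one retain information over many steps, while small magnitudes forget it within a few steps.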
CV, Sep 8, 2024
Fast Deep Predictive Coding Networks for Videos Feature Extraction without Labels
Wenqian Xue, Chi Ding, Jose Principe
Brain-inspired deep predictive coding networks (DPCNs) effectively model and capture video features through a bi-directional information flow, even without labels. They are based on an overcomplete description of video scenes, and one of the bottlenecks has been the lack of effective sparsification techniques to find discriminative and robust dictionaries. FISTA has been the best alternative. This paper proposes a DPCN with a fast inference of internal model variables (states and causes) that achieves high sparsity and accuracy of feature clustering. The proposed unsupervised learning procedure, inspired by adaptive dynamic programming with a majorization-minimization framework, and its convergence are rigorously analyzed. Experiments on the CIFAR-10, Super Mario Bros video game, and Coil-100 datasets validate the approach, which outperforms previous versions of DPCNs on learning rate, sparsity ratio, and feature clustering accuracy. Because of the DPCN's solid foundation and explainability, this advance opens the door for general applications in object recognition in video without labels.
LG, Apr 8
Time-Series Classification with Multivariate Statistical Dependence Features
Yao Sun, Bo Hu, Jose Principe
In this paper, we propose a novel framework for non-stationary time-series analysis that replaces conventional correlation-based statistics with direct estimation of statistical dependence in the normalized joint density of input and target signals, the cross density ratio (CDR). Unlike windowed correlation estimates, this measure is independent of sample order and robust to regime changes. The method builds on the functional maximal correlation algorithm (FMCA), which constructs a projection space by decomposing the eigenspectrum of the CDR. Multiscale features from this eigenspace are classified using a lightweight single-hidden-layer perceptron. On the TI-46 digit speech corpus, our approach outperforms hidden Markov models (HMMs) and state-of-the-art spiking neural networks, achieving higher accuracy with fewer than 10 layers and a storage footprint under 5 MB.
SD, Feb 27, 2022
Hierarchical Linear Dynamical System for Representing Notes from Recorded Audio
Leila Kalantari, Jose Principe, Kathryn E. Sieving
We seek to develop simultaneous segmentation and classification of notes from audio recordings in the presence of outliers. The selected architecture for modeling time series is a hierarchical linear dynamical system (HLDS). We propose a novel method for setting its parameters. HLDS can potentially be employed in two ways: 1) simultaneous segmentation and clustering for exploring data, i.e., finding unknown notes, and 2) simultaneous segmentation and classification of audio recordings for finding the notes of interest in the presence of outliers. We adapted HLDS for the second purpose since it is an easier task and still a challenging problem, e.g., in the field of bioacoustics. Each test clip has the same notes (but different instances) as the training clip and also contains outlier notes. At test time, the system automatically decides to which class of interest, if any, a note belongs. Two applications of this work are bioacoustics, for the detection of animal sounds in audio field recordings, and musicology. Experiments have been conducted on segmentation and classification of both avian and musical notes from recorded audio.
LG, Aug 29, 2021
Uncertainty quantification for multiclass data description
Leila Kalantari, Jose Principe, Kathryn E. Sieving
In this manuscript, we propose a multiclass data description model based on kernel Mahalanobis distance (MDD-KM) with self-adapting hyperparameter setting. MDD-KM provides uncertainty quantification and can be deployed to build classification systems for the realistic scenario where out-of-distribution (OOD) samples are present among the test data. Given a test signal, a quantity related to empirical kernel Mahalanobis distance between the signal and each of the training classes is computed. Since these quantities correspond to the same reproducing kernel Hilbert space, they are commensurable and hence can be readily treated as classification scores without further application of fusion techniques. To set kernel parameters, we exploit the fact that predictive variance according to a Gaussian process (GP) is empirical kernel Mahalanobis distance when a centralized kernel is used, and propose to use GP's negative likelihood function as the cost function. We conduct experiments on the real problem of avian note classification. We report a prototypical classification system based on a hierarchical linear dynamical system with MDD-KM as a component. Our classification system does not require sound event detection as a preprocessing step, and is able to find instances of training avian notes with varying length among OOD samples (corresponding to unknown notes of disinterest) in the test audio clip. Domain knowledge is leveraged to make crisp decisions from raw classification scores. We demonstrate the superior performance of MDD-KM over possibilistic K-nearest neighbor.
ML, May 12, 2020
Modularizing Deep Learning via Pairwise Learning With Kernels
Shiyu Duan, Shujian Yu, Jose Principe
By redefining the conventional notions of layers, we present an alternative view on finitely wide, fully trainable deep neural networks as stacked linear models in feature spaces, leading to a kernel machine interpretation. Based on this construction, we then propose a provably optimal modular learning framework for classification that does not require between-module backpropagation. This modular approach brings new insights into the label requirement of deep learning: It leverages only implicit pairwise labels (weak supervision) when learning the hidden modules. When training the output module, on the other hand, it requires full supervision but achieves high label efficiency, needing as few as 10 randomly selected labeled examples (one from each class) to achieve 94.88% accuracy on CIFAR-10 using a ResNet-18 backbone. Moreover, modular training enables fully modularized deep learning workflows, which then simplify the design and implementation of pipelines and improve the maintainability and reusability of models. To showcase the advantages of such a modularized workflow, we describe a simple yet reliable method for estimating reusability of pre-trained modules as well as task transferability in a transfer learning setting. At practically no computation overhead, it precisely described the task space structure of 15 binary classification tasks from CIFAR-10.
ML, Sep 25, 2019
Information Plane Analysis of Deep Neural Networks via Matrix-Based Renyi's Entropy and Tensor Kernels
Kristoffer Wickstrøm, Sigurd Løkse, Michael Kampffmeyer et al.
Analyzing deep neural networks (DNNs) via information plane (IP) theory has gained tremendous attention recently as a tool to gain insight into, among other things, their generalization ability. However, it is by no means obvious how to estimate the mutual information (MI) between each hidden layer and the input/desired output in order to construct the IP. For instance, hidden layers with many neurons require MI estimators that are robust to the high dimensionality associated with such layers. MI estimators should also be able to naturally handle convolutional layers, while at the same time being computationally tractable enough to scale to large networks. None of the existing IP methods to date have been able to study truly deep Convolutional Neural Networks (CNNs), such as VGG-16. In this paper, we propose an IP analysis using the new matrix-based Rényi's entropy coupled with tensor kernels over convolutional layers, leveraging the power of kernel methods to represent properties of the probability distribution independently of the dimensionality of the data. The obtained results shed new light on the previous literature concerning small-scale DNNs, albeit using a completely new approach. Importantly, the new framework enables us to provide the first comprehensive IP analysis of contemporary large-scale DNNs and CNNs, investigating the different training phases and providing new insights into the training dynamics of large-scale neural networks.
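The matrix-based Rényi entropy of order $\alpha$ is commonly written as $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \operatorname{tr}(A^\alpha)$, where $A$ is a Gram matrix normalized to unit trace. The sketch below is a simplified illustration that assumes $\alpha = 2$ and a plain Gaussian kernel on vectors, so that $\operatorname{tr}(A^2)$ reduces to a sum of squared entries and no eigendecomposition is needed. The tensor kernels and the specific $\alpha$ used in the paper are not reproduced here, and the function names are hypothetical.

```python
import math

def gaussian_gram(samples, sigma):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    def sq_dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    return [[math.exp(-sq_dist(x, y) / (2 * sigma ** 2)) for y in samples]
            for x in samples]

def renyi_entropy_order2(samples, sigma=1.0):
    """Matrix-based Renyi entropy for alpha = 2, in bits.

    With a Gaussian kernel K[i][i] == 1, so A = K / n has unit trace.
    For symmetric A, tr(A^2) equals the sum of squared entries of A.
    """
    K = gaussian_gram(samples, sigma)
    n = len(K)
    trace_a_squared = sum(K[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    return -math.log2(trace_a_squared)

# Four well-separated points behave like four distinct symbols: 2 bits.
spread = [[0.0], [100.0], [200.0], [300.0]]
print(renyi_entropy_order2(spread))
```

Identical samples give an entropy of zero, and the estimate depends only on pairwise kernel values rather than on the dimensionality of the samples.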
LG, May 1, 2018
A Taxonomy for Neural Memory Networks
Ying Ma, Jose Principe
In this paper, a taxonomy for memory networks is proposed based on their memory organization. The taxonomy includes all the popular memory networks: the vanilla recurrent neural network (RNN), long short-term memory (LSTM), the neural stack, and the neural Turing machine, along with their variants. The taxonomy puts all these networks under a single umbrella and shows their relative expressive power, i.e., vanilla RNN <= LSTM <= neural stack <= neural RAM. The differences and commonalities between these networks are analyzed. These differences are also connected to the requirements of different tasks, which can guide the user in choosing or designing an appropriate memory network for a specific task. As a conceptually simplified class of problems, four tasks on synthetic symbol sequences (counting, counting with interference, reversing, and repeat counting) are developed and tested to verify our arguments. We also use two natural language processing problems to discuss how this taxonomy helps in choosing the appropriate neural memory network for real-world problems.
LG, Feb 11, 2018
On Kernel Method-Based Connectionist Models and Supervised Deep Learning Without Backpropagation
Shiyu Duan, Shujian Yu, Yunmei Chen et al.
We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer-by-layer a compositional hypothesis class, i.e., a feedforward, multilayer architecture, in a supervised setting. In terms of the models, we present a principled method to "kernelize" (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart86). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide realizations of the abstract framework under certain architectures and objective functions. Based on these realizations, we present a layer-wise training algorithm for an l-layer feedforward network for classification, where l>=2 can be arbitrary. This algorithm can be given an intuitive geometric interpretation that makes the learning dynamics transparent. Empirical results are provided to complement our theory. We show that the kernelized networks, trained layer-wise, compare favorably with classical kernel machines as well as other connectionist models trained by BP. We also visualize the inner workings of the greedy kernelized models to validate our claim on the transparency of the layer-wise algorithm.
CV, May 3, 2017
Marine Animal Classification with Correntropy Loss Based Multi-view Learning
Zheng Cao, Shujian Yu, Bing Ouyang et al.
To analyze marine animal behavior, seasonal distribution, and abundance, digital imagery can be acquired by visual or Lidar cameras. Depending on the quantity and properties of the acquired imagery, the animals are characterized either as features (shape, color, texture, etc.) or as dissimilarity matrices derived from different shape analysis methods (shape context, internal distance shape context, etc.). In both cases, multi-view learning is critical for integrating more than one set of features or dissimilarity matrices to achieve higher classification accuracy. This paper adopts the correntropy loss as the cost function in multi-view learning, which has favorable statistical properties for rejecting noise. For the case of features, the correntropy loss-based multi-view learning and its entrywise variation are developed based on the multi-view intact space learning algorithm. For the case of dissimilarity matrices, the robust Euclidean embedding algorithm is extended to its multi-view form with the correntropy loss function. Results from simulated data and real-world marine animal imagery show that the proposed algorithms can effectively improve the classification rate and suppress noise under different noise conditions.
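The noise-rejection property of a correntropy-based loss can be seen in a small sketch. It assumes the common unnormalized form $1 - \exp(-e^2 / (2\sigma^2))$ averaged over residuals $e$, with a hypothetical kernel width $\sigma$. The multi-view formulation in the paper is not reproduced here.

```python
import math

def correntropy_loss(residuals, sigma=1.0):
    """Mean of 1 - exp(-e^2 / (2 sigma^2)); each term is bounded by 1."""
    return sum(1.0 - math.exp(-(e * e) / (2.0 * sigma * sigma))
               for e in residuals) / len(residuals)

def mean_squared_error(residuals):
    """Mean of e^2; a single large residual can dominate the average."""
    return sum(e * e for e in residuals) / len(residuals)

clean = [0.1, -0.2, 0.05]
with_outlier = clean + [50.0]   # one grossly corrupted measurement

print(mean_squared_error(clean), mean_squared_error(with_outlier))
print(correntropy_loss(clean), correntropy_loss(with_outlier))
```

Because each term saturates at one, the outlier contributes at most 1/4 to the averaged correntropy loss in this example, whereas it inflates the mean squared error by several orders of magnitude.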