CL, May 8, 2020
Sentiment Analysis Using Simplified Long Short-term Memory Recurrent Neural Networks
Karthik Gopalakrishnan, Fathi M. Salem
Long Short-Term Memory (LSTM) networks are a specific type of Recurrent Neural Network (RNN) that is very effective at handling long sequence data and learning long-term dependencies. In this work, we perform sentiment analysis on a GOP Debate Twitter dataset. To speed up training and reduce computational cost and time, six different parameter-reduced slim versions of the LSTM model (slim LSTMs) are proposed. We evaluate two of these models on the dataset and compare their performance with that of the standard LSTM model. The effect of bidirectional LSTM layers is also studied. The work also includes a study to choose the best architecture, in addition to establishing the best set of hyperparameters for the different LSTM models.
NE, Jan 18, 2019
Slim LSTM networks: LSTM_6 and LSTM_C6
Atra Akandeh, Fathi M. Salem
We have shown previously that our parameter-reduced variants of Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) are comparable in performance to the standard LSTM RNN on the MNIST dataset. In this study, we show that this is also the case for two diverse benchmark datasets, namely the IMDB review-sentiment dataset and the 20 Newsgroups dataset. Specifically, we focus on two of the simplest variants, namely LSTM_6 (i.e., standard LSTM with three constant fixed gates) and LSTM_C6 (i.e., LSTM_6 with a further reduced cell body input block). We demonstrate that these two aggressively reduced-parameter variants are competitive with the standard LSTM when hyper-parameters (e.g., the learning parameter, number of hidden units and gate constants) are set properly. These architectures speed up training computations. Hence, these networks would be more suitable for online training and inference on portable devices with relatively limited computational resources.
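The sketch below illustrates what "three constant fixed gates" could mean in code. It replaces the input, forget and output gates of a standard LSTM cell with fixed scalars, so only the cell-body block (W, U, b) stays adaptive. This is a hypothetical reading, not the authors' exact LSTM_6 equations. The gate constants (1.0, 0.6, 1.0), the function names and the random weights are all assumptions chosen for the example. The further-reduced cell body of LSTM_C6 is not shown.

```python
import math
import random


def lstm6_step(x, h, c, W, U, b, gates=(1.0, 0.6, 1.0)):
    """One step of a constant-gate LSTM cell (hypothetical LSTM_6-style sketch).

    x: input vector (length m); h, c: previous hidden and cell states (length n).
    W (n x m), U (n x n) and b (length n) form the cell-body block, the only
    adaptive parameters here. gates = (input, forget, output) constants; these
    values are illustrative assumptions, not the paper's settings.
    """
    g_in, g_forget, g_out = gates
    h_new, c_new = [], []
    for k in range(len(h)):
        z = b[k]
        z += sum(W[k][j] * x[j] for j in range(len(x)))
        z += sum(U[k][j] * h[j] for j in range(len(h)))
        # Fixed scalars take the place of the learned gate activations.
        c_k = g_forget * c[k] + g_in * math.tanh(z)
        c_new.append(c_k)
        h_new.append(g_out * math.tanh(c_k))
    return h_new, c_new


def params_constant_gate_lstm(m, n):
    # Only the cell-body block is adaptive: W, U and b.
    return n * m + n * n + n


def params_standard_lstm(m, n):
    # Cell-body block plus three gates, each with its own W, U and b.
    return 4 * (n * m + n * n + n)


def run_sequence(xs, n, seed=0):
    """Run the cell over a sequence with random (untrained) stand-in weights."""
    rng = random.Random(seed)
    m = len(xs[0])
    W = [[rng.gauss(0, 0.5) for _ in range(m)] for _ in range(n)]
    U = [[rng.gauss(0, 0.5) for _ in range(n)] for _ in range(n)]
    b = [0.0] * n
    h, c = [0.0] * n, [0.0] * n
    for x in xs:
        h, c = lstm6_step(x, h, c, W, U, b)
    return h, c
```

Take 28 inputs and 100 hidden units. The adaptive count falls from 51,600 to 12,900, a factor of four, because each removed gate had the same shape as the cell block. A forget constant below 1 also keeps the cell state bounded. Starting from zero, |c| stays below 1/(1 - 0.6) = 2.5.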
NE, Jan 2, 2019
Performance of Three Slim Variants of The Long Short-Term Memory (LSTM) Layer
Daniel Kent, Fathi M. Salem
The Long Short-Term Memory (LSTM) layer is an important advancement in the field of neural networks and machine learning, allowing for effective training and impressive inference performance. LSTM-based neural networks have been successfully employed in various applications such as speech processing and language translation. The LSTM layer can be simplified by removing certain components, potentially speeding up training and runtime with limited change in performance. In particular, the recently introduced variants, called SLIM LSTMs, have shown success in initial experiments supporting this view. Here, we computationally analyze the validation accuracy of a convolutional-plus-recurrent neural network architecture, comparing the standard LSTM layer with three SLIM LSTM layers. We have found that some realizations of the SLIM LSTM layers can potentially perform as well as the standard LSTM layer for our considered architecture.
NE, Dec 29, 2018
SLIM LSTMs
Fathi M. Salem
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) rely on gating signals, each driven by a function of a weighted sum of at least 3 components: (i) an adaptive weight matrix multiplied by the incoming external input vector sequence, (ii) an adaptive weight matrix multiplied by the previous memory/state vector, and (iii) an adaptive bias vector. In effect, they augment the simple Recurrent Neural Network (sRNN) structure with the addition of a "memory cell" and the incorporation of at most 3 gating signals. The standard LSTM structure and components contain redundancy and excessive parameterization. In this paper, we systematically introduce variants of the LSTM RNNs, referred to as SLIM LSTMs. These variants express aggressively reduced parameterizations to achieve computational savings and/or speedup in (training) performance, while necessarily retaining (validation accuracy) performance comparable to the standard LSTM RNN.
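The three components of a gating signal listed above can be written as σ(W x_t + U h_{t-1} + b). The sketch below computes such a gate in plain Python and counts its adaptive parameters. Optional arguments let you drop individual components. This is the general mechanism behind the parameter-reduced variants. Which combination each named variant (LSTM1, LSTM2, and so on) uses is not specified here, so the argument names and helper functions are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate_signal(x, h, W=None, U=None, b=None):
    """Per-unit gate sigma(W x + U h + b).

    Passing None for W, U or b drops that component, which is how the
    parameter-reduced gates are formed in this sketch.
    """
    out = []
    for k in range(len(h)):
        z = 0.0
        if W is not None:
            z += sum(W[k][j] * x[j] for j in range(len(x)))
        if U is not None:
            z += sum(U[k][j] * h[j] for j in range(len(h)))
        if b is not None:
            z += b[k]
        out.append(sigmoid(z))
    return out


def gate_param_count(m, n, use_W=True, use_U=True, use_b=True):
    """Adaptive parameters in one gate for input size m and n hidden units."""
    return ((n * m if use_W else 0)
            + (n * n if use_U else 0)
            + (n if use_b else 0))
```

Take 28 inputs and 100 hidden units. One full gate holds 12,900 parameters. Dropping the input term leaves 10,100, and keeping only the bias leaves 100. With every component dropped, the gate collapses to the constant σ(0) = 0.5.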
NE, Jul 14, 2017
Simplified Long Short-term Memory Recurrent Neural Networks: part III
Atra Akandeh, Fathi M. Salem
This is part III of a three-part work. In parts I and II, we presented eight variants of simplified Long Short-Term Memory (LSTM) recurrent neural networks (RNNs). Fast computation, especially on platforms with constrained computing resources, is an important factor in processing big time-sequence data. In this part III paper, we present and evaluate two new LSTM model variants that dramatically reduce the computational load while retaining performance comparable to the base (standard) LSTM RNNs. In these new variants, we impose (Hadamard) pointwise state multiplications in the cell-memory network in addition to the gating signal networks.
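A (Hadamard) pointwise state multiplication replaces the full recurrent product U h_{t-1} with an elementwise product u ⊙ h_{t-1}. Each hidden unit then sees only its own previous state. The sketch below applies this to the cell-memory input block. Placing it there is an assumption for illustration, as are the function names; this is not the exact part III set of equations.

```python
import math


def cell_input_full(x, h, W, U, b):
    """tanh(W x + U h + b) with a full n x n recurrent matrix U."""
    n = len(h)
    return [math.tanh(sum(W[k][j] * x[j] for j in range(len(x)))
                      + sum(U[k][j] * h[j] for j in range(n))
                      + b[k])
            for k in range(n)]


def cell_input_pointwise(x, h, W, u, b):
    """tanh(W x + u * h + b), where u * h is the Hadamard (elementwise) product.

    The recurrent weights shrink from an n x n matrix to a length-n vector.
    """
    return [math.tanh(sum(W[k][j] * x[j] for j in range(len(x)))
                      + u[k] * h[k]
                      + b[k])
            for k in range(len(h))]
```

The pointwise form equals the full form whenever U is the diagonal matrix diag(u), which makes the saving concrete. For 100 hidden units, the recurrent weights in this block drop from 10,000 to 100.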
NE, Jul 14, 2017
Simplified Long Short-term Memory Recurrent Neural Networks: part II
Atra Akandeh, Fathi M. Salem
This is part II of a three-part work. Here, we present a second set of five inter-related variants of simplified Long Short-term Memory (LSTM) recurrent neural networks, obtained by further reducing adaptive parameters. Two of these models were introduced in part I of this work. We evaluate and verify our model variants on the benchmark MNIST dataset and assert that these models are comparable to the base LSTM model while using progressively fewer parameters. Moreover, we observe that when using the ReLU activation, the test accuracy of the standard LSTM drops after a number of epochs as the learning parameter becomes larger. However, all of the new model variants sustain their performance.
NE, Jul 14, 2017
Simplified Long Short-term Memory Recurrent Neural Networks: part I
Atra Akandeh, Fathi M. Salem
We present five variants of the standard Long Short-term Memory (LSTM) recurrent neural networks by uniformly reducing blocks of adaptive parameters in the gating mechanisms. For simplicity, we refer to these models as LSTM1, LSTM2, LSTM3, LSTM4, and LSTM5, respectively. Such parameter-reduced variants speed up training computations and would be more suitable for implementation on constrained embedded platforms. We comparatively evaluate and verify our five variant models on the classical MNIST dataset. We demonstrate that these variant models are comparable to a standard implementation of the LSTM model while using fewer parameters. Moreover, we observe that in some cases the standard LSTM's accuracy drops after a number of epochs when using the ReLU nonlinearity; in contrast, LSTM3, LSTM4 and LSTM5 retain their performance.
NE, Jan 20, 2017
Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks
Rahul Dey, Fathi M. Salem
The paper evaluates three variants of the Gated Recurrent Unit (GRU) in recurrent neural networks (RNNs), obtained by reducing parameters in the update and reset gates. We evaluate the three variant GRU models on the MNIST and IMDB datasets and show that these GRU-RNN variant models perform as well as the original GRU RNN model while reducing the computational expense.
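A GRU updates its state through an update gate z and a reset gate r. The sketch below lets you choose which terms feed both gates. The mapping of GRU1, GRU2 and GRU3 to "U h + b", "U h" and "b" is an assumption about how the parameter reduction is applied, and the helper names are hypothetical. The state update uses the convention h' = (1 - z) ⊙ h + z ⊙ h̃; some libraries swap the roles of z and 1 - z.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Terms feeding the update and reset gates in each variant (assumed mapping).
GATE_TERMS = {"GRU": "WUb", "GRU1": "Ub", "GRU2": "U", "GRU3": "b"}


def gru_gate(x, h, W, U, b, terms):
    """sigmoid of the selected subset of W x + U h + b."""
    z = [0.0] * len(h)
    if "W" in terms:
        z = [a + c for a, c in zip(z, matvec(W, x))]
    if "U" in terms:
        z = [a + c for a, c in zip(z, matvec(U, h))]
    if "b" in terms:
        z = [a + c for a, c in zip(z, b)]
    return [sigmoid(v) for v in z]


def gru_step(x, h, p, variant="GRU"):
    """One GRU step: gates per `variant`, then
    h~ = tanh(W_h x + U_h (r * h) + b_h) and h' = (1 - z) * h + z * h~.
    """
    terms = GATE_TERMS[variant]
    z = gru_gate(x, h, p["W_z"], p["U_z"], p["b_z"], terms)
    r = gru_gate(x, h, p["W_r"], p["U_r"], p["b_r"], terms)
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["W_h"], x), matvec(p["U_h"], rh), p["b_h"])]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def gru_param_count(m, n, variant="GRU"):
    """Two (possibly reduced) gates plus the full candidate-state block."""
    terms = GATE_TERMS[variant]
    per_gate = ((n * m if "W" in terms else 0)
                + (n * n if "U" in terms else 0)
                + (n if "b" in terms else 0))
    return 2 * per_gate + (n * m + n * n + n)
```

Take 28 inputs and 100 hidden units. The counts are 38,700 for the GRU and 33,100, 32,900 and 13,100 for the three reduced forms. The candidate-state block is left at full size in every case.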
NE, Jan 12, 2017
Simplified Minimal Gated Unit Variations for Recurrent Neural Networks
Joel Heck, Fathi M. Salem
Recurrent neural networks with various types of hidden units have been used to solve a diverse range of problems involving sequence data. Two of the most recent proposals, gated recurrent units (GRU) and minimal gated units (MGU), have shown comparably promising results on example public datasets. In this paper, we introduce three model variants of the minimal gated unit (MGU) which further simplify that design by reducing the number of parameters in the forget-gate dynamic equation. These three model variants, referred to simply as MGU1, MGU2, and MGU3, were tested on sequences generated from the MNIST dataset and from the Reuters Newswire Topics (RNT) dataset. The new models showed accuracy similar to the MGU model while using fewer parameters and thus lowering training expense. One model variant, namely MGU2, performed better than MGU on the datasets considered, and thus may be used as an alternative to MGU or GRU in recurrent neural networks.
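The MGU uses a single forget gate f for two jobs. It resets the previous state inside the candidate, and it interpolates between the old and candidate states. The sketch below follows that structure and lets the forget gate keep only some of its three terms. The equations follow the usual MGU formulation. The mapping of MGU1, MGU2 and MGU3 to "U h + b", "U h" and "b" is an assumption here, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Terms feeding the forget gate in each variant (assumed mapping).
FORGET_TERMS = {"MGU": "WUb", "MGU1": "Ub", "MGU2": "U", "MGU3": "b"}


def mgu_step(x, h, p, variant="MGU"):
    """One MGU step:
    f  = sigmoid(selected terms of W_f x + U_f h + b_f)
    h~ = tanh(W_h x + U_h (f * h) + b_h)
    h' = (1 - f) * h + f * h~
    """
    terms = FORGET_TERMS[variant]
    z = [0.0] * len(h)
    if "W" in terms:
        z = [a + c for a, c in zip(z, matvec(p["W_f"], x))]
    if "U" in terms:
        z = [a + c for a, c in zip(z, matvec(p["U_f"], h))]
    if "b" in terms:
        z = [a + c for a, c in zip(z, p["b_f"])]
    f = [sigmoid(v) for v in z]
    fh = [fi * hi for fi, hi in zip(f, h)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["W_h"], x), matvec(p["U_h"], fh), p["b_h"])]
    return [(1 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h, cand)]


def mgu_param_count(m, n, variant="MGU"):
    """Forget-gate parameters (per variant) plus the full candidate block."""
    terms = FORGET_TERMS[variant]
    forget = ((n * m if "W" in terms else 0)
              + (n * n if "U" in terms else 0)
              + (n if "b" in terms else 0))
    return forget + (n * m + n * n + n)
```

Take 28 inputs and 100 hidden units. The counts are 25,800 for the MGU and 23,000, 22,900 and 13,000 for the three reduced forms.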
NE, Jan 12, 2017
Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks
Yuzhen Lu, Fathi M. Salem
Standard LSTM recurrent neural networks, while very powerful in long-range dependency sequence applications, have a highly complex structure and a relatively large number of (adaptive) parameters. In this work, we present an empirical comparison between the standard LSTM recurrent neural network architecture and three new parameter-reduced variants. The variants are obtained by eliminating combinations of the input signal, bias, and hidden unit signals from individual gating signals. The experiments on two sequence datasets show that the three new variants, called simply LSTM1, LSTM2, and LSTM3, can achieve performance comparable to the standard LSTM model with fewer (adaptive) parameters.
NE, Dec 29, 2016
A Basic Recurrent Neural Network Model
Fathi M. Salem
We present a model of a basic recurrent neural network (or bRNN) that includes a separate linear term with a slightly "stable" fixed matrix to guarantee bounded solutions and fast dynamic response. We formulate a state space viewpoint and adapt the constrained optimization Lagrange Multiplier (CLM) technique and the vector Calculus of Variations (CoV) to derive the (stochastic) gradient descent. In this process, one avoids the commonly used re-application of the circular chain-rule and identifies the error back-propagation with the co-state backward dynamic equations. We assert that this bRNN can successfully perform regression tracking of time-series. Moreover, the "vanishing and exploding" gradients are explicitly quantified and explained through the co-state dynamics and the update laws. In addition, the adapted CoV framework can integrate new loss functions into the network correctly and in a principled manner, on any variable and for varied goals. Examples include supervised learning on the outputs and unsupervised learning on the internal (hidden) states.
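A separate stable linear term keeps the state bounded for a simple reason. The nonlinear part is squashed by tanh, so each step adds at most a fixed amount, while the linear term contracts the old state. The sketch below assumes the form h_{t+1} = A h_t + U tanh(W h_t + V x_t + b), with A = αI and 0 < α < 1. This form, the value α = 0.9, and the random weights are illustrative assumptions, not the paper's exact bRNN equations. No training (co-state or otherwise) is shown.

```python
import math
import random


def simulate_brnn(xs, n, alpha=0.9, seed=0):
    """Simulate h_{t+1} = alpha * h_t + U tanh(W h_t + V x_t + b) from h_0 = 0.

    A = alpha * I (0 < alpha < 1) is the assumed fixed, stable linear term;
    U, W, V and b are random stand-ins for trained weights.
    Returns the state trajectory and the bound max_k sum_j |U[k][j]| / (1 - alpha).
    """
    rng = random.Random(seed)
    m = len(xs[0])
    U = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    W = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    V = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    h = [0.0] * n
    trajectory = [h]
    for x in xs:
        # Nonlinear part: every component of s lies in [-1, 1].
        s = [math.tanh(sum(W[k][j] * h[j] for j in range(n))
                       + sum(V[k][j] * x[j] for j in range(m)) + b[k])
             for k in range(n)]
        # Fixed stable linear term plus the weighted nonlinear part.
        h = [alpha * h[k] + sum(U[k][j] * s[j] for j in range(n)) for k in range(n)]
        trajectory.append(h)
    bound = max(sum(abs(u) for u in row) for row in U) / (1.0 - alpha)
    return trajectory, bound
```

Because |tanh| ≤ 1, each component satisfies |h_{t+1,k}| ≤ α|h_{t,k}| + Σ_j |U_kj|. Starting from h_0 = 0, the state therefore never exceeds max_k Σ_j |U_kj| / (1 - α), whatever W, V and the inputs are.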
SD, Apr 16, 2016
Two Pairwise Iterative Schemes For High Dimensional Blind Source Separation
Zaid Albataineh, Fathi M. Salem
This paper addresses the high-dimensionality problem in blind source separation (BSS), where the number of sources is greater than two. Two pairwise iterative schemes are proposed to tackle this problem. The two pairwise schemes realize nonparametric independent component analysis (ICA) algorithms based on a new high-performance Convex Cauchy-Schwarz Divergence (CCSDIV). These two schemes enable fast and efficient demixing of sources in real-world high-dimensional source applications. Finally, the performance superiority of the proposed schemes is demonstrated in a metric comparison with FastICA, RobustICA, convex ICA (CICA), and other leading existing algorithms.
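The pairwise idea can be illustrated with a Jacobi-style sweep on already-whitened data. Each pair of outputs is rotated by the angle that maximizes a separation contrast, and the sweep repeats over all pairs. Several things are plainly substituted here for simplicity:

- The contrast is the sum of absolute excess kurtoses, not the paper's CCSDIV.
- The angle is found by grid search rather than a gradient scheme.
- The mixing matrix is orthogonal, so whitening can be skipped.

All function names are hypothetical.

```python
import math
import random


def excess_kurtosis(v):
    """Sample excess kurtosis E[y^4] / E[y^2]^2 - 3 (data assumed zero-mean)."""
    n = len(v)
    m2 = sum(a * a for a in v) / n
    m4 = sum(a ** 4 for a in v) / n
    return m4 / (m2 * m2) - 3.0


def rotate(yi, yj, theta):
    """Apply a 2-D Givens rotation to a pair of signals."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * a - s * b for a, b in zip(yi, yj)],
            [s * a + c * b for a, b in zip(yi, yj)])


def pairwise_separation(Y, sweeps=4, steps=45):
    """Jacobi-style pairwise separation of whitened signals (rows of Y).

    For each pair (i, j), grid-search the rotation angle in [0, pi/2) that
    maximizes |kurt(y_i)| + |kurt(y_j)|, a stand-in contrast for CCSDIV.
    """
    Y = [row[:] for row in Y]
    d = len(Y)
    for _ in range(sweeps):
        for i in range(d - 1):
            for j in range(i + 1, d):
                best_theta, best_val = 0.0, -1.0
                for k in range(steps):
                    theta = (math.pi / 2) * k / steps
                    a, b = rotate(Y[i], Y[j], theta)
                    val = abs(excess_kurtosis(a)) + abs(excess_kurtosis(b))
                    if val > best_val:
                        best_theta, best_val = theta, val
                Y[i], Y[j] = rotate(Y[i], Y[j], best_theta)
    return Y


def make_orthogonal_mixture(n_samples=800, seed=0):
    """Three unit-variance uniform sources mixed by a fixed orthogonal matrix."""
    rng = random.Random(seed)
    r3 = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance
    S = [[rng.uniform(-r3, r3) for _ in range(n_samples)] for _ in range(3)]
    X = [row[:] for row in S]
    for i, j, theta in [(0, 1, 0.5), (1, 2, 1.1), (0, 2, 0.3)]:
        X[i], X[j] = rotate(X[i], X[j], theta)
    return S, X


def abs_corr(a, b):
    """Absolute correlation coefficient, used to match outputs to sources."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb)
```

As with any ICA method, the outputs are recovered only up to order and sign. Comparisons against the true sources should therefore use absolute correlations, as abs_corr does.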
IT, Aug 1, 2014
A Blind Adaptive CDMA Receiver Based on State Space Structures
Zaid Albataineh, Fathi M. Salem
Code Division Multiple Access (CDMA) is a channel access method, based on spread-spectrum technology, used by various radio technologies worldwide. In general, CDMA is used as an access method in many mobile standards such as CDMA2000 and WCDMA. We address the problem of blind multiuser equalization in the wideband CDMA system, in a noisy multipath propagation environment. Herein, we propose three new blind receiver schemes, which are based on state space structures and Independent Component Analysis (ICA). These blind state-space receivers (BSSR) do not require knowledge of the propagation parameters or spreading code sequences of the users. Instead, they primarily exploit the natural assumption of statistical independence among the source signals. We also develop three semi-blind adaptive detectors by incorporating the new adaptive methods into the standard RAKE receiver structure. We carry out an extensive comparative case study based on the bit error rate (BER) performance of these methods. The study covers different numbers of users, symbols per user, and signal-to-noise ratios (SNR), and compares against conventional detectors, including the Blind Multiuser Detectors (BMUD) and Linear Minimum Mean Squared Error (LMMSE). The results show that the proposed methods outperform the other detectors in estimating the symbol signals from the received mixed CDMA signals. Moreover, the new blind detectors mitigate multiple access interference (MAI) in CDMA.
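For contrast with the blind receivers, the sketch below shows the conventional code-aware approach that the BSSR receivers do without. Each user's ±1 bits are spread by a known code, the channel sums the chips, and a correlator with the same code recovers the bits. The sketch assumes:

- synchronous users,
- orthogonal length-4 Walsh codes,
- no multipath,
- small uniform noise.

The function names and the WALSH_4 constant are hypothetical.

```python
import random


def spread(bits, code):
    """Spread each +/-1 bit over the chips of a +/-1 spreading code."""
    return [bit * chip for bit in bits for chip in code]


def received_signal(user_bits, codes, noise=0.0, seed=0):
    """Synchronous, single-path channel: chip-wise sum of all users plus uniform noise."""
    rng = random.Random(seed)
    streams = [spread(bits, code) for bits, code in zip(user_bits, codes)]
    return [sum(chips) + rng.uniform(-noise, noise) for chips in zip(*streams)]


def despread(signal, code):
    """Correlator (matched-filter) detector; it needs the user's spreading code."""
    L = len(code)
    bits = []
    for start in range(0, len(signal), L):
        corr = sum(s * c for s, c in zip(signal[start:start + L], code))
        bits.append(1 if corr >= 0 else -1)
    return bits


# Orthogonal length-4 Walsh codes for two users (an illustrative choice).
WALSH_4 = [[1, 1, 1, 1], [1, -1, 1, -1]]
```

With orthogonal codes, the other user's contribution cancels exactly in each correlation (4·b_1 + 0·b_2, plus noise). The sign test is therefore correct as long as the summed noise over one symbol stays below the code length. Multipath breaks this orthogonality and produces the multiple access interference mentioned in the abstract.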
LG, Aug 1, 2014
A RobustICA Based Algorithm for Blind Separation of Convolutive Mixtures
Zaid Albataineh, Fathi M. Salem
We propose a frequency-domain method based on robust independent component analysis (RICA) to address the multichannel Blind Source Separation (BSS) problem of convolutive speech mixtures in highly reverberant environments. We impose regularization processes to tackle the ill-conditioning problem of the covariance matrix and to mitigate the performance degradation in the frequency domain. We apply an algorithm to separate the source signals in adverse conditions, i.e., highly reverberant conditions when only short observation signals are available. Furthermore, we study the impact of several parameters on separation performance, e.g., the overlapping ratio and window type of the frequency-domain method. We also compare different techniques to solve the frequency-domain permutation ambiguity. Through simulations and real-world experiments, we verify the superiority of the presented convolutive algorithm over other BSS algorithms, including recursive regularized ICA (RR ICA) and independent vector analysis (IVA).
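A common way to regularize an ill-conditioned covariance matrix in a frequency bin is diagonal loading (Tikhonov regularization): add λI before inverting or whitening. The sketch below takes a two-channel bin with nearly collinear observations and few frames, and shows how loading shrinks the condition number. Diagonal loading is a standard technique substituted here for illustration; the paper's actual regularization may differ. The loading factor, the synthetic bin, and the function names are assumptions.

```python
import cmath
import math
import random


def covariance(X):
    """Sample covariance E[x x^H] of a 2-channel complex signal X = [x1, x2]."""
    n = len(X[0])
    return [[sum(X[i][t] * X[j][t].conjugate() for t in range(n)) / n
             for j in range(2)] for i in range(2)]


def eig_hermitian_2x2(R):
    """Closed-form eigenvalues (ascending) of a 2x2 Hermitian matrix."""
    a, d = R[0][0].real, R[1][1].real
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(R[0][1]) ** 2)
    return mid - rad, mid + rad


def diagonal_loading(R, lam):
    """Return R + lam * I."""
    return [[R[i][j] + (lam if i == j else 0.0) for j in range(2)] for i in range(2)]


def condition_number(R):
    lo, hi = eig_hermitian_2x2(R)
    return hi / lo


def nearly_collinear_bin(n_frames=64, leak=0.01, seed=1):
    """Hypothetical STFT bin: channel 2 is almost a scaled, phase-shifted copy of channel 1."""
    rng = random.Random(seed)
    x1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_frames)]
    x2 = [0.99 * cmath.exp(0.7j) * v + leak * complex(rng.gauss(0, 1), rng.gauss(0, 1))
          for v in x1]
    return [x1, x2]
```

Adding λ shifts both eigenvalues by λ, so the condition number becomes (λ_max + λ)/(λ_min + λ). That is at most (λ_max + λ)/λ, however close λ_min is to zero.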