Anthony S. Maida

CV
h-index: 2 · Papers: 14 · Citations: 1,768 · Novelty: 44% · AI Score: 38

14 Papers

CV · Jan 11, 2023
Enhancing ResNet Image Classification Performance by using Parameterized Hypercomplex Multiplication

Nazmul Shahadat, Anthony S. Maida

Recently, many deep networks have incorporated hypercomplex and related calculations into their architectures. For convolutional classification networks, these enhancements have been applied to the convolution operations in the frontend to improve accuracy and/or reduce parameter requirements while maintaining accuracy. However, it has not been studied whether adding hypercomplex calculations to the densely connected backend also improves performance. This paper studies ResNet architectures and incorporates parameterized hypercomplex multiplication (PHM) into the backend of residual, quaternion, and vectormap convolutional neural networks to assess the effect. We show that PHM does improve classification accuracy on several image datasets, including small, low-resolution CIFAR-10/100 and large, high-resolution ImageNet and ASL, and can achieve state-of-the-art accuracy for hypercomplex networks.
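
A minimal sketch of how a PHM weight matrix can be assembled, assuming the commonly used formulation W = sum_i kron(A_i, S_i), where n small matrices A_i (n x n) and n matrices S_i ((out/n) x (in/n)) replace one dense out x in weight. The helper names and layer sizes are hypothetical illustrations, not the authors' code.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def mat_add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def phm_weight(a_list, s_list):
    """Assemble W = sum_i kron(A_i, S_i) from n small learnable factors."""
    terms = [kron(a, s) for a, s in zip(a_list, s_list)]
    total = terms[0]
    for term in terms[1:]:
        total = mat_add(total, term)
    return total


def phm_param_count(n, out_dim, in_dim):
    """n matrices A_i (n x n) plus n matrices S_i ((out_dim/n) x (in_dim/n))."""
    return n ** 3 + n * (out_dim // n) * (in_dim // n)


def random_phm_weight(n, out_dim, in_dim, seed=0):
    """Hypothetical helper: random PHM weight for a dense layer in_dim -> out_dim."""
    if out_dim % n or in_dim % n:
        raise ValueError("out_dim and in_dim must be divisible by n")
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    a_list = [rand(n, n) for _ in range(n)]
    s_list = [rand(out_dim // n, in_dim // n) for _ in range(n)]
    return phm_weight(a_list, s_list)


w = random_phm_weight(n=4, out_dim=8, in_dim=16)
print(len(w), len(w[0]))             # 8 16
print(phm_param_count(4, 512, 512))  # 65600, versus 262144 for a dense 512 x 512 weight
```

For large layers the count approaches roughly 1/n of the dense count, which is why PHM backends can be added without inflating the parameter budget.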

CV · Jan 11, 2023
Deep Residual Axial Networks

Nazmul Shahadat, Anthony S. Maida

While convolutional neural networks (CNNs) demonstrate outstanding performance on computer vision tasks, their computational costs remain high. Several techniques reduce these costs, such as reducing the channel count and using separable and depthwise separable convolutions. This paper reduces computational costs by introducing a novel architecture, axial CNNs, which replaces spatial 2D convolution operations with two consecutive depthwise separable 1D operations. Axial CNNs are predicated on the assumption that the dataset supports approximately separable convolution operations with little or no loss of training accuracy. Deep axial separable CNNs still suffer from gradient problems when training deep networks. We modify the construction of axial separable CNNs with residual connections to improve the performance of deep axial architectures, yielding our final novel architecture, residual axial networks (RANs). Extensive benchmark evaluation shows that RANs achieve at least 1% higher performance with about 77%, 86%, 75%, and 34% fewer parameters and about 75%, 80%, 67%, and 26% fewer FLOPs than ResNets, wide ResNets, MobileNets, and SqueezeNexts, respectively, on the CIFAR benchmarks, SVHN, and Tiny ImageNet image classification datasets. Moreover, our proposed RANs improve the performance of deep recursive residual networks with 94% fewer parameters on the image super-resolution dataset.
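
The separability assumption can be illustrated on a single channel: when a k x k kernel is exactly rank 1 (an outer product of a column and a row vector), a 2D convolution equals a height-axis 1D pass followed by a width-axis 1D pass, using 2k weights instead of k*k. This is a sketch of the exact case only; the paper's axial CNNs rely on approximate separability and use depthwise separable 1D operations, which this toy code does not reproduce.

```python
def conv2d_valid(image, kernel):
    """2D 'valid' cross-correlation (what CNN layers call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def conv_height(image, col):
    """1D valid cross-correlation along the height axis (kernel shape k x 1)."""
    k = len(col)
    return [
        [sum(image[i + u][j] * col[u] for u in range(k)) for j in range(len(image[0]))]
        for i in range(len(image) - k + 1)
    ]


def conv_width(image, row):
    """1D valid cross-correlation along the width axis (kernel shape 1 x k)."""
    k = len(row)
    return [
        [sum(image[i][j + v] * row[v] for v in range(k)) for j in range(len(image[0]) - k + 1)]
        for i in range(len(image))
    ]


def axial_conv(image, col, row):
    """Height pass followed by width pass: 2k weights instead of k*k."""
    return conv_width(conv_height(image, col), row)


image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
col = [1.0, 2.0, 1.0]
row = [1.0, 0.0, -1.0]
kernel = [[cu * rv for rv in row] for cu in col]  # rank-1 3x3 kernel (Sobel-like)
print(conv2d_valid(image, kernel) == axial_conv(image, col, row))  # True
```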

CV · Apr 8, 2022
Vision-Based American Sign Language Classification Approach via Deep Learning

Nelly Elsayed, Zag ElSayed, Anthony S. Maida

Hearing impairment is the disability of partial or total hearing loss, which causes significant problems in communicating with other people in society. American Sign Language (ASL) is one of the sign languages most commonly used by hearing-impaired communities to communicate with each other. In this paper, we propose a simple deep learning model that classifies American Sign Language letters as a step toward removing communication barriers related to disabilities.

LG · Jan 12, 2023
LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural Networks

Nelly Elsayed, Zag ElSayed, Anthony S. Maida

Long short-term memory (LSTM) is a robust recurrent neural network architecture for learning sequential data. However, it requires considerable computational power for training and implementation, from both software and hardware perspectives. This paper proposes a novel LiteLSTM architecture that reduces the LSTM's computation components via weight sharing, lowering the overall computation cost while maintaining performance. The proposed LiteLSTM can be significant for processing large data where processing time is critical and hardware resources are limited, such as IoT device security and medical data processing. The proposed model was evaluated and tested empirically on three datasets from the computer vision, cybersecurity, and speech emotion recognition domains. The proposed LiteLSTM achieves accuracy comparable to other state-of-the-art recurrent architectures while using a smaller computation budget.
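
To make the weight-sharing idea concrete, the sketch below counts parameters for a standard LSTM (four gate blocks, each with input weights, recurrent weights and a bias) and for a hypothetical variant in which the three sigmoid gates share a single weight set. This extreme sharing scheme is an assumption chosen for illustration; it is not necessarily the exact LiteLSTM design.

```python
def gate_block_params(input_size, hidden_size):
    """One gate block: W (hidden x input), U (hidden x hidden), bias (hidden)."""
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size, hidden_size):
    """Standard LSTM: input, forget and output gates plus the candidate block."""
    return 4 * gate_block_params(input_size, hidden_size)


def shared_gate_lstm_params(input_size, hidden_size):
    """Hypothetical variant: the three sigmoid gates share one weight set,
    leaving one shared gate block plus the candidate block."""
    return 2 * gate_block_params(input_size, hidden_size)


print(lstm_params(10, 20))              # 2480
print(shared_gate_lstm_params(10, 20))  # 1240
```

Because shared weights on the same inputs produce identical gate activations, a practical design would trade off how much is shared against how much gating flexibility is lost.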

CV · Jan 11, 2023
Deep Axial Hypercomplex Networks

Nazmul Shahadat, Anthony S. Maida

Over the past decade, deep hypercomplex-inspired networks have enhanced feature extraction for image classification by enabling weight sharing across input channels. Recent work improves representational capability with hypercomplex-inspired networks, but at a high computational cost. This paper reduces this cost by factorizing a quaternion 2D convolutional module into two consecutive vectormap 1D convolutional modules. We also use fully connected layers based on 5D parameterized hypercomplex multiplication. Incorporating both yields our proposed hypercomplex network, a novel architecture that can be assembled to construct deep axial-hypercomplex networks (DANs) for image classification. We conduct experiments on the CIFAR benchmarks, SVHN, and Tiny ImageNet and achieve better performance with fewer trainable parameters and FLOPs. Our proposed model achieves almost 2% higher performance on the CIFAR and SVHN datasets and more than 3% on Tiny ImageNet, while using six times fewer parameters than real-valued ResNets. It also shows state-of-the-art performance on the CIFAR benchmarks in hypercomplex space.

CV · Oct 23, 2025
Physics-Guided Fusion for Robust 3D Tracking of Fast Moving Small Objects

Prithvi Raj Singh, Raju Gottumukkala, Anthony S. Maida et al.

While computer vision has advanced considerably for general object detection and tracking, the specific problem of fast-moving tiny objects remains underexplored. This paper addresses the significant challenge of detecting and tracking rapidly moving small objects using an RGB-D camera. Our novel system combines deep learning-based detection with physics-based tracking to overcome the limitations of existing approaches. Our contributions include: (1) a comprehensive system design for detecting and tracking fast-moving small objects in 3D space, (2) an innovative physics-based tracking algorithm that integrates kinematic equations of motion to handle outliers and missed detections, and (3) an outlier detection and correction module that significantly improves tracking performance in challenging scenarios such as occlusions and rapid direction changes. We evaluated our proposed system on a custom racquetball dataset. Our evaluation shows that our system outperforms Kalman-filter-based trackers, with up to 70% lower Average Displacement Error. Our system has significant applications for improving robot perception on autonomous platforms and demonstrates the effectiveness of combining physics-based models with deep learning approaches for real-time 3D detection and tracking of challenging small objects.
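
A simplified sketch of how kinematic equations of motion can fill missed detections and reject outliers: each frame, the previous state is propagated with constant-acceleration kinematics (gravity only), and a detection that is missing or farther than a gating threshold from the prediction is replaced by the prediction. The gravity axis, the `max_jump` threshold and the finite-difference velocity update are all hypothetical choices; this is not the paper's tracking algorithm.

```python
import math

GRAVITY = (0.0, -9.81, 0.0)  # assumption: y axis points up, units in metres


def predict(position, velocity, dt, accel=GRAVITY):
    """Constant-acceleration kinematics: p + v*dt + 0.5*a*dt^2 and v + a*dt."""
    new_pos = tuple(p + v * dt + 0.5 * a * dt * dt for p, v, a in zip(position, velocity, accel))
    new_vel = tuple(v + a * dt for v, a in zip(velocity, accel))
    return new_pos, new_vel


def track(detections, dt, init_velocity, max_jump=0.5):
    """Replace missing (None) or implausible detections with the physics prediction.

    max_jump is a hypothetical gating threshold in metres.
    """
    position, velocity = detections[0], init_velocity
    out = [position]
    for det in detections[1:]:
        pred_pos, pred_vel = predict(position, velocity, dt)
        if det is None or math.dist(det, pred_pos) > max_jump:
            # Missed detection or outlier: trust the kinematic prediction.
            position, velocity = pred_pos, pred_vel
        else:
            # Accepted detection: finite-difference velocity from the previous position.
            velocity = tuple((d - p) / dt for d, p in zip(det, position))
            position = det
        out.append(position)
    return out


path = track([(0.0, 1.0, 0.0), None, (100.0, 100.0, 100.0)], dt=0.1, init_velocity=(1.0, 0.0, 0.0))
print(path)  # the None and the far-away outlier are both replaced by predictions
```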

LG · Jan 27, 2022
LiteLSTM Architecture for Deep Recurrent Neural Networks

Nelly Elsayed, Zag ElSayed, Anthony S. Maida

Long short-term memory (LSTM) is a robust recurrent neural network architecture for learning spatiotemporal sequential data. However, it requires significant computational power for training and implementation, from both software and hardware perspectives. This paper proposes a novel LiteLSTM architecture that reduces the computation components of the LSTM using the weight-sharing concept, lowering the overall architecture cost while maintaining performance. The proposed LiteLSTM can be significant for learning from big data where processing time is critical, such as IoT device security and medical data. Moreover, it helps reduce the CO2 footprint. The proposed model was evaluated and tested empirically on two datasets from the computer vision and cybersecurity domains.

CV · Oct 4, 2021
Improving Axial-Attention Network Classification via Cross-Channel Weight Sharing

Nazmul Shahadat, Anthony S. Maida

In recent years, hypercomplex-inspired neural networks (HCNNs) have been used to improve deep learning architectures because they enable channel-based weight sharing, treat colors as a single entity, and improve representational coherence within layers. The work described here studies how replacing existing layers in an Axial Attention network with their representationally coherent variants affects image classification. We experiment with the stem of the network, the bottleneck layers, and the fully connected backend, replacing each with a representationally coherent variant. These modifications lead to novel architectures that all yield improved accuracy on the ImageNet300k classification dataset. Our baseline networks for comparison were the original real-valued ResNet, the original quaternion-valued ResNet, and the Axial Attention ResNet. Since improvement was observed regardless of which part of the network was modified, this technique shows promise for improving classification accuracy across a large class of networks.
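
The cross-channel weight sharing that quaternion layers provide comes from the Hamilton product: four input channels (for example, an image's color channels packed into one quaternion) are mixed by a single quaternion weight, so the equivalent 4x4 real matrix has 16 entries built from only 4 parameters. The sketch below shows that product and its real-matrix form; the function names are illustrative, and the paper's axial-attention layers are not reproduced.

```python
def hamilton(q, p):
    """Hamilton product of quaternions q = (r, i, j, k) and p = (r, i, j, k)."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_weight_as_real_matrix(w):
    """4x4 real matrix M with M @ x == hamilton(w, x); 16 entries from 4 parameters."""
    r, i, j, k = w
    return [
        [r, -i, -j, -k],
        [i, r, -k, j],
        [j, k, r, -i],
        [k, -j, i, r],
    ]


def matvec(m, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


w, x = (1, 2, 3, 4), (5, 6, 7, 8)
print(hamilton(w, x))                                  # (-60, 12, 30, 24)
print(matvec(quaternion_weight_as_real_matrix(w), x))  # (-60, 12, 30, 24)
```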

CV · Aug 28, 2019
Inception-inspired LSTM for Next-frame Video Prediction

Matin Hosseini, Anthony S. Maida, Majid Hosseini et al.

Video frame prediction has received much interest because of its relevance to many computer vision applications, such as autonomous vehicles and robotics. Supervised methods for video frame prediction rely on labeled data, which may not always be available. In this paper, we present a novel unsupervised deep-learning method, Inception-based LSTM, for video frame prediction. The general idea of Inception networks is to build wider rather than deeper networks, a design shown to improve image classification performance. The proposed method is evaluated with both Inception-v1 and Inception-v2 structures. The proposed Inception LSTM methods are compared with convolutional LSTM within the PredNet predictive coding framework on the KITTI and KTH datasets. We observed that the Inception-based LSTM outperforms the convolutional LSTM. Also, Inception LSTM has better prediction performance than Inception v2 LSTM, although Inception v2 LSTM has a lower computational cost.
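
The "wider instead of deeper" idea can be sketched on a 1D signal: several convolution branches with different kernel sizes read the same input in parallel, and their outputs are stacked as channels. This shows only the generic branch-and-concatenate pattern with hypothetical kernels; it is not the paper's Inception LSTM gate design.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length (odd kernels)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + t] * kernel[t] for t in range(len(kernel))) for i in range(len(signal))]


def inception_block(signal, kernels):
    """Apply every branch to the same input and stack the branch outputs as channels."""
    return [conv1d_same(signal, k) for k in kernels]


branches = inception_block([1, 2, 3, 4], kernels=[[1.0], [1.0, 1.0, 1.0], [0.2] * 5])
print(branches[0])  # [1.0, 2.0, 3.0, 4.0]
print(branches[1])  # [3.0, 6.0, 9.0, 7.0]
```

Each branch sees a different receptive field at the same depth, which is what lets a wider block capture multiple scales without stacking more layers.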

LG · Dec 18, 2018
Deep Gated Recurrent and Convolutional Network Hybrid Model for Univariate Time Series Classification

Nelly Elsayed, Anthony S. Maida, Magdy Bayoumi

Hybrid LSTM-fully convolutional networks (LSTM-FCN) for time series classification have produced state-of-the-art classification results on univariate time series. We show that replacing the LSTM with a gated recurrent unit (GRU) to create a GRU-fully convolutional network hybrid model (GRU-FCN) can offer even better performance on many time series datasets. The proposed GRU-FCN model exceeds state-of-the-art classification performance on many univariate and multivariate time series datasets. In addition, since the GRU has a simpler architecture than the LSTM, GRU-FCN has fewer trainable parameters, shorter training time, and a simpler hardware implementation than LSTM-based models.
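
A minimal sketch of one GRU step, following the Cho et al. formulation (update gate z, reset gate r, candidate state n), shows why the GRU is smaller than the LSTM: it has three weight blocks instead of four and no separate cell state. The parameter layout (a dict of `(W, U, b)` tuples) is a hypothetical convention for this sketch, not the GRU-FCN implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h, params):
    """params maps 'z' (update), 'r' (reset), 'n' (candidate) to (W, U, b)."""
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wn, Un, bn = params["n"]
    z = [sigmoid(a + c + b) for a, c, b in zip(matvec(Wz, x), matvec(Uz, h), bz)]
    r = [sigmoid(a + c + b) for a, c, b in zip(matvec(Wr, x), matvec(Ur, h), br)]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
    cand = [math.tanh(a + c + b) for a, c, b in zip(matvec(Wn, x), matvec(Un, rh), bn)]
    # Interpolate between the previous state and the candidate state.
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, cand)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


params = {g: (zeros(3, 2), zeros(3, 3), [0.0] * 3) for g in ("z", "r", "n")}
print(gru_step([1.0, 1.0], [1.0, -2.0, 0.5], params))  # [0.5, -1.0, 0.25]
```

With all-zero weights every gate outputs 0.5 and the candidate is 0, so the new state is exactly half the old one, which makes the example easy to check by hand.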

LG · Oct 16, 2018
Reduced-Gate Convolutional LSTM Using Predictive Coding for Spatiotemporal Prediction

Nelly Elsayed, Anthony S. Maida, Magdy Bayoumi

Spatiotemporal sequence prediction is an important problem in deep learning. We study next-frame(s) video prediction using a deep-learning-based predictive coding framework built from convolutional long short-term memory (convLSTM) modules. We introduce a novel reduced-gate convolutional LSTM (rgcLSTM) architecture that requires a significantly lower parameter budget than a comparable convLSTM. By using a single multi-function gate, our reduced-gate model achieves equal or better next-frame(s) prediction accuracy than the original convolutional LSTM while using a smaller parameter budget, thereby reducing training time and memory requirements. We tested our reduced-gate modules within a predictive coding architecture on the moving MNIST and KITTI datasets. Compared with the standard convolutional LSTM model, our reduced-gate model reduces the total number of trainable parameters by approximately 40 percent and elapsed training time by 25 percent, and its accuracy also improved. This makes our model more attractive for hardware implementation, especially on small devices. We also explored a space of twenty different gated architectures to gain insight into how our rgcLSTM fits into that space.

NE · Apr 22, 2018
Deep Learning in Spiking Neural Networks

Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh et al.

In recent years, deep learning has revolutionized machine learning, particularly computer vision. In this approach, a deep (multilayer) artificial neural network (ANN) is trained in a supervised manner using backpropagation. Huge numbers of labeled examples are required, but the resulting classification accuracy is truly impressive, sometimes exceeding human performance. Neurons in an ANN are characterized by a single, static, continuous-valued activation. Yet biological neurons use discrete spikes to compute and transmit information, and spike times, in addition to spike rates, matter. Spiking neural networks (SNNs) are thus more biologically realistic than ANNs, and arguably the only viable option if one wants to understand how the brain computes. SNNs are also more hardware-friendly and energy-efficient than ANNs, and are thus appealing for technology, especially for portable devices. However, training deep SNNs remains a challenge: spiking neurons' transfer function is usually non-differentiable, which prevents the use of backpropagation. Here we review recent supervised and unsupervised methods to train deep SNNs and compare them in terms of accuracy, computational cost, and hardware friendliness. The emerging picture is that SNNs still lag behind ANNs in accuracy, but the gap is decreasing and can even vanish on some tasks, while SNNs typically require far fewer operations.
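
The contrast between a static continuous activation and discrete spikes can be seen in a leaky integrate-and-fire (LIF) neuron, one of the standard spiking models. The sketch integrates dV/dt = (-(V - v_rest) + R*I) / tau with forward Euler and emits a spike (then resets) whenever V crosses threshold; all parameter values are illustrative assumptions. The hard threshold is the non-differentiable step that blocks ordinary backpropagation.

```python
def lif_spike_times(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                    v_thresh=1.0, v_reset=0.0, r_m=1.0):
    """Return the time steps at which a LIF neuron spikes (forward-Euler integration)."""
    v = v_rest
    spikes = []
    for step, current in enumerate(input_current):
        # Leak toward v_rest while integrating the input current.
        v += dt * (-(v - v_rest) + r_m * current) / tau
        if v >= v_thresh:  # hard threshold: discrete, non-differentiable event
            spikes.append(step)
            v = v_reset
    return spikes


print(lif_spike_times([0.5] * 200))      # [] (steady state 0.5 never reaches threshold)
print(lif_spike_times([2.0] * 200)[:3])  # [13, 27, 41]
```

Stronger input current shortens the inter-spike interval, so information is carried by spike timing and rate rather than by a single real-valued output.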

NE · Nov 12, 2017
BP-STDP: Approximating Backpropagation using Spike Timing Dependent Plasticity

Amirhossein Tavanaei, Anthony S. Maida

Training spiking neural networks (SNNs) is a necessary precondition for understanding computation within the brain, a field still in its infancy. Previous work has shown that supervised learning in multi-layer SNNs enables bio-inspired networks to recognize patterns of stimuli through hierarchical feature acquisition. Although gradient descent has shown impressive performance in multi-layer (and deep) SNNs, it is generally not considered biologically plausible and is also computationally expensive. This paper proposes a novel supervised learning approach based on an event-based spike-timing-dependent plasticity (STDP) rule embedded in a network of integrate-and-fire (IF) neurons. The proposed temporally local learning rule follows the backpropagation weight-change updates applied at each time step. This approach enjoys the benefits of both accurate gradient descent and temporally local, efficient STDP. Thus, it can address some open questions regarding the accurate and efficient computations that occur in the brain. Experimental results on the XOR problem, the Iris data, and the MNIST dataset show that the proposed SNN performs as well as traditional NNs. Our approach also compares favorably with state-of-the-art multi-layer SNNs.
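
For reference, the sketch below implements the classic pair-based exponential STDP rule: a presynaptic spike shortly before a postsynaptic spike strengthens the synapse, and the reverse order weakens it. This is the standard textbook rule, shown only to illustrate what "temporally local" means; it is NOT the paper's BP-STDP rule, and the amplitudes, time constants and clipping bounds are hypothetical values.

```python
import math


def stdp_delta(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:  # pre before post: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:  # post before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0, **kwargs):
    """Sum contributions over all spike pairs, then clip the weight to [w_min, w_max]."""
    dw = sum(stdp_delta(tp, tq, **kwargs) for tp in pre_spikes for tq in post_spikes)
    return min(w_max, max(w_min, w + dw))


print(apply_stdp(0.5, pre_spikes=[10.0], post_spikes=[15.0]))  # slightly above 0.5
print(apply_stdp(0.0, pre_spikes=[15.0], post_spikes=[10.0]))  # 0.0 (depression clipped)
```

Each update depends only on the spike times at one synapse, which is the locality property that makes STDP attractive compared with global gradient computation.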

NE · Nov 9, 2016
Bio-Inspired Spiking Convolutional Neural Network using Layer-wise Sparse Coding and STDP Learning

Amirhossein Tavanaei, Anthony S. Maida

Hierarchical feature discovery using non-spiking convolutional neural networks (CNNs) has attracted much recent interest in machine learning and computer vision. However, it is still not well understood how to create a biologically plausible network of brain-like, spiking neurons with multi-layer, unsupervised learning. This paper explores a novel bio-inspired spiking CNN that is trained in a greedy, layer-wise fashion. The proposed network consists of a spiking convolutional-pooling layer followed by a feature discovery layer that extracts independent visual features. Kernels for the convolutional layer are trained using local learning, implemented with a sparse, spiking auto-encoder representing primary visual features. The feature discovery layer extracts independent features using probabilistic, leaky integrate-and-fire (LIF) neurons that are sparsely active in response to stimuli. This layer of probabilistic LIF neurons implicitly provides lateral inhibition to extract sparse and independent features. Experimental results show that the convolutional layer is stack-admissible, enabling it to support multi-layer learning. The visual features obtained from the probabilistic LIF neurons in the feature discovery layer are used to train a classifier. The classification results attest to the independent and informative visual features extracted in the hierarchy of convolutional and feature discovery layers. The proposed model is evaluated on the MNIST digit dataset using clean and noisy images. Recognition performance on clean images is above 98%. The performance loss on noisy images ranges from 0.1% to 8.5%, depending on noise type and density. This level of performance loss indicates that the network is robust to additive noise.