Syed Mohammed Shamsul Islam

h-index24

12papers

321citations

Novelty34%

AI Score39

Ranked #82,502 of 194,257 authors (top 42%)#27,837 in CV (top 47%)

12 Papers

5.9CVAug 1, 2023

MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers

Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam et al.

In line with the human capacity to perceive the world by simultaneously processing and integrating high-dimensional inputs from multiple modalities like vision and audio, we propose a novel model, MAiVAR-T (Multimodal Audio-Image to Video Action Recognition Transformer). This model employs an intuitive approach for the combination of audio-image and video modalities, with a primary aim to escalate the effectiveness of multimodal human action recognition (MHAR). At the core of MAiVAR-T lies the significance of distilling substantial representations from the audio modality and transmuting these into the image domain. Subsequently, this audio-image depiction is fused with the video modality to formulate a unified representation. This concerted approach strives to exploit the contextual richness inherent in both audio and video modalities, thereby promoting action recognition. In contrast to existing state-of-the-art strategies that focus solely on audio or video modalities, MAiVAR-T demonstrates superior performance. Our extensive empirical evaluations conducted on a benchmark action recognition dataset corroborate the model's remarkable performance. This underscores the potential enhancements derived from integrating audio and video modalities for action recognition purposes.

4.8CVSep 11, 2022

MAiVAR: Multimodal Audio-Image and Video Action Recognizer

Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam et al.

Currently, action recognition is predominately performed on video data as processed by CNNs. We investigate if the representation process of CNNs can also be leveraged for multimodal action recognition by incorporating image-based audio representations of actions in a task. To this end, we propose Multimodal Audio-Image and Video Action Recognizer (MAiVAR), a CNN-based audio-image to video fusion model that accounts for video and audio modalities to achieve superior action recognition performance. MAiVAR extracts meaningful image representations of audio and fuses it with video representation to achieve better performance as compared to both modalities individually on a large-scale action recognition dataset.

3.0IVJan 21, 2023

Pre-text Representation Transfer for Deep Learning with Limited Imbalanced Data : Application to CT-based COVID-19 Detection

Fouzia Altaf, Syed M. S. Islam, Naeem K. Janjua et al.

Annotating medical images for disease detection is often tedious and expensive. Moreover, the available training samples for a given task are generally scarce and imbalanced. These conditions are not conducive for learning effective deep neural models. Hence, it is common to 'transfer' neural networks trained on natural images to the medical image domain. However, this paradigm lacks in performance due to the large domain gap between the natural and medical image data. To address that, we propose a novel concept of Pre-text Representation Transfer (PRT). In contrast to the conventional transfer learning, which fine-tunes a source model after replacing its classification layers, PRT retains the original classification layers and updates the representation layers through an unsupervised pre-text task. The task is performed with (original, not synthetic) medical images, without utilizing any annotations. This enables representation transfer with a large amount of training data. This high-fidelity representation transfer allows us to use the resulting model as a more effective feature extractor. Moreover, we can also subsequently perform the traditional transfer learning with this model. We devise a collaborative representation based classification layer for the case when we leverage the model as a feature extractor. We fuse the output of this layer with the predictions of a model induced with the traditional transfer learning performed over our pre-text transferred model. The utility of our technique for limited and imbalanced data classification problem is demonstrated with an extensive five-fold evaluation for three large-scale models, tested for five different class-imbalance ratios for CT based COVID-19 detection. Our results show a consistent gain over the conventional transfer learning with the proposed method.

5.7CRMay 6

Assessing Generalisation Capability of Machine Learning Models for Intrusion Detection

Md Zakir Hossain, Md Ayshik Rahman Khan, Md Rafiqul Islam et al.

The growth of networked and IoT systems has intensified cyber-security threats and exposed the limits of traditional signature-based intrusion detection. Although machine-learning-based intrusion detection systems often report strong benchmark performance, high ac- curacy within a single dataset does not necessarily guarantee reliable performance in unseen network environments. This study investigates the generalisation capability of supervised machine learning models for intrusion detection using UNSW-NB15 and TON_IoT. Random Forest, Logistic Regression, and Naive Bayes were evaluated under same-dataset and cross-dataset settings. Random Forest achieved the strongest same dataset performance, with 95.08% accuracy on UNSW-NB15 and 99.79% on TON_IoT, but performance dropped sharply in cross-dataset testing. When trained on UNSW-NB15 and tested on TON_IoT or vice versa, below 40% accuracy. These results reveal a significant generalisation gap in intrusion detection. We connect this challenge to affective computing and human-centric AI, where behavioural signal analysis, anomaly detection, domain shift, and context-sensitive modelling are also central. This framing highlights the need for adaptive, generalisable cyber-security models that can operate across changing network and IoT environments.

2.3CVSep 21, 2020Code

Exploring Intensity Invariance in Deep Neural Networks for Brain Image Registration

Hassan Mahmood, Asim Iqbal, Syed Mohammed Shamsul Islam

Image registration is a widely-used technique in analysing large scale datasets that are captured through various imaging modalities and techniques in biomedical imaging such as MRI, X-Rays, etc. These datasets are typically collected from various sites and under different imaging protocols using a variety of scanners. Such heterogeneity in the data collection process causes inhomogeneity or variation in intensity (brightness) and noise distribution. These variations play a detrimental role in the performance of image registration, segmentation and detection algorithms. Classical image registration methods are computationally expensive but are able to handle these artifacts relatively better. However, deep learning-based techniques are shown to be computationally efficient for automated brain registration but are sensitive to the intensity variations. In this study, we investigate the effect of variation in intensity distribution among input image pairs for deep learning-based image registration methods. We find a performance degradation of these models when brain image pairs with different intensity distribution are presented even with similar structures. To overcome this limitation, we incorporate a structural similarity-based loss function in a deep neural network and test its performance on the validation split separated before training as well as on a completely unseen new dataset. We report that the deep learning models trained with structure similarity-based loss seems to perform better for both datasets. This investigation highlights a possible performance limiting factor in deep learning-based registration models and suggests a potential solution to incorporate the intensity distribution variation in the input image pairs. Our code and models are available at https://github.com/hassaanmahmood/DeepIntense.

10.5CVMay 22, 2024

From CNNs to Transformers in Multimodal Human Action Recognition: A Survey

Muhammad Bilal Shaikh, Syed Mohammed Shamsul Islam, Douglas Chai et al.

Due to its widespread applications, human action recognition is one of the most widely studied research problems in Computer Vision. Recent studies have shown that addressing it using multimodal data leads to superior performance as compared to relying on a single data modality. During the adoption of deep learning for visual modelling in the last decade, action recognition approaches have mainly relied on Convolutional Neural Networks (CNNs). However, the recent rise of Transformers in visual modelling is now also causing a paradigm shift for the action recognition task. This survey captures this transition while focusing on Multimodal Human Action Recognition (MHAR). Unique to the induction of multimodal computational models is the process of "fusing" the features of the individual data modalities. Hence, we specifically focus on the fusion design aspects of the MHAR approaches. We analyze the classic and emerging techniques in this regard, while also highlighting the popular trends in the adaption of CNN and Transformer building blocks for the overall problem. In particular, we emphasize on recent design choices that have led to more efficient MHAR models. Unlike existing reviews, which discuss Human Action Recognition from a broad perspective, this survey is specifically aimed at pushing the boundaries of MHAR research by identifying promising architectural and fusion design choices to train practicable models. We also provide an outlook of the multimodal datasets from their scale and evaluation viewpoint. Finally, building on the reviewed literature, we discuss the challenges and future avenues for MHAR.

1.4CVFeb 21, 2022

Fast Semantic-Assisted Outlier Removal for Large-scale Point Cloud Registration

Giang Truong, Huu Le, Alvaro Parra et al.

With current trends in sensors (cheaper, more volume of data) and applications (increasing affordability for new tasks, new ideas in what 3D data could be useful for); there is corresponding increasing interest in the ability to automatically, reliably, and cheaply, register together individual point clouds. The volume of data to handle, and still elusive need to have the registration occur fully reliably and fully automatically, mean there is a need to innovate further. One largely untapped area of innovation is that of exploiting the {\em semantic information} of the points in question. Points on a tree should match points on a tree, for example, and not points on car. Moreover, such a natural restriction is clearly human-like - a human would generally quickly eliminate candidate regions for matching based on semantics. Employing semantic information is not only efficient but natural. It is also timely - due to the recent advances in semantic classification capabilities. This paper advances this theme by demonstrating that state of the art registration techniques, in particular ones that rely on "preservation of length under rigid motion" as an underlying matching consistency constraint, can be augmented with semantic information. Semantic identity is of course also preserved under rigid-motion, but also under wider motions present in a scene. We demonstrate that not only the potential obstacle of cost of semantic segmentation, and the potential obstacle of the unreliability of semantic segmentation; are both no impediment to achieving both speed and accuracy in fully automatic registration of large scale point clouds.

2.6CVAug 12, 2021

Resetting the baseline: CT-based COVID-19 diagnosis with Deep Transfer Learning is not as accurate as widely thought

Fouzia Altaf, Syed M. S. Islam, Naveed Akhtar

Deep learning is gaining instant popularity in computer aided diagnosis of COVID-19. Due to the high sensitivity of Computed Tomography (CT) to this disease, CT-based COVID-19 detection with visual models is currently at the forefront of medical imaging research. Outcomes published in this direction are frequently claiming highly accurate detection under deep transfer learning. This is leading medical technologists to believe that deep transfer learning is the mainstream solution for the problem. However, our critical analysis of the literature reveals an alarming performance disparity between different published results. Hence, we conduct a systematic thorough investigation to analyze the effectiveness of deep transfer learning for COVID-19 detection with CT images. Exploring 14 state-of-the-art visual models with over 200 model training sessions, we conclusively establish that the published literature is frequently overestimating transfer learning performance for the problem, even in the prestigious scientific sources. The roots of overestimation trace back to inappropriate data curation. We also provide case studies that consider more realistic scenarios, and establish transparent baselines for the problem. We hope that our reproducible investigation will help in curbing hype-driven claims for the critical problem of COVID-19 diagnosis, and pave the way for a more transparent performance evaluation of techniques for CT-based COVID-19 detection.

4.7CVFeb 16, 2021

Boosting Deep Transfer Learning for COVID-19 Classification

Fouzia Altaf, Syed M. S. Islam, Naeem K. Janjua et al.

COVID-19 classification using chest Computed Tomography (CT) has been found pragmatically useful by several studies. Due to the lack of annotated samples, these studies recommend transfer learning and explore the choices of pre-trained models and data augmentation. However, it is still unknown if there are better strategies than vanilla transfer learning for more accurate COVID-19 classification with limited CT data. This paper provides an affirmative answer, devising a novel `model' augmentation technique that allows a considerable performance boost to transfer learning for the task. Our method systematically reduces the distributional shift between the source and target domains and considers augmenting deep learning with complementary representation learning techniques. We establish the efficacy of our method with publicly available datasets and models, along with identifying contrasting observations in the previous studies.

1.0LGFeb 25, 2019

Epileptic seizure classification using statistical sampling and a novel feature selection algorithm

Md Mursalin, Syed Shamsul Islam, Md Kislu Noman et al.

Epilepsy is a well-known neuronal disorder that can be identified by interpretation of the electroencephalogram (EEG) signal. Usually, the length of an EEG signal is quite long which is challenging to interpret manually. In this work, we propose an automated epileptic seizure detection method by applying a two-step minimization technique: first, we reduce the data points using a statistical sampling technique and then, we minimize the number of features using our novel feature selection algorithm. We then apply different machine learning algorithms for performance measurement of the proposed feature selection algorithm. The experimental results outperform some of the state-of-the-art methods for seizure detection using the reduced data points and the least number of features.

2.8SEFeb 25, 2019

A Taxonomy of Modeling Approaches for Systems-of-Systems Dynamic Architectures: Overview and Prospects

Ahmad Mohsin, Naeem Khalid Janjua, Syed MS Islam et al.

Systems-of-Systems (SoS) result from the collaboration of independent Constituent Systems (CSs) to achieve particular missions. CSs are not totally known at design time, and may also leave or join SoS at runtime, which turns the SoS architecture to be inherently dynamic, forming new architectural configurations and impacting the overall system quality attributes (i.e. performance, security and reliability). Therefore, it is vital to model and evaluate the impact of these stochastic architectural changes on SoS properties at abstract level at the early stage in order to analyze and select appropriate architectural design. Architectural description languages (ADL) have been proposed and used to deal with SoS dynamic architectures. However, we still envision gaps to be bridged and challenges to be addressed in the forthcoming years. This paper presents a broad discussion on the state-of-the-art notations to model and analyze SoS dynamic architectures. The main contribution this paper is threefold: (i) providing results of a literature review on the support of available architecture modeling approaches for SoS and an analysis of their semantic extension to support specification of SoS dynamic architectures, and (ii) a corresponding taxonomy for modeling SoS obtained as a result of the literature review. Besides, we also discuss future directions and challenges to be overcome in the forthcoming years.

13.6CVFeb 15, 2019

Going Deep in Medical Image Analysis: Concepts, Methods, Challenges and Future Directions

Fouzia Altaf, Syed M. S. Islam, Naveed Akhtar et al.

Medical Image Analysis is currently experiencing a paradigm shift due to Deep Learning. This technology has recently attracted so much interest of the Medical Imaging community that it led to a specialized conference in `Medical Imaging with Deep Learning' in the year 2018. This article surveys the recent developments in this direction, and provides a critical review of the related major aspects. We organize the reviewed literature according to the underlying Pattern Recognition tasks, and further sub-categorize it following a taxonomy based on human anatomy. This article does not assume prior knowledge of Deep Learning and makes a significant contribution in explaining the core Deep Learning concepts to the non-experts in the Medical community. Unique to this study is the Computer Vision/Machine Learning perspective taken on the advances of Deep Learning in Medical Imaging. This enables us to single out `lack of appropriately annotated large-scale datasets' as the core challenge (among other challenges) in this research direction. We draw on the insights from the sister research fields of Computer Vision, Pattern Recognition and Machine Learning etc.; where the techniques of dealing with such challenges have already matured, to provide promising directions for the Medical Imaging community to fully harness Deep Learning in the future.