QUANT-PHJul 20, 2023
Quantum Convolutional Neural Networks with Interaction Layers for Classification of Classical DataJishnu Mahmud, Raisa Mashtura, Shaikh Anowarul Fattah et al.
Quantum Machine Learning (QML) has come into the limelight due to the exceptional computational abilities of quantum computers. With the promises of near error-free quantum computers in the not-so-distant future, it is important that the effect of multi-qubit interactions on quantum neural networks is studied extensively. This paper introduces a Quantum Convolutional Network with novel Interaction layers exploiting three-qubit interactions, while studying the network's expressibility and entangling capability, for classifying both image and one-dimensional data. The proposed approach is tested on three publicly available datasets namely MNIST, Fashion MNIST, and Iris datasets, flexible in performing binary and multiclass classifications, and is found to supersede the performance of existing state-of-the-art methods.
CVOct 3, 2023
Decoding Human Activities: Analyzing Wearable Accelerometer and Gyroscope Data for Activity RecognitionUtsab Saha, Sawradip Saha, Tahmid Kabir et al.
A person's movement or relative positioning can be effectively captured by different types of sensors and corresponding sensor output can be utilized in various manipulative techniques for the classification of different human activities. This letter proposes an effective scheme for human activity recognition, which introduces two unique approaches within a multi-structural architecture, named FusionActNet. The first approach aims to capture the static and dynamic behavior of a particular action by using two dedicated residual networks and the second approach facilitates the final decision-making process by introducing a guidance module. A two-stage training process is designed where at the first stage, residual networks are pre-trained separately by using static (where the human body is immobile) and dynamic (involving movement of the human body) data. In the next stage, the guidance module along with the pre-trained static or dynamic models are used to train the given sensor data. Here the guidance module learns to emphasize the most relevant prediction vector obtained from the static or dynamic models, which helps to effectively classify different human activities. The proposed scheme is evaluated using two benchmark datasets and compared with state-of-the-art methods. The results clearly demonstrate that our method outperforms existing approaches in terms of accuracy, precision, recall, and F1 score, achieving 97.35% and 95.35% accuracy on the UCI HAR and Motion-Sense datasets, respectively which highlights both the effectiveness and stability of the proposed scheme.
SDSep 15, 2023
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNsMd Awsafur Rahman, Bishmoy Paul, Najibul Haque Sarker et al.
With the huge technological advances introduced by deep learning in audio & speech processing, many novel synthetic speech techniques achieved incredible realistic results. As these methods generate realistic fake human voices, they can be used in malicious acts such as people imitation, fake news, spreading, spoofing, media manipulations, etc. Hence, the ability to detect synthetic or natural speech has become an urgent necessity. Moreover, being able to tell which algorithm has been used to generate a synthetic speech track can be of preeminent importance to track down the culprit. In this paper, a novel strategy is proposed to attribute a synthetic speech track to the generator that is used to synthesize it. The proposed detector transforms the audio into log-mel spectrogram, extracts features using CNN, and classifies it between five known and unknown algorithms, utilizing semi-supervision and ensemble to improve its robustness and generalizability significantly. The proposed detector is validated on two evaluation datasets consisting of a total of 18,000 weakly perturbed (Eval 1) & 10,000 strongly perturbed (Eval 2) synthetic speeches. The proposed method outperforms other top teams in accuracy by 12-13% on Eval 2 and 1-2% on Eval 1, in the IEEE SP Cup challenge at ICASSP 2022.
CVJun 5, 2024
Npix2Cpix: A GAN-Based Image-to-Image Translation Network With Retrieval- Classification Integration for Watermark Retrieval From Historical Document ImagesUtsab Saha, Sawradip Saha, Shaikh Anowarul Fattah et al.
The identification and restoration of ancient watermarks have long been a major topic in codicology and history. Classifying historical documents based on watermarks is challenging due to their diversity, noisy samples, multiple representation modes, and minor distinctions between classes and intra-class variations. This paper proposes a modified U-net-based conditional generative adversarial network (GAN) named Npix2Cpix to translate noisy raw historical watermarked images into clean, handwriting-free watermarked images by performing image translation from degraded (noisy) pixels to clean pixels. Using image-to-image translation and adversarial learning, the network creates clutter-free images for watermark restoration and categorization. The generator and discriminator of the proposed GAN are trained using two separate loss functions, each based on the distance between images, to learn the mapping from the input noisy image to the output clean image. After using the proposed GAN to pre-process noisy watermarked images, Siamese-based one-shot learning is employed for watermark classification. Experimental results on a large-scale historical watermark dataset demonstrate that cleaning the noisy watermarked images can help to achieve high one-shot classification accuracy. The qualitative and quantitative evaluation of the retrieved watermarked image highlights the effectiveness of the proposed approach.
IVJan 3, 2021
CovTANet: A Hybrid Tri-level Attention Based Network for Lesion Segmentation, Diagnosis, and Severity Prediction of COVID-19 Chest CT ScansTanvir Mahmud, Md. Jahin Alam, Sakib Chowdhury et al.
Rapid and precise diagnosis of COVID-19 is one of the major challenges faced by the global community to control the spread of this overgrowing pandemic. In this paper, a hybrid neural network is proposed, named CovTANet, to provide an end-to-end clinical diagnostic tool for early diagnosis, lesion segmentation, and severity prediction of COVID-19 utilizing chest computer tomography (CT) scans. A multi-phase optimization strategy is introduced for solving the challenges of complicated diagnosis at a very early stage of infection, where an efficient lesion segmentation network is optimized initially which is later integrated into a joint optimization framework for the diagnosis and severity prediction tasks providing feature enhancement of the infected regions. Moreover, for overcoming the challenges with diffused, blurred, and varying shaped edges of COVID lesions with novel and diverse characteristics, a novel segmentation network is introduced, namely Tri-level Attention-based Segmentation Network (TA-SegNet). This network has significantly reduced semantic gaps in subsequent encoding decoding stages, with immense parallelization of multi-scale features for faster convergence providing considerable performance improvement over traditional networks. Furthermore, a novel tri-level attention mechanism has been introduced, which is repeatedly utilized over the network, combining channel, spatial, and pixel attention schemes for faster and efficient generalization of contextual information embedded in the feature map through feature re-calibration and enhancement operations. Outstanding performances have been achieved in all three-tasks through extensive experimentation on a large publicly available dataset containing 1110 chest CT-volumes that signifies the effectiveness of the proposed scheme at the current stage of the pandemic.
SPJun 26, 2020
Co-Designing Statistical MIMO Radar and In-band Full-Duplex Multi-User MIMO Communications -- Part I: Signal ProcessingJiawei Liu, Kumar Vijay Mishra, Mohammad Saquib
We consider a spectral sharing problem in which a statistical (or widely distributed) multiple-input multiple-output (MIMO) radar and an in-band full-duplex (IBFD) multi-user MIMO (MU-MIMO) communications system concurrently operate within the same frequency band. Prior works on joint MIMO-radar-MIMO-communications (MRMC) systems largely focus on either colocated MIMO radars, half-duplex MIMO communications, single-user scenarios, omit practical constraints (clutter, uplink [UL]/downlink [DL] transmit powers, UL/DL quality-of-service, and peak-to-average-power ratio), or MRMC co-existence that employs separate transmit/receive units. The purpose of this and companion papers (Part II and III) is to co-design an MRMC framework that addresses all of these issues. In this paper, we propose signal processing for a distributed IBFD MRMC, where radar receiver is designed to additionally exploit the downlink communications signals reflected from a radar target. Extensive numerical experiments show that our methods improve radar target detection over conventional codes and yield a higher achievable data rate than standard precoders. The following companion paper (Part II) describes the theory and procedure of our algorithm to solve the non-convex design problem. The final companion paper (Part II) considers the case of multiple targets and examines the tracking performance of our MRMC system.