CV · Aug 9, 2022
Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation
Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi et al. · gatech
We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data. By utilizing a novel objective function, each layer in HR-VQVAE learns a discrete representation of the residual from previous layers through a vector quantized encoder. Furthermore, the representations at each layer are hierarchically linked to those at previous layers. We evaluate our method on the tasks of image reconstruction and generation. Experimental results demonstrate that the discrete representations learned by HR-VQVAE enable the decoder to reconstruct high-quality images with less distortion than the baseline methods, namely VQVAE and VQVAE-2. HR-VQVAE can also generate high-quality and diverse images that outperform state-of-the-art generative models, providing further verification of the efficiency of the learned representations. The hierarchical nature of HR-VQVAE (i) reduces the decoding search time, making the method particularly suitable for high-load tasks, and (ii) allows the codebook size to be increased without incurring the codebook collapse problem.
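The idea of each layer quantizing the residual left by the previous layers can be illustrated with a minimal sketch. This is generic residual vector quantization with a nearest-codeword lookup, not the HR-VQVAE objective or its hierarchical codebook linking; the function names, the random codebooks, and the shapes are assumptions made for illustration only.

```python
import numpy as np

def quantize(x, codebook):
    """Map each row of x to its nearest codeword (squared Euclidean distance)."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def residual_quantize(x, codebooks):
    """Quantize x layer by layer; each layer encodes what earlier layers missed."""
    residual = x.astype(float).copy()
    recon = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        idx, q = quantize(residual, codebook)
        indices.append(idx)
        recon += q                 # accumulate the reconstruction
        residual = residual - q    # pass the remaining error to the next layer
    return indices, recon

# Hypothetical data: 8 vectors of dimension 4, three layers of 16 codewords each.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
codebooks = [rng.normal(scale=s, size=(16, 4)) for s in (1.0, 0.5, 0.25)]
indices, recon = residual_quantize(x, codebooks)
```

In this sketch every layer searches its whole codebook. The abstract's claim of reduced decoding search time relies on the hierarchical linking between layers, which is not modeled here.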
CV · Jul 13, 2023
S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction
Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi et al. · gatech
We address the video prediction task by putting forth a novel model that combines (i) a novel hierarchical residual learning vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel autoregressive spatiotemporal predictive model (AST-PM). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the AST-PM's ability to handle spatiotemporal information, S-HR-VQVAE can better deal with major challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on four challenging tasks, namely KTH Human Action, TrafficBJ, Human3.6M, and Kitti, demonstrate that our model compares favorably against state-of-the-art video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and AST-PM parameters.
SP · Sep 11, 2020
STEP-GAN: A Step-by-Step Training for Multi Generator GANs with application to Cyber Security in Power Systems
Mohammad Adiban, Arash Safari, Giampiero Salvi
In this study, we introduce a novel unsupervised countermeasure for smart grid power systems, based on generative adversarial networks (GANs). Given the pivotal role of smart grid systems (SGSs) in urban life, their security is of particular importance. In recent years, however, advances in the field of machine learning have raised concerns about cyber attacks on these systems. Power systems, among the most important components of urban infrastructure, have, for example, been widely attacked by adversaries. Attackers disrupt power systems using false data injection attacks (FDIA), resulting in a breach of the availability, integrity, or confidentiality principles of the system. Our model simulates possible attacks on power systems using multiple generators in a step-by-step interaction with a discriminator in the training phase. As a consequence, our system is robust to unseen attacks. Moreover, the proposed model considerably reduces the well-known mode collapse problem of GAN-based models. Our method is general and can potentially be employed in a wide range of one-class classification tasks. The proposed model has low computational complexity and outperforms baseline systems by about 14% and 41% in terms of accuracy on the highly imbalanced publicly available industrial control system (ICS) cyber attack power system dataset.
AS · Oct 29, 2019
Replay Spoofing Countermeasure Using Autoencoder and Siamese Network on ASVspoof 2019 Challenge
Mohammad Adiban, Hossein Sameti, Saeedreza Shehnepoor
Automatic Speaker Verification (ASV) is the process of identifying a person based on the voice presented to a system. Different synthetic approaches allow spoofing to deceive ASV systems (ASVs), whether by using techniques to imitate a voice or to reconstruct the features. Attackers try to defeat ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. The last technique is considered a common and high-potential tool for spoofing purposes, since replay attacks are more accessible and require no technical knowledge from adversaries. In this study, we introduce a novel replay spoofing countermeasure for ASVs. Accordingly, we used the Constant Q Cepstral Coefficient (CQCC) features fed into an autoencoder to attain more informative features and to consider the noise information of spoofed utterances for discrimination purposes. Finally, different configurations of the Siamese network were used for the first time in this context for classification. The experiments performed on the ASVspoof 2019 challenge dataset using Equal Error Rate (EER) and Tandem Detection Cost Function (t-DCF) as evaluation metrics show that the proposed system improved the results over the baseline by 10.73% and 0.2344 in terms of EER and t-DCF, respectively.
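For readers unfamiliar with the EER metric mentioned above, the following is a minimal sketch of how it can be approximated from detection scores. It assumes that higher scores mean "more likely bona fide" and uses a plain threshold sweep; it is not the official ASVspoof scoring tool, and the function name and sample scores are hypothetical. t-DCF is not covered.

```python
import numpy as np

def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    bonafide_scores = np.asarray(bonafide_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # False rejection: bona fide trial scored below the threshold.
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # False acceptance: spoofed trial scored at or above the threshold.
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))   # operating point where the two rates are closest
    return (far[i] + frr[i]) / 2

# Hypothetical scores: one spoofed trial outscores one bona fide trial.
eer = equal_error_rate([0.2, 0.8], [0.1, 0.9])
```

With few trials the two error curves may not cross exactly, so the sketch averages the rates at the closest point.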
LG · Apr 26, 2019
Statistical feature embedding for heart sound classification
Mohammad Adiban, Bagher BabaAli, Saeedreza Shehnepoor
Cardiovascular Disease (CVD) is considered one of the principal causes of death in the world. Over recent years, this field of study has attracted researchers' attention to investigate heart sound patterns for disease diagnostics. In this study, an approach is proposed for normal/abnormal heart sound classification on the Physionet challenge 2016 dataset. For the first time, a fixed-length feature vector, called an i-vector, is extracted from each heart sound using Mel Frequency Cepstral Coefficient (MFCC) features. Afterwards, a Principal Component Analysis (PCA) transform and a Variational Autoencoder (VAE) are applied to the i-vector to achieve dimension reduction. Eventually, the reduced-size vector is fed to Gaussian Mixture Models (GMMs) and a Support Vector Machine (SVM) for classification. Experimental results demonstrate that the proposed method could achieve a performance improvement of 16% based on Modified Accuracy (MAcc) compared with the baseline system on the Physionet dataset.
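The PCA dimension-reduction step described above can be sketched as follows. This is a generic SVD-based PCA applied to random placeholder vectors standing in for i-vectors; the dimensions (20 reduced to 5), the function names, and the data are assumptions, and the i-vector extraction, VAE, GMM, and SVM stages are not shown.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are orthonormal principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

# Placeholder "i-vectors": 50 recordings, 20 dimensions each (hypothetical sizes).
rng = np.random.default_rng(0)
ivectors = rng.normal(size=(50, 20))
mean, comps = pca_fit(ivectors, n_components=5)
reduced = pca_transform(ivectors, mean, comps)
```

The reduced vectors would then be passed to a classifier; which classifier and which reduced dimension work best is an empirical question that this sketch does not address.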