Sherif Abdulatif

h-index: 14
Papers: 20
Citations: 705
Novelty: 40%
AI Score: 30

20 Papers

SD · Mar 28, 2022
CMGAN: Conformer-based Metric GAN for Speech Enhancement

Ruizhe Cao, Sherif Abdulatif, Bin Yang

Recently, the convolution-augmented transformer (Conformer) has achieved promising performance in automatic speech recognition (ASR) and time-domain speech enhancement (SE), as it can capture both local and global dependencies in the speech signal. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for SE in the time-frequency (TF) domain. In the generator, we utilize two-stage conformer blocks to aggregate all magnitude and complex spectrogram information by modeling both time and frequency dependencies. The estimation of magnitude and complex spectrogram is decoupled in the decoder stage and then jointly incorporated to reconstruct the enhanced speech. In addition, a metric discriminator is employed to further improve the quality of the enhanced speech by optimizing the generator with respect to a corresponding evaluation score. Quantitative analysis on the Voice Bank+DEMAND dataset shows that CMGAN outperforms various previous models by a margin, reaching a PESQ of 3.41 and an SSNR of 11.10 dB.
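The SSNR value quoted above is a segmental signal-to-noise ratio. As a rough illustration only, the sketch below averages per-frame SNR values between a clean reference and an enhanced signal. The non-overlapping framing, the frame length, the clipping range of [-10, 35] dB, and the function name are assumptions for this example, not details taken from the abstract.

```python
import math

def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    Assumptions (not from the abstract): non-overlapping frames, per-frame
    values clipped to [lo_db, hi_db], trailing partial frame dropped.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have equal length")
    eps = 1e-10  # avoids division by zero and log of zero
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signal shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

Clipping keeps silent or perfectly reconstructed frames from dominating the average, which is why a flawless estimate saturates at the upper limit rather than returning infinity.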

SD · Sep 22, 2022
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Sherif Abdulatif, Ruizhe Cao, Bin Yang

In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work but takes a more in-depth look by conducting extensive ablation studies on model inputs and architectural design choices. We rigorously tested the generalization ability of the model to unseen noise types and distortions. We have fortified our claims through DNS-MOS measurements and listening tests. Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks. This necessitated exploring various architectural changes, specifically metric discriminator scores and masking techniques. It is essential to highlight that this is among the earliest works that attempted complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in the three major speech enhancement tasks: denoising, dereverberation, and super-resolution. For example, in the denoising task using the Voice Bank+DEMAND dataset, CMGAN notably exceeded the performance of prior models, attaining a PESQ score of 3.41 and an SSNR of 11.10 dB. Audio samples and CMGAN implementations are available online.

CV · Aug 13, 2024
Exploring Domain Shift on Radar-Based 3D Object Detection Amidst Diverse Environmental Conditions

Miao Zhang, Sherif Abdulatif, Benedikt Loesch et al.

The rapid evolution of deep learning and its integration with autonomous driving systems have led to substantial advancements in 3D perception using multimodal sensors. Notably, radar sensors show greater robustness compared to cameras and lidar under adverse weather and varying illumination conditions. This study delves into the often-overlooked yet crucial issue of domain shift in 4D radar-based object detection, examining how varying environmental conditions, such as different weather patterns and road types, impact 3D object detection performance. Our findings highlight distinct domain shifts across various weather scenarios, revealing unique dataset sensitivities that underscore the critical role of radar point cloud generation. Additionally, we demonstrate that transitioning between different road types, especially from highways to urban settings, introduces notable domain shifts, emphasizing the necessity for diverse data collection across varied road environments. To the best of our knowledge, this is the first comprehensive analysis of domain shift effects on 4D radar-based object detection. We believe this empirical study contributes to understanding the complex nature of domain shifts in radar data and suggests paths forward for data collection strategy in the face of environmental variability.

CV · Mar 4, 2025
Class-Aware PillarMix: Can Mixed Sample Data Augmentation Enhance 3D Object Detection with Radar Point Clouds?

Miao Zhang, Sherif Abdulatif, Benedikt Loesch et al.

Due to the significant effort required for data collection and annotation in 3D perception tasks, mixed sample data augmentation (MSDA) has been widely studied to generate diverse training samples by mixing existing data. Recently, many MSDA techniques have been developed for point clouds, but they mainly target LiDAR data, leaving their application to radar point clouds largely unexplored. In this paper, we examine the feasibility of applying existing MSDA methods to radar point clouds and identify several challenges in adapting these techniques. These obstacles stem from the radar's irregular angular distribution, deviations from a single-sensor polar layout in multi-radar setups, and point sparsity. To address these issues, we propose Class-Aware PillarMix (CAPMix), a novel MSDA approach that applies MixUp at the pillar level in 3D point clouds, guided by class labels. Unlike methods that apply a single mix ratio to the entire sample, CAPMix assigns an independent ratio to each pillar, boosting sample diversity. To account for the density of different classes, we use class-specific distributions: for dense objects (e.g., large vehicles), we skew ratios to favor points from another sample, while for sparse objects (e.g., pedestrians), we sample more points from the original. This class-aware mixing retains critical details and enriches each sample with new information, ultimately generating more diverse training data. Experimental results demonstrate that our method not only significantly boosts performance but also outperforms existing MSDA approaches across two datasets (Bosch Street and K-Radar). We believe that this straightforward yet effective approach will spark further investigation into MSDA techniques for radar data.
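The per-pillar, class-aware mixing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Beta parameters in `CLASS_BETA`, the pillar size, the class names, the way a class is assigned to a pillar, and all function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical Beta(a, b) parameters for the fraction of ORIGINAL points kept
# in a pillar; illustrative values only, not taken from the paper.
CLASS_BETA = {
    "dense": (2.0, 5.0),       # e.g. large vehicles: favor the other sample
    "sparse": (5.0, 2.0),      # e.g. pedestrians: favor the original sample
    "background": (2.0, 2.0),
}

def pillar_index(point, pillar_size=1.0):
    """Map an (x, y, z) point to its pillar, a cell of the ground-plane grid."""
    x, y, _ = point
    return (int(x // pillar_size), int(y // pillar_size))

def group_by_pillar(points, pillar_size):
    """Collect points into lists keyed by pillar index."""
    pillars = defaultdict(list)
    for p in points:
        pillars[pillar_index(p, pillar_size)].append(p)
    return pillars

def class_aware_pillar_mix(original, other, pillar_class, pillar_size=1.0, rng=None):
    """Mix two point clouds pillar by pillar with an independent ratio per pillar.

    pillar_class maps a pillar index to a key of CLASS_BETA (assumed to be
    derived from the original sample's labels); unlisted pillars are background.
    """
    rng = rng or random.Random()
    orig_pillars = group_by_pillar(original, pillar_size)
    other_pillars = group_by_pillar(other, pillar_size)
    mixed = []
    for idx in sorted(set(orig_pillars) | set(other_pillars)):
        a, b = CLASS_BETA[pillar_class.get(idx, "background")]
        ratio = rng.betavariate(a, b)  # independent mix ratio for this pillar
        own = orig_pillars.get(idx, [])
        foreign = other_pillars.get(idx, [])
        keep_own = round(ratio * len(own))
        keep_foreign = round((1.0 - ratio) * len(foreign))
        mixed.extend(rng.sample(own, keep_own))
        mixed.extend(rng.sample(foreign, keep_foreign))
    return mixed
```

Drawing one ratio per pillar, rather than one per sample, is what lets a sparse pedestrian pillar keep most of its own points while a dense vehicle pillar in the same scene is mostly replaced.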

SP · Oct 16, 2021
A MIMO Radar-based Few-Shot Learning Approach for Human-ID

Pascal Weller, Fady Aziz, Sherif Abdulatif et al.

Radar for deep learning-based human identification has become a research area of increasing interest. It has been shown that micro-Doppler ($\mu$-D) can reflect walking behavior by capturing the periodic micro-motions of the limbs. One of the main aspects is maximizing the number of included classes while considering the real-time and training dataset size constraints. In this paper, a multiple-input-multiple-output (MIMO) radar is used to formulate micro-motion spectrograms of the elevation angular velocity ($\mu$-$\omega$). The effectiveness of concatenating this newly formulated spectrogram with the commonly used $\mu$-D is investigated. To accommodate non-constrained real walking motion, an adaptive cycle segmentation framework is utilized and a metric learning network is trained on half gait cycles ($\approx$ 0.5 s). Studies on the effects of various numbers of classes (5--20), different dataset sizes, and observation time windows of 1--2 s are conducted. A non-constrained walking dataset of 22 subjects is collected with different aspect angles with respect to the radar. The proposed few-shot learning (FSL) approach achieves a classification error of 11.3% with only 2 min of training data per subject.
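The abstract does not say how the embeddings of the metric learning network are turned into identities. One common few-shot scheme, swapped in here purely for illustration, is nearest-prototype classification: average the few labeled embeddings per subject, then assign a query to the closest average. The embeddings, subject labels, and function names below are hypothetical.

```python
import math

def class_prototypes(support):
    """Average the embeddings of each class's few support examples.

    support maps a subject label to a list of equal-length embedding vectors
    (assumed to come from an already-trained metric learning network).
    """
    prototypes = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        prototypes[label] = [
            sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)
        ]
    return prototypes

def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))
```

Because a new subject only requires a new prototype, the number of classes can grow without retraining the embedding network, which suits the small per-subject training budget mentioned above.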

CV · Mar 15, 2021
Uncertainty-Based Biological Age Estimation of Brain MRI Scans

Karim Armanious, Sherif Abdulatif, Wenbin Shi et al.

Age is an essential factor in modern diagnostic procedures. However, assessment of the true biological age (BA) remains a daunting task due to the lack of reference ground-truth labels. Current BA estimation approaches are either restricted to skeletal images or rely on non-imaging modalities that yield a whole-body BA assessment. However, various organ systems may exhibit different aging characteristics due to lifestyle and genetic factors. In this initial study, we propose a new framework for organ-specific BA estimation utilizing 3D magnetic resonance image (MRI) scans. As a first step, this framework predicts the chronological age (CA) together with the corresponding patient-dependent aleatoric uncertainty. An iterative training algorithm is then utilized to segregate atypical aging patients from the given population based on the predicted uncertainty scores. In this manner, we hypothesize that training a new model on the remaining population should approximate the true BA behavior. We apply the proposed methodology on a brain MRI dataset containing healthy individuals as well as Alzheimer's patients. We demonstrate the correlation between the predicted BAs and the expected cognitive deterioration in Alzheimer's patients.
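The abstract does not give the training objective used to predict age together with its aleatoric uncertainty. A widely used choice for this kind of joint prediction is a Gaussian negative log-likelihood with a predicted log-variance. The sketch below shows that loss as an assumption for illustration, not as the authors' confirmed formulation; the function names are hypothetical.

```python
import math

def heteroscedastic_gaussian_nll(true_age, pred_age, pred_log_var):
    """Per-sample Gaussian negative log-likelihood (constant term dropped).

    pred_log_var is the predicted log-variance s = log(sigma^2); predicting the
    log keeps the variance positive. A large s down-weights the squared error
    but is penalized by the 0.5 * s term.
    """
    squared_error = (true_age - pred_age) ** 2
    return 0.5 * math.exp(-pred_log_var) * squared_error + 0.5 * pred_log_var

def predicted_std(pred_log_var):
    """Convert a predicted log-variance into a standard deviation in years."""
    return math.exp(0.5 * pred_log_var)
```

Under such a loss, subjects whose age is hard to predict end up with a large predicted variance, which is the kind of per-patient score that could then be used to flag atypical aging.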

SD · Oct 20, 2020
Investigating Cross-Domain Losses for Speech Enhancement

Sherif Abdulatif, Karim Armanious, Jayasankar T. Sajeev et al.

Recent years have seen a surge in the number of available frameworks for speech enhancement (SE) and recognition. Whether model-based or constructed via deep learning, these frameworks often rely in isolation on either time-domain signals or time-frequency (TF) representations of speech data. In this study, we investigate the advantages of each set of approaches by separately examining their impact on speech intelligibility and quality. Furthermore, we combine the fragmented benefits of time-domain and TF speech representations by introducing two new cross-domain SE frameworks. A quantitative comparative analysis against recent model-based and deep learning SE approaches is performed to illustrate the merit of the proposed frameworks.

IV · Sep 22, 2020
Age-Net: An MRI-Based Iterative Framework for Brain Biological Age Estimation

Karim Armanious, Sherif Abdulatif, Wenbin Shi et al.

The concept of biological age (BA), although important in clinical practice, is hard to grasp mainly due to the lack of a clearly defined reference standard. For specific applications, especially in pediatrics, medical image data are used for BA estimation in a routine clinical context. Beyond this young age group, BA estimation is mostly restricted to whole-body assessment using non-imaging indicators such as blood biomarkers, genetic and cellular data. However, various organ systems may exhibit different aging characteristics due to lifestyle and genetic factors. Thus, a whole-body assessment of the BA does not reflect the deviations of aging behavior between organs. To this end, we propose a new imaging-based framework for organ-specific BA estimation. In this initial study, we focus mainly on brain MRI. As a first step, we introduce a chronological age (CA) estimation framework using deep convolutional neural networks (Age-Net). We quantitatively assess the performance of this framework in comparison to existing state-of-the-art CA estimation approaches. Furthermore, we expand upon Age-Net with a novel iterative data-cleaning algorithm to segregate atypical-aging patients (BA $\not \approx$ CA) from the given population. We hypothesize that the remaining population should approximate the true BA behavior. We apply the proposed methodology on a brain magnetic resonance image (MRI) dataset containing healthy individuals as well as Alzheimer's patients with different dementia ratings. We demonstrate the correlation between the predicted BAs and the expected cognitive deterioration in Alzheimer's patients. A statistical and visualization-based analysis has provided evidence regarding the potential and current challenges of the proposed methodology.
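The iterative data-cleaning idea above can be sketched as a loop that refits an age model and removes subjects whose predicted age deviates strongly from their chronological age. This is only a schematic under stated assumptions: the fixed threshold, the stopping rule, and the caller-supplied `fit` and `predict` placeholders are hypothetical and stand in for whatever criterion and network the paper actually uses.

```python
def iterative_cleaning(subjects, fit, predict, threshold, max_rounds=10):
    """Repeatedly refit an age model and drop subjects it predicts poorly.

    subjects: list of (features, chronological_age) pairs.
    fit(subjects) -> model and predict(model, features) -> age are supplied by
    the caller (hypothetical placeholders for a trained network).
    threshold: largest tolerated |predicted - chronological| gap in years.
    Returns (kept, removed).
    """
    kept, removed = list(subjects), []
    for _ in range(max_rounds):
        model = fit(kept)
        typical, atypical = [], []
        for features, age in kept:
            gap = abs(predict(model, features) - age)
            (typical if gap <= threshold else atypical).append((features, age))
        # Stop when nobody is flagged, or when removal would empty the set.
        if not atypical or not typical:
            break
        kept, removed = typical, removed + atypical
    return kept, removed
```

Refitting after each removal matters because the flagged subjects also distort the model that flags them; once they are gone, the remaining population is judged against a cleaner fit.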

AS · Mar 3, 2020
SELD-TCN: Sound Event Localization & Detection via Temporal Convolutional Networks

Karim Guirguis, Christoph Schorn, Andre Guntoro et al.

The understanding of the surrounding environment plays a critical role in autonomous robotic systems, such as self-driving cars. Extensive research has been carried out concerning visual perception. Yet, to obtain a more complete perception of the environment, autonomous systems of the future should also take acoustic information into account. Recent sound event localization and detection (SELD) frameworks utilize convolutional recurrent neural networks (CRNNs). However, considering the recurrent nature of CRNNs, it becomes challenging to implement them efficiently on embedded hardware. Not only are their computations strenuous to parallelize, but they also require high memory bandwidth and large memory buffers. In this work, we develop a more robust and hardware-friendly novel architecture based on a temporal convolutional network (TCN). The proposed framework (SELD-TCN) outperforms the state-of-the-art SELDnet performance on four different datasets. Moreover, SELD-TCN achieves 4x faster training time per epoch and 40x faster inference time on an ordinary graphics processing unit (GPU).

AS · Oct 21, 2019
AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks

Sherif Abdulatif, Karim Armanious, Karim Guirguis et al.

Automatic speech recognition (ASR) systems are of vital importance nowadays in commonplace tasks such as speech-to-text processing and language translation. This has created the need for an ASR system that can operate in realistic crowded environments. Thus, speech enhancement is a valuable building block in ASR systems and other applications such as hearing aids, smartphones and teleconferencing systems. In this paper, a generative adversarial network (GAN) based framework is investigated for the task of speech enhancement, more specifically speech denoising of audio tracks. A new architecture based on a CasNet generator and an additional feature-based loss are incorporated to obtain realistically denoised speech phonetics. Finally, the proposed framework is shown to outperform other learning and traditional model-based speech enhancement approaches.

IV · Oct 21, 2019
ipA-MedGAN: Inpainting of Arbitrary Regions in Medical Imaging

Karim Armanious, Vijeth Kumar, Sherif Abdulatif et al.

Local deformations in medical modalities are common phenomena due to a multitude of factors such as metallic implants or limited fields of view in magnetic resonance imaging (MRI). Completion of the missing or distorted regions is of special interest for automatic image analysis frameworks to enhance post-processing tasks such as segmentation or classification. In this work, we propose a new generative framework for medical image inpainting, titled ipA-MedGAN. It bypasses the limitations of previous frameworks by enabling inpainting of arbitrarily shaped regions without a prior localization of the regions of interest. Thorough qualitative and quantitative comparisons with other inpainting and translational approaches have illustrated the superior performance of the proposed framework for the task of brain MR inpainting.

IV · Oct 14, 2019
Organ-based Chronological Age Estimation based on 3D MRI Scans

Karim Armanious, Sherif Abdulatif, Anish Rao Bhaktharaguttu et al.

Individuals age differently depending on a multitude of different factors such as lifestyle, medical history and genetics. Often, the global chronological age is not indicative of the true ageing process. An organ-based age estimation would yield a more accurate health state assessment. In this work, we propose a new deep learning architecture for organ-based age estimation based on magnetic resonance images (MRI). The proposed network is a 3D convolutional neural network (CNN) with increased depth and width made possible by the hybrid utilization of inception and fire modules. We apply the proposed framework for the tasks of brain and knee age estimation. Quantitative comparisons against concurrent MR-based regression networks and different 2D and 3D data feeding strategies illustrated the superior performance of the proposed work.

IV · Oct 12, 2019
Unsupervised Adversarial Correction of Rigid MR Motion Artifacts

Karim Armanious, Aastha Tanwar, Sherif Abdulatif et al.

Motion is one of the main sources of artifacts in magnetic resonance (MR) images. It can have significant consequences for the diagnostic quality of the resultant scans. Previously, supervised adversarial approaches have been suggested for the correction of MR motion artifacts. However, these approaches are limited by the need for paired, co-registered training datasets, which are often hard or impossible to acquire. Building upon our previous work, we introduce a new adversarial framework with a new generator architecture and loss function for the unsupervised correction of severe rigid motion artifacts in the brain region. Quantitative and qualitative comparisons with other supervised and unsupervised translation approaches showcase the enhanced performance of the introduced framework.

CV · Mar 8, 2019
Unsupervised Medical Image Translation Using Cycle-MedGAN

Karim Armanious, Chenming Jiang, Sherif Abdulatif et al.

Image-to-image translation is a new field in computer vision with multiple potential applications in the medical domain. However, for supervised image translation frameworks, co-registered datasets, paired in a pixel-wise sense, are required. This is often difficult to acquire in realistic medical scenarios. On the other hand, unsupervised translation frameworks often result in blurred translated images with unrealistic details. In this work, we propose a new unsupervised translation framework which is titled Cycle-MedGAN. The proposed framework utilizes new non-adversarial cycle losses which direct the framework to minimize the textural and perceptual discrepancies in the translated images. Qualitative and quantitative comparisons against other unsupervised translation approaches demonstrate the performance of the proposed framework for PET-CT translation and MR motion correction.

CV · Mar 4, 2019
An Adversarial Super-Resolution Remedy for Radar Design Trade-offs

Karim Armanious, Sherif Abdulatif, Fady Aziz et al.

Radar is of vital importance in many fields, such as autonomous driving, safety and surveillance applications. However, it suffers from stringent constraints on its design parametrization, leading to multiple trade-offs. For example, the bandwidth in FMCW radars is inversely proportional to both the maximum unambiguous range and range resolution. In this work, we introduce a new method for circumventing radar design trade-offs. We propose the use of recent advances in computer vision, more specifically generative adversarial networks (GANs), to enhance low-resolution radar acquisitions into higher resolution counterparts while maintaining the advantages of the low-resolution parametrization. The capability of the proposed method was evaluated on the velocity resolution and range-azimuth trade-offs in micro-Doppler signatures and FMCW uniform linear array (ULA) radars, respectively.
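The bandwidth trade-off mentioned above follows from two standard FMCW relations, sketched here for illustration. The range resolution is c / (2B). The maximum-range formula assumes complex (I/Q) sampling with a fixed chirp duration and ADC rate, which is an assumption of this example and not a detail from the abstract; the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def range_resolution(bandwidth_hz):
    """Smallest resolvable range separation of an FMCW chirp: c / (2 B)."""
    return C / (2.0 * bandwidth_hz)

def max_unambiguous_range(bandwidth_hz, chirp_duration_s, sample_rate_hz):
    """Maximum range for a fixed chirp duration and ADC rate.

    Assumes complex (I/Q) sampling, so the highest usable beat frequency
    equals the sample rate: R_max = fs * c / (2 * slope), slope = B / Tc.
    """
    slope = bandwidth_hz / chirp_duration_s
    return sample_rate_hz * C / (2.0 * slope)
```

Doubling the bandwidth halves both values: range cells become finer, but with the same chirp duration and sample rate the farthest measurable target moves closer.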

CV · Nov 17, 2018
Person Identification and Body Mass Index: A Deep Learning-Based Study on Micro-Dopplers

Sherif Abdulatif, Fady Aziz, Karim Armanious et al.

Smart surveillance requires a sensing system that can capture accurate and detailed information about human walking style. Radar micro-Doppler ($\boldsymbol{\mu}$-D) analysis has proven to be a reliable metric for studying human locomotion. Thus, $\boldsymbol{\mu}$-D signatures can be used to identify humans based on their walking styles. Additionally, the signatures contain information about the radar cross section (RCS) of the moving subject. This paper investigates the effect of human body characteristics on human identification based on their $\boldsymbol{\mu}$-D signatures. In our proposed experimental setup, a treadmill is used to collect $\boldsymbol{\mu}$-D signatures of 22 subjects of different genders and body characteristics. Convolutional autoencoders (CAE) are then used to extract the latent space representation from the $\boldsymbol{\mu}$-D signatures. It is then interpreted in two dimensions using t-distributed stochastic neighbor embedding (t-SNE). Our study shows that the body mass index (BMI) has a correlation with the $\boldsymbol{\mu}$-D signature of the walking subject. A 50-layer deep residual network is then trained to identify the walking subject based on the $\boldsymbol{\mu}$-D signature. We achieve an accuracy of 98% on the test set with high signal-to-noise ratio (SNR) and 84% in case of different SNR levels.

CV · Nov 12, 2018
Towards Adversarial Denoising of Radar Micro-Doppler Signatures

Sherif Abdulatif, Karim Armanious, Fady Aziz et al.

Generative Adversarial Networks (GANs) are considered the state-of-the-art in the field of image generation. They learn the joint distribution of the training data and attempt to generate new data samples in high dimensional space following the same distribution as the input. Recent improvements in GANs opened the field to many other computer vision applications based on improving and changing the characteristics of the input image to follow some given training requirements. In this paper, we propose a novel technique for the denoising and reconstruction of the micro-Doppler ($\boldsymbol{\mu}$-D) spectra of walking humans based on GANs. Two sets of measurements were collected from 22 subjects walking on a treadmill at an intermediate velocity using a 25 GHz CW radar. In one set, a clean $\boldsymbol{\mu}$-D spectrum is collected for each subject by placing the radar at a close distance to the subject. In the other set, variations are made to the experiment setup to introduce different noise and clutter effects on the spectrum by changing the distance and placing reflective objects between the radar and the target. Synthetic paired noisy and noise-free spectra were used for training, while validation was carried out on the real noisy measured data. Finally, qualitative and quantitative comparisons with other classical radar denoising approaches in the literature demonstrate that the proposed GAN framework performs better and is more robust to different noise levels.
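For a sense of scale, the Doppler shift seen by a monostatic radar is f_d = 2 v f_c / c. The sketch below evaluates it for the 25 GHz carrier mentioned above; the example velocities and the function name are illustrative assumptions, not values from the paper.

```python
C = 299_792_458.0  # speed of light in m/s

def doppler_shift_hz(radial_velocity_mps, carrier_hz=25e9):
    """Doppler shift of a monostatic radar echo: f_d = 2 * v * f_c / c.

    Positive velocity means motion toward the radar. The 25 GHz default
    matches the CW radar mentioned in the abstract.
    """
    return 2.0 * radial_velocity_mps * carrier_hz / C
```

A torso moving at an assumed 1.5 m/s gives a shift of roughly 250 Hz, and limbs swinging faster or slower than the torso spread energy around that line, which is the micro-Doppler structure the spectra capture.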

HC · Nov 25, 2017
Stairs Detection for Enhancing Wheelchair Capabilities Based on Radar Sensors

Sherif Abdulatif, Bernhard Kleiner, Fady Aziz et al.

Powered wheelchair users encounter barriers to their mobility every day. Entering a building with areas that are not barrier-free can massively impact the user's mobility-related activities. There are a few commercial devices and some experimental ones that can climb stairs using, for instance, adaptive wheels with joints or a caterpillar drive. These systems rely on sensing and control. For safe automated obstacle crossing, a robust and environment-invariant detection of the surroundings is necessary. Radar may prove to be a suitable sensor owing to its capability to handle harsh outdoor environmental conditions. In this paper, we introduce a mirror-based two-dimensional Frequency-Modulated Continuous-Wave (FMCW) radar scanner for stair detection. A radar-image-based stair dimensioning approach is presented and tested under laboratory and realistic conditions.

CV · Nov 25, 2017
Micro-Doppler Based Human-Robot Classification Using Ensemble and Deep Learning Approaches

Sherif Abdulatif, Qian Wei, Fady Aziz et al.

Radar sensors can be used for analyzing the induced frequency shifts due to micro-motions in both range and velocity dimensions, identified as micro-Doppler ($\boldsymbol{\mu}$-D) and micro-Range ($\boldsymbol{\mu}$-R), respectively. Different moving targets will have unique $\boldsymbol{\mu}$-D and $\boldsymbol{\mu}$-R signatures that can be used for target classification. Such classification can be used in numerous fields, such as gait recognition, safety and surveillance. In this paper, a 25 GHz FMCW Single-Input Single-Output (SISO) radar is used in industrial safety for real-time human-robot identification. Due to the real-time constraint, joint Range-Doppler (R-D) maps are directly analyzed for our classification problem. Furthermore, a comparison between conventional learning approaches with handcrafted features, ensemble classifiers and deep learning approaches is presented. For ensemble classifiers, restructured range and velocity profiles are passed directly to ensemble trees, such as gradient boosting and random forest, without feature extraction. Finally, a Deep Convolutional Neural Network (DCNN) is used and raw R-D images are directly fed into the constructed network. The DCNN shows a superior performance of 99% accuracy in distinguishing humans from robots on a single R-D map.

CV · Nov 25, 2017
Real-Time Capable Micro-Doppler Signature Decomposition of Walking Human Limbs

Sherif Abdulatif, Fady Aziz, Bernhard Kleiner et al.

The unique micro-Doppler ($\boldsymbol{\mu}$-D) signature of human body motion can be analyzed as the superposition of the $\boldsymbol{\mu}$-D signatures of different body parts. Extracting the $\boldsymbol{\mu}$-D signatures of human limbs in real time can be used to detect, classify and track human motion, especially for safety applications. In this paper, two methods are combined to simulate the $\boldsymbol{\mu}$-D signatures of a walking human. Furthermore, a feasibility study of a novel time-independent decomposition of limb $\mu$-D signatures is presented, using $\mu$-D signatures and range profiles, also known as micro-Range ($\mu$-R), as features. The body parts of a walking human are divided into four classes (base, arms, legs, feet) and a decision tree classifier is used. Validation shows that the classifier is able to decompose the $\mu$-D signatures of limbs from a walking human signature in real time.