IVNov 16, 2022
SWIN-SFTNet : Spatial Feature Expansion and Aggregation using Swin Transformer For Whole Breast micro-mass segmentationSharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.
Incorporating various mass shapes and sizes in training deep learning architectures has made breast mass segmentation challenging. Moreover, manual segmentation of masses of irregular shapes is time-consuming and error-prone. Though Deep Neural Network has shown outstanding performance in breast mass segmentation, it fails in segmenting micro-masses. In this paper, we propose a novel U-net-shaped transformer-based architecture, called Swin-SFTNet, that outperforms state-of-the-art architectures in breast mammography-based micro-mass segmentation. Firstly to capture the global context, we designed a novel Spatial Feature Expansion and Aggregation Block(SFEA) that transforms sequential linear patches into a structured spatial feature. Next, we combine it with the local linear features extracted by the swin transformer block to improve overall accuracy. We also incorporate a novel embedding loss that calculates similarities between linear feature embeddings of the encoder and decoder blocks. With this approach, we achieve higher segmentation dice over the state-of-the-art by 3.10% on CBIS-DDSM, 3.81% on InBreast, and 3.13% on CBIS pre-trained model on the InBreast test data set.
IVAug 11, 2023
Revolutionizing Space Health (Swin-FSR): Advancing Super-Resolution of Fundus Images for SANS Visual Assessment TechnologyKhondker Fariha Hossain, Sharif Amit Kamran, Joshua Ong et al.
The rapid accessibility of portable and affordable retinal imaging devices has made early differential diagnosis easier. For example, color funduscopy imaging is readily available in remote villages, which can help to identify diseases like age-related macular degeneration (AMD), glaucoma, or pathological myopia (PM). On the other hand, astronauts at the International Space Station utilize this camera for identifying spaceflight-associated neuro-ocular syndrome (SANS). However, due to the unavailability of experts in these locations, the data has to be transferred to an urban healthcare facility (AMD and glaucoma) or a terrestrial station (e.g, SANS) for more precise disease identification. Moreover, due to low bandwidth limits, the imaging data has to be compressed for transfer between these two places. Different super-resolution algorithms have been proposed throughout the years to address this. Furthermore, with the advent of deep learning, the field has advanced so much that x2 and x4 compressed images can be decompressed to their original form without losing spatial information. In this paper, we introduce a novel model called Swin-FSR that utilizes Swin Transformer with spatial and depth-wise attention for fundus image super-resolution. Our architecture achieves Peak signal-to-noise-ratio (PSNR) of 47.89, 49.00 and 45.32 on three public datasets, namely iChallenge-AMD, iChallenge-PM, and G1020. Additionally, we tested the model's effectiveness on a privately held dataset for SANS provided by NASA and achieved comparable results against previous architectures.
IVJun 24, 2022
Feature Representation Learning for Robust Retinal Disease Detection from Optical Coherence Tomography ImagesSharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.
Ophthalmic images may contain identical-looking pathologies that can cause failure in automated techniques to distinguish different retinal degenerative diseases. Additionally, reliance on large annotated datasets and lack of knowledge distillation can restrict ML-based clinical support systems' deployment in real-world environments. To improve the robustness and transferability of knowledge, an enhanced feature-learning module is required to extract meaningful spatial representations from the retinal subspace. Such a module, if used effectively, can detect unique disease traits and differentiate the severity of such retinal degenerative pathologies. In this work, we propose a robust disease detection architecture with three learning heads, i) A supervised encoder for retinal disease classification, ii) An unsupervised decoder for the reconstruction of disease-specific spatial information, and iii) A novel representation learning module for learning the similarity between encoder-decoder feature and enhancing the accuracy of the model. Our experimental results on two publicly available OCT datasets illustrate that the proposed model outperforms existing state-of-the-art models in terms of accuracy, interpretability, and robustness for out-of-distribution retinal disease detection.
IVMar 16, 2023
SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid SegmentationKhondker Fariha Hossain, Sharif Amit Kamran, Alireza Tavakkoli et al.
Accurately segmenting fluid in 3D optical coherence tomography (OCT) images is critical for detecting eye diseases but remains challenging. Traditional autoencoder-based methods struggle with resolution loss and information recovery. While transformer-based models improve segmentation, they arent optimized for 3D OCT volumes, which vary by vendor and extraction technique. To address this, we propose SwinVFTR, a transformer architecture for precise fluid segmentation in 3D OCT images. SwinVFTR employs channel-wise volumetric sampling and a shifted window transformer block to improve fluid localization. Moreover, a novel volumetric attention block enhances spatial and depth-wise attention. Trained using multi-class dice loss, SwinVFTR outperforms existing models on Spectralis, Cirrus, and Topcon OCT datasets, achieving mean dice scores of 0.72, 0.59, and 0.68, respectively, along with superior performance in mean intersection-over-union (IOU) and structural similarity (SSIM) metrics.
IVOct 13, 2022
Virtual-Reality based Vestibular Ocular Motor Screening for Concussion Detection using Machine-LearningKhondker Fariha Hossain, Sharif Amit Kamran, Prithul Sarker et al.
Sport-related concussion (SRC) depends on sensory information from visual, vestibular, and somatosensory systems. At the same time, the current clinical administration of Vestibular/Ocular Motor Screening (VOMS) is subjective and deviates among administrators. Therefore, for the assessment and management of concussion detection, standardization is required to lower the risk of injury and increase the validation among clinicians. With the advancement of technology, virtual reality (VR) can be utilized to advance the standardization of the VOMS, increasing the accuracy of testing administration and decreasing overall false positive rates. In this paper, we experimented with multiple machine learning methods to detect SRC on VR-generated data using VOMS. In our observation, the data generated from VR for smooth pursuit (SP) and the Visual Motion Sensitivity (VMS) tests are highly reliable for concussion detection. Furthermore, we train and evaluate these models, both qualitatively and quantitatively. Our findings show these models can reach high true-positive-rates of around 99.9 percent of symptom provocation on the VR stimuli-based VOMS vs. current clinical manual VOMS.
CVJun 27, 2025
GRASP-PsONet: Gradient-based Removal of Spurious Patterns for PsOriasis Severity ClassificationBasudha Pal, Sharif Amit Kamran, Brendon Lutnick et al.
Psoriasis (PsO) severity scoring is important for clinical trials but is hindered by inter-rater variability and the burden of in person clinical evaluation. Remote imaging using patient captured mobile photos offers scalability but introduces challenges, such as variation in lighting, background, and device quality that are often imperceptible to humans but can impact model performance. These factors, along with inconsistencies in dermatologist annotations, reduce the reliability of automated severity scoring. We propose a framework to automatically flag problematic training images that introduce spurious correlations which degrade model generalization, using a gradient based interpretability approach. By tracing the gradients of misclassified validation images, we detect training samples where model errors align with inconsistently rated examples or are affected by subtle, nonclinical artifacts. We apply this method to a ConvNeXT based weakly supervised model designed to classify PsO severity from phone images. Removing 8.2% of flagged images improves model AUC-ROC by 5% (85% to 90%) on a held out test set. Commonly, multiple annotators and an adjudication process ensure annotation accuracy, which is expensive and time consuming. Our method detects training images with annotation inconsistencies, potentially removing the need for manual review. When applied to a subset of training data rated by two dermatologists, the method identifies over 90% of cases with inter-rater disagreement by reviewing only the top 30% of samples. This improves automated scoring for remote assessments, ensuring robustness despite data collection variability.
SPOct 17, 2021
ECG-ATK-GAN: Robustness against Adversarial Attacks on ECGs using Conditional Generative Adversarial NetworksKhondker Fariha Hossain, Sharif Amit Kamran, Alireza Tavakkoli et al.
Automating arrhythmia detection from ECG requires a robust and trusted system that retains high accuracy under electrical disturbances. Many machine learning approaches have reached human-level performance in classifying arrhythmia from ECGs. However, these architectures are vulnerable to adversarial attacks, which can misclassify ECG signals by decreasing the model's accuracy. Adversarial attacks are small crafted perturbations injected in the original data which manifest the out-of-distribution shifts in signal to misclassify the correct class. Thus, security concerns arise for false hospitalization and insurance fraud abusing these perturbations. To mitigate this problem, we introduce the first novel Conditional Generative Adversarial Network (GAN), robust against adversarial attacked ECG signals and retaining high accuracy. Our architecture integrates a new class-weighted objective function for adversarial perturbation identification and new blocks for discerning and combining out-of-distribution shifts in signals in the learning process for accurately classifying various arrhythmia types. Furthermore, we benchmark our architecture on six different white and black-box attacks and compare them with other recently proposed arrhythmia classification models on two publicly available ECG arrhythmia datasets. The experiment confirms that our model is more robust against such adversarial attacks for classifying arrhythmia with high accuracy.
CRAug 4, 2021
Semi-supervised Conditional GAN for Simultaneous Generation and Detection of Phishing URLs: A Game theoretic PerspectiveSharif Amit Kamran, Shamik Sengupta, Alireza Tavakkoli
Spear Phishing is a type of cyber-attack where the attacker sends hyperlinks through email on well-researched targets. The objective is to obtain sensitive information by imitating oneself as a trustworthy website. In recent times, deep learning has become the standard for defending against such attacks. However, these architectures were designed with only defense in mind. Moreover, the attacker's perspective and motivation are absent while creating such models. To address this, we need a game-theoretic approach to understand the perspective of the attacker (Hacker) and the defender (Phishing URL detector). We propose a Conditional Generative Adversarial Network with novel training strategy for real-time phishing URL detection. Additionally, we train our architecture in a semi-supervised manner to distinguish between adversarial and real examples, along with detecting malicious and benign URLs. We also design two games between the attacker and defender in training and deployment settings by utilizing the game-theoretic perspective. Our experiments confirm that the proposed architecture surpasses recent state-of-the-art architectures for phishing URLs detection.
LGJul 16, 2021
ECG-Adv-GAN: Detecting ECG Adversarial Examples with Conditional Generative Adversarial NetworksKhondker Fariha Hossain, Sharif Amit Kamran, Alireza Tavakkoli et al.
Electrocardiogram (ECG) acquisition requires an automated system and analysis pipeline for understanding specific rhythm irregularities. Deep neural networks have become a popular technique for tracing ECG signals, outperforming human experts. Despite this, convolutional neural networks are susceptible to adversarial examples that can misclassify ECG signals and decrease the model's precision. Moreover, they do not generalize well on the out-of-distribution dataset. The GAN architecture has been employed in recent works to synthesize adversarial ECG signals to increase existing training data. However, they use a disjointed CNN-based classification architecture to detect arrhythmia. Till now, no versatile architecture has been proposed that can detect adversarial examples and classify arrhythmia simultaneously. To alleviate this, we propose a novel Conditional Generative Adversarial Network to simultaneously generate ECG signals for different categories and detect cardiac abnormalities. Moreover, the model is conditioned on class-specific ECG signals to synthesize realistic adversarial examples. Consequently, we compare our architecture and show how it outperforms other classification models in normal/abnormal ECG signal detection by benchmarking real world and adversarial signals.
IVApr 14, 2021
VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision TransformersSharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.
In Fluorescein Angiography (FA), an exogenous dye is injected in the bloodstream to image the vascular structure of the retina. The injected dye can cause adverse reactions such as nausea, vomiting, anaphylactic shock, and even death. In contrast, color fundus imaging is a non-invasive technique used for photographing the retina but does not have sufficient fidelity for capturing its vascular structure. The only non-invasive method for capturing retinal vasculature is optical coherence tomography-angiography (OCTA). However, OCTA equipment is quite expensive, and stable imaging is limited to small areas on the retina. In this paper, we propose a novel conditional generative adversarial network (GAN) capable of simultaneously synthesizing FA images from fundus photographs while predicting retinal degeneration. The proposed system has the benefit of addressing the problem of imaging retinal vasculature in a non-invasive manner as well as predicting the existence of retinal abnormalities. We use a semi-supervised approach to train our GAN using multiple weighted losses on different modalities of data. Our experiments validate that the proposed architecture exceeds recent state-of-the-art generative networks for fundus-to-angiography synthesis. Moreover, our vision transformer-based discriminators generalize quite well on out-of-distribution data sets for retinal disease prediction.
IVJan 3, 2021
RV-GAN: Segmenting Retinal Vascular Structure in Fundus Photographs using a Novel Multi-scale Generative Adversarial NetworkSharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.
High fidelity segmentation of both macro and microvascular structure of the retina plays a pivotal role in determining degenerative retinal diseases, yet it is a difficult problem. Due to successive resolution loss in the encoding phase combined with the inability to recover this lost information in the decoding phase, autoencoding based segmentation approaches are limited in their ability to extract retinal microvascular structure. We propose RV-GAN, a new multi-scale generative architecture for accurate retinal vessel segmentation to alleviate this. The proposed architecture uses two generators and two multi-scale autoencoding discriminators for better microvessel localization and segmentation. In order to avoid the loss of fidelity suffered by traditional GAN-based segmentation systems, we introduce a novel weighted feature matching loss. This new loss incorporates and prioritizes features from the discriminator's decoder over the encoder. Doing so combined with the fact that the discriminator's decoder attempts to determine real or fake images at the pixel level better preserves macro and microvascular structure. By combining reconstruction and weighted feature matching loss, the proposed architecture achieves an area under the curve (AUC) of 0.9887, 0.9914, and 0.9887 in pixel-wise segmentation of retinal vasculature from three publicly available datasets, namely DRIVE, CHASE-DB1, and STARE, respectively. Additionally, RV-GAN outperforms other architectures in two additional relevant metrics, mean intersection-over-union (Mean-IOU) and structural similarity measure (SSIM).
IVJul 17, 2020
Attention2AngioGAN: Synthesizing Fluorescein Angiography from Retinal Fundus Images using Generative Adversarial NetworksSharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.
Fluorescein Angiography (FA) is a technique that employs the designated camera for Fundus photography incorporating excitation and barrier filters. FA also requires fluorescein dye that is injected intravenously, which might cause adverse effects ranging from nausea, vomiting to even fatal anaphylaxis. Currently, no other fast and non-invasive technique exists that can generate FA without coupling with Fundus photography. To eradicate the need for an invasive FA extraction procedure, we introduce an Attention-based Generative network that can synthesize Fluorescein Angiography from Fundus images. The proposed gan incorporates multiple attention based skip connections in generators and comprises novel residual blocks for both generators and discriminators. It utilizes reconstruction, feature-matching, and perceptual loss along with adversarial training to produces realistic Angiograms that is hard for experts to distinguish from real ones. Our experiments confirm that the proposed architecture surpasses recent state-of-the-art generative networks for fundus-to-angio translation task.
IVMay 16, 2020
Improving Robustness using Joint Attention Network For Detecting Retinal Degeneration From Optical Coherence Tomography ImagesSharif Amit Kamran, Alireza Tavakkoli, Stewart Lee Zuckerbrod
Noisy data and the similarity in the ocular appearances caused by different ophthalmic pathologies pose significant challenges for an automated expert system to accurately detect retinal diseases. In addition, the lack of knowledge transferability and the need for unreasonably large datasets limit clinical application of current machine learning systems. To increase robustness, a better understanding of how the retinal subspace deformations lead to various levels of disease severity needs to be utilized for prioritizing disease-specific model details. In this paper we propose the use of disease-specific feature representation as a novel architecture comprised of two joint networks -- one for supervised encoding of disease model and the other for producing attention maps in an unsupervised manner to retain disease specific spatial information. Our experimental results on publicly available datasets show the proposed joint-network significantly improves the accuracy and robustness of state-of-the-art retinal disease classification networks on unseen datasets.
IVMay 11, 2020
Fundus2Angio: A Conditional GAN Architecture for Generating Fluorescein Angiography Images from Retinal Fundus PhotographySharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.
Carrying out clinical diagnosis of retinal vascular degeneration using Fluorescein Angiography (FA) is a time consuming process and can pose significant adverse effects on the patient. Angiography requires insertion of a dye that may cause severe adverse effects and can even be fatal. Currently, there are no non-invasive systems capable of generating Fluorescein Angiography images. However, retinal fundus photography is a non-invasive imaging technique that can be completed in a few seconds. In order to eliminate the need for FA, we propose a conditional generative adversarial network (GAN) to translate fundus images to FA images. The proposed GAN consists of a novel residual block capable of generating high quality FA images. These images are important tools in the differential diagnosis of retinal diseases without the need for invasive procedure with possible side effects. Our experiments show that the proposed architecture outperforms other state-of-the-art generative networks. Furthermore, our proposed model achieves better qualitative results indistinguishable from real angiograms.
IVOct 13, 2019
Optic-Net: A Novel Convolutional Neural Network for Diagnosis of Retinal Diseases from Optical Tomography ImagesSharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir et al.
Diagnosing different retinal diseases from Spectral Domain Optical Coherence Tomography (SD-OCT) images is a challenging task. Different automated approaches such as image processing, machine learning and deep learning algorithms have been used for early detection and diagnosis of retinal diseases. Unfortunately, these are prone to error and computational inefficiency, which requires further intervention from human experts. In this paper, we propose a novel convolution neural network architecture to successfully distinguish between different degeneration of retinal layers and their underlying causes. The proposed novel architecture outperforms other classification models while addressing the issue of gradient explosion. Our approach reaches near perfect accuracy of 99.8% and 100% for two separately available Retinal SD-OCT data-set respectively. Additionally, our architecture predicts retinal diseases in real time while outperforming human diagnosticians.
CVOct 10, 2018
AI Learns to Recognize Bengali Handwritten Digits: Bengali.AI Computer Vision Challenge 2018Sharif Amit Kamran, Ahmed Imtiaz Humayun, Samiul Alam et al.
Solving problems with Artificial intelligence in a competitive manner has long been absent in Bangladesh and Bengali-speaking community. On the other hand, there has not been a well structured database for Bengali Handwritten digits for mass public use. To bring out the best minds working in machine learning and use their expertise to create a model which can easily recognize Bengali Handwritten digits, we organized Bengali.AI Computer Vision Challenge.The challenge saw both local and international teams participating with unprecedented efforts.
CVAug 30, 2018
Total Recall: Understanding Traffic Signs using Deep Hierarchical Convolutional Neural NetworksSourajit Saha, Sharif Amit Kamran, Ali Shihab Sabbir
Recognizing Traffic Signs using intelligent systems can drastically reduce the number of accidents happening world-wide. With the arrival of Self-driving cars it has become a staple challenge to solve the automatic recognition of Traffic and Hand-held signs in the major streets. Various machine learning techniques like Random Forest, SVM as well as deep learning models has been proposed for classifying traffic signs. Though they reach state-of-the-art performance on a particular data-set, but fall short of tackling multiple Traffic Sign Recognition benchmarks. In this paper, we propose a novel and one-for-all architecture that aces multiple benchmarks with better overall score than the state-of-the-art architectures. Our model is made of residual convolutional blocks with hierarchical dilated skip connections joined in steps. With this we score 99.33% Accuracy in German sign recognition benchmark and 99.17% Accuracy in Belgian traffic sign classification benchmark. Moreover, we propose a newly devised dilated residual learning representation technique which is very low in both memory and computational complexity.
CVJul 26, 2017
Efficient Yet Deep Convolutional Neural Networks for Semantic SegmentationSharif Amit Kamran, Ali Shihab Sabbir
Semantic Segmentation using deep convolutional neural network pose more complex challenge for any GPU intensive task. As it has to compute million of parameters, it results to huge memory consumption. Moreover, extracting finer features and conducting supervised training tends to increase the complexity. With the introduction of Fully Convolutional Neural Network, which uses finer strides and utilizes deconvolutional layers for upsampling, it has been a go to for any image segmentation task. In this paper, we propose two segmentation architecture which not only needs one-third the parameters to compute but also gives better accuracy than the similar architectures. The model weights were transferred from the popular neural net like VGG19 and VGG16 which were trained on Imagenet classification data-set. Then we transform all the fully connected layers to convolutional layers and use dilated convolution for decreasing the parameters. Lastly, we add finer strides and attach four skip architectures which are element-wise summed with the deconvolutional layers in steps. We train and test on different sparse and fine data-sets like Pascal VOC2012, Pascal-Context and NYUDv2 and show how better our model performs in this tasks. On the other hand our model has a faster inference time and consumes less memory for training and testing on NVIDIA Pascal GPUs, making it more efficient and less memory consuming architecture for pixel-wise segmentation.