Alireza Tavakkoli

IV
h-index51
28papers
537citations
Novelty50%
AI Score52

28 Papers

IVOct 25, 2022
ConnectedUNets++: Mass Segmentation from Whole Mammographic Images

Prithul Sarker, Sushmita Sarker, George Bebis et al.

Deep learning has made a breakthrough in medical image segmentation in recent years due to its ability to extract high-level features without the need for prior knowledge. In this context, U-Net is one of the most advanced medical image segmentation models, with promising results in mammography. Despite its excellent overall performance in segmenting multimodal medical images, the traditional U-Net structure appears to be inadequate in various ways. There are certain U-Net design modifications, such as MultiResUNet, Connected-UNets, and AU-Net, that have improved overall performance in areas where the conventional U-Net architecture appears to be deficient. Following the success of UNet and its variants, we have presented two enhanced versions of the Connected-UNets architecture: ConnectedUNets+ and ConnectedUNets++. In ConnectedUNets+, we have replaced the simple skip connections of Connected-UNets architecture with residual skip connections, while in ConnectedUNets++, we have modified the encoder-decoder structure along with employing residual skip connections. We have evaluated our proposed architectures on two publicly available datasets, the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) and INbreast.

IVNov 16, 2022
SWIN-SFTNet : Spatial Feature Expansion and Aggregation using Swin Transformer For Whole Breast micro-mass segmentation

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.

Incorporating various mass shapes and sizes in training deep learning architectures has made breast mass segmentation challenging. Moreover, manual segmentation of masses of irregular shapes is time-consuming and error-prone. Though Deep Neural Network has shown outstanding performance in breast mass segmentation, it fails in segmenting micro-masses. In this paper, we propose a novel U-net-shaped transformer-based architecture, called Swin-SFTNet, that outperforms state-of-the-art architectures in breast mammography-based micro-mass segmentation. Firstly to capture the global context, we designed a novel Spatial Feature Expansion and Aggregation Block(SFEA) that transforms sequential linear patches into a structured spatial feature. Next, we combine it with the local linear features extracted by the swin transformer block to improve overall accuracy. We also incorporate a novel embedding loss that calculates similarities between linear feature embeddings of the encoder and decoder blocks. With this approach, we achieve higher segmentation dice over the state-of-the-art by 3.10% on CBIS-DDSM, 3.81% on InBreast, and 3.13% on CBIS pre-trained model on the InBreast test data set.

IVAug 11, 2023
Revolutionizing Space Health (Swin-FSR): Advancing Super-Resolution of Fundus Images for SANS Visual Assessment Technology

Khondker Fariha Hossain, Sharif Amit Kamran, Joshua Ong et al.

The rapid accessibility of portable and affordable retinal imaging devices has made early differential diagnosis easier. For example, color funduscopy imaging is readily available in remote villages, which can help to identify diseases like age-related macular degeneration (AMD), glaucoma, or pathological myopia (PM). On the other hand, astronauts at the International Space Station utilize this camera for identifying spaceflight-associated neuro-ocular syndrome (SANS). However, due to the unavailability of experts in these locations, the data has to be transferred to an urban healthcare facility (AMD and glaucoma) or a terrestrial station (e.g, SANS) for more precise disease identification. Moreover, due to low bandwidth limits, the imaging data has to be compressed for transfer between these two places. Different super-resolution algorithms have been proposed throughout the years to address this. Furthermore, with the advent of deep learning, the field has advanced so much that x2 and x4 compressed images can be decompressed to their original form without losing spatial information. In this paper, we introduce a novel model called Swin-FSR that utilizes Swin Transformer with spatial and depth-wise attention for fundus image super-resolution. Our architecture achieves Peak signal-to-noise-ratio (PSNR) of 47.89, 49.00 and 45.32 on three public datasets, namely iChallenge-AMD, iChallenge-PM, and G1020. Additionally, we tested the model's effectiveness on a privately held dataset for SANS provided by NASA and achieved comparable results against previous architectures.

IVJun 24, 2022
Feature Representation Learning for Robust Retinal Disease Detection from Optical Coherence Tomography Images

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.

Ophthalmic images may contain identical-looking pathologies that can cause failure in automated techniques to distinguish different retinal degenerative diseases. Additionally, reliance on large annotated datasets and lack of knowledge distillation can restrict ML-based clinical support systems' deployment in real-world environments. To improve the robustness and transferability of knowledge, an enhanced feature-learning module is required to extract meaningful spatial representations from the retinal subspace. Such a module, if used effectively, can detect unique disease traits and differentiate the severity of such retinal degenerative pathologies. In this work, we propose a robust disease detection architecture with three learning heads, i) A supervised encoder for retinal disease classification, ii) An unsupervised decoder for the reconstruction of disease-specific spatial information, and iii) A novel representation learning module for learning the similarity between encoder-decoder feature and enhancing the accuracy of the model. Our experimental results on two publicly available OCT datasets illustrate that the proposed model outperforms existing state-of-the-art models in terms of accuracy, interpretability, and robustness for out-of-distribution retinal disease detection.

HCOct 12, 2022
VR-SFT: Reproducing Swinging Flashlight Test in Virtual Reality to Detect Relative Afferent Pupillary Defect

Prithul Sarker, Nasif Zaman, Alireza Tavakkoli

The relative afferent asymmetry between two eyes can be diagnosed using swinging flashlight test, also known as the alternating light test. This remains one of the most used clinical tests to this day. Despite the swinging flashlight test's straightforward approach, a number of factors can add variability into the clinical methodology and reduce the measurement's validity and reliability. This includes small and poorly responsive pupils, dark iris, anisocoria, uneven illumination in both eyes. Due to these limitations, the true condition of relative afferent asymmetry may create confusion and various observers may quantify the relative afferent pupillary defect differently. Consequently, the results of the swinging flashlight test are subjective and ambiguous. In order to eliminate the limitations of traditional swinging flashlight test and introduce objectivity, we propose a novel approach to the swinging flashlight exam, VR-SFT, by making use of virtual reality (VR). We suggest that the clinical records of the subjects and the results of VR-SFT are comparable. In this paper, we describe how we exploit the features of immersive VR experience to create a reliable and objective swinging flashlight test.

IVMar 16, 2023
SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation

Khondker Fariha Hossain, Sharif Amit Kamran, Alireza Tavakkoli et al.

Accurately segmenting fluid in 3D optical coherence tomography (OCT) images is critical for detecting eye diseases but remains challenging. Traditional autoencoder-based methods struggle with resolution loss and information recovery. While transformer-based models improve segmentation, they arent optimized for 3D OCT volumes, which vary by vendor and extraction technique. To address this, we propose SwinVFTR, a transformer architecture for precise fluid segmentation in 3D OCT images. SwinVFTR employs channel-wise volumetric sampling and a shifted window transformer block to improve fluid localization. Moreover, a novel volumetric attention block enhances spatial and depth-wise attention. Trained using multi-class dice loss, SwinVFTR outperforms existing models on Spectralis, Cirrus, and Topcon OCT datasets, achieving mean dice scores of 0.72, 0.59, and 0.68, respectively, along with superior performance in mean intersection-over-union (IOU) and structural similarity (SSIM) metrics.

IVOct 13, 2022
Virtual-Reality based Vestibular Ocular Motor Screening for Concussion Detection using Machine-Learning

Khondker Fariha Hossain, Sharif Amit Kamran, Prithul Sarker et al.

Sport-related concussion (SRC) depends on sensory information from visual, vestibular, and somatosensory systems. At the same time, the current clinical administration of Vestibular/Ocular Motor Screening (VOMS) is subjective and deviates among administrators. Therefore, for the assessment and management of concussion detection, standardization is required to lower the risk of injury and increase the validation among clinicians. With the advancement of technology, virtual reality (VR) can be utilized to advance the standardization of the VOMS, increasing the accuracy of testing administration and decreasing overall false positive rates. In this paper, we experimented with multiple machine learning methods to detect SRC on VR-generated data using VOMS. In our observation, the data generated from VR for smooth pursuit (SP) and the Visual Motion Sensitivity (VMS) tests are highly reliable for concussion detection. Furthermore, we train and evaluate these models, both qualitatively and quantitatively. Our findings show these models can reach high true-positive-rates of around 99.9 percent of symptom provocation on the VR stimuli-based VOMS vs. current clinical manual VOMS.

MED-PHOct 12, 2022
Analysis of Smooth Pursuit Assessment in Virtual Reality and Concussion Detection using BiLSTM

Prithul Sarker, Khondker Fariha Hossain, Isayas Berhe Adhanom et al.

The sport-related concussion (SRC) battery relies heavily upon subjective symptom reporting in order to determine the diagnosis of a concussion. Unfortunately, athletes with SRC may return-to-play (RTP) too soon if they are untruthful of their symptoms. It is critical to provide accurate assessments that can overcome underreporting to prevent further injury. To lower the risk of injury, a more robust and precise method for detecting concussion is needed to produce reliable and objective results. In this paper, we propose a novel approach to detect SRC using long short-term memory (LSTM) recurrent neural network (RNN) architectures from oculomotor data. In particular, we propose a new error metric that incorporates mean squared error in different proportions. The experimental results on the smooth pursuit test of the VR-VOMS dataset suggest that the proposed approach can predict concussion symptoms with higher accuracy compared to symptom provocation on the vestibular ocular motor screening (VOMS).

CVFeb 26, 2024Code
MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer

Sushmita Sarker, Prithul Sarker, George Bebis et al.

Traditional deep learning approaches for breast cancer classification has predominantly concentrated on single-view analysis. In clinical practice, however, radiologists concurrently examine all views within a mammography exam, leveraging the inherent correlations in these views to effectively detect tumors. Acknowledging the significance of multi-view analysis, some studies have introduced methods that independently process mammogram views, either through distinct convolutional branches or simple fusion strategies, inadvertently leading to a loss of crucial inter-view correlations. In this paper, we propose an innovative multi-view network exclusively based on transformers to address challenges in mammographic image classification. Our approach introduces a novel shifted window-based dynamic attention block, facilitating the effective integration of multi-view information and promoting the coherent transfer of this information between views at the spatial feature map level. Furthermore, we conduct a comprehensive comparative analysis of the performance and effectiveness of transformer-based models under diverse settings, employing the CBIS-DDSM and Vin-Dr Mammo datasets. Our code is publicly available at https://github.com/prithuls/MV-Swin-T

18.9LGMay 10
Learning Unified Representations of Normalcy for Time Series Anomaly Detection

Prithul Sarker, Sushmita Sarker, Nicholas G. Murray et al.

The core challenge in unsupervised anomaly detection is identifying abnormal patterns without prior knowledge of their characteristics. While existing methods have addressed aspects of this problem, they often struggle to learn a robust representation of the normal data distribution that is distinct from anomalous patterns. In this paper, we present a novel framework, Unified Unsupervised Anomaly Detection ($\text{U}^2\text{AD}$), that comprehensively addresses anomaly detection in multivariate time series. Our approach learns the underlying data distribution of normal samples by utilizing score-based generative modeling. We introduce a novel time-dependent score network and a unified training objective that together delineate the manifold of normal data while considering both local and global temporal contexts. Reconstruction is then performed via a deterministic sampling process using an ordinary differential equation solver. Our extensive experimental evaluations demonstrate that $\text{U}^2\text{AD}$ not only outperforms current state-of-the-art methods in detection accuracy but also identifies anomalies at significantly earlier stages of their occurrence.

CVFeb 24, 2025Code
Can Score-Based Generative Modeling Effectively Handle Medical Image Classification?

Sushmita Sarker, Prithul Sarker, George Bebis et al.

The remarkable success of deep learning in recent years has prompted applications in medical image classification and diagnosis tasks. While classification models have demonstrated robustness in classifying simpler datasets like MNIST or natural images such as ImageNet, this resilience is not consistently observed in complex medical image datasets where data is more scarce and lacks diversity. Moreover, previous findings on natural image datasets have indicated a potential trade-off between data likelihood and classification accuracy. In this study, we explore the use of score-based generative models as classifiers for medical images, specifically mammographic images. Our findings suggest that our proposed generative classifier model not only achieves superior classification results on CBIS-DDSM, INbreast and Vin-Dr Mammo datasets, but also introduces a novel approach to image classification in a broader context. Our code is publicly available at https://github.com/sushmitasarker/sgc_for_medical_image_classification

CVMay 20, 2024
A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation

Sushmita Sarker, Prithul Sarker, Gunner Stone et al.

Point cloud analysis has a wide range of applications in many areas such as computer vision, robotic manipulation, and autonomous driving. While deep learning has achieved remarkable success on image-based tasks, there are many unique challenges faced by deep neural networks in processing massive, unordered, irregular and noisy 3D points. To stimulate future research, this paper analyzes recent progress in deep learning methods employed for point cloud processing and presents challenges and potential directions to advance this field. It serves as a comprehensive review on two major tasks in 3D point cloud processing-- namely, 3D shape classification and semantic segmentation.

CVSep 21, 2025
Point-RTD: Replaced Token Denoising for Pretraining Transformer Models on Point Clouds

Gunner Stone, Youngsook Choi, Alireza Tavakkoli et al.

Pre-training strategies play a critical role in advancing the performance of transformer-based models for 3D point cloud tasks. In this paper, we introduce Point-RTD (Replaced Token Denoising), a novel pretraining strategy designed to improve token robustness through a corruption-reconstruction framework. Unlike traditional mask-based reconstruction tasks that hide data segments for later prediction, Point-RTD corrupts point cloud tokens and leverages a discriminator-generator architecture for denoising. This shift enables more effective learning of structural priors and significantly enhances model performance and efficiency. On the ShapeNet dataset, Point-RTD reduces reconstruction error by over 93% compared to PointMAE, and achieves more than 14x lower Chamfer Distance on the test set. Our method also converges faster and yields higher classification accuracy on ShapeNet, ModelNet10, and ModelNet40 benchmarks, clearly outperforming the baseline Point-MAE framework in every case.

CVSep 21, 2025
Guided and Unguided Conditional Diffusion Mechanisms for Structured and Semantically-Aware 3D Point Cloud Generation

Gunner Stone, Sushmita Sarker, Alireza Tavakkoli

Generating realistic 3D point clouds is a fundamental problem in computer vision with applications in remote sensing, robotics, and digital object modeling. Existing generative approaches primarily capture geometry, and when semantics are considered, they are typically imposed post hoc through external segmentation or clustering rather than integrated into the generative process itself. We propose a diffusion-based framework that embeds per-point semantic conditioning directly within generation. Each point is associated with a conditional variable corresponding to its semantic label, which guides the diffusion dynamics and enables the joint synthesis of geometry and semantics. This design produces point clouds that are both structurally coherent and segmentation-aware, with object parts explicitly represented during synthesis. Through a comparative analysis of guided and unguided diffusion processes, we demonstrate the significant impact of conditional variables on diffusion dynamics and generation quality. Extensive experiments validate the efficacy of our approach, producing detailed and accurate 3D point clouds tailored to specific parts and features.

CVSep 18, 2025
RaceGAN: A Framework for Preserving Individuality while Converting Racial Information for Image-to-Image Translation

Mst Tasnim Pervin, George Bebis, Fang Jiang et al.

Generative adversarial networks (GANs) have demonstrated significant progress in unpaired image-to-image translation in recent years for several applications. CycleGAN was the first to lead the way, although it was restricted to a pair of domains. StarGAN overcame this constraint by tackling image-to-image translation across various domains, although it was not able to map in-depth low-level style changes for these domains. Style mapping via reference-guided image synthesis has been made possible by the innovations of StarGANv2 and StyleGAN. However, these models do not maintain individuality and need an extra reference image in addition to the input. Our study aims to translate racial traits by means of multi-domain image-to-image translation. We present RaceGAN, a novel framework capable of mapping style codes over several domains during racial attribute translation while maintaining individuality and high level semantics without relying on a reference image. RaceGAN outperforms other models in translating racial features (i.e., Asian, White, and Black) when tested on Chicago Face Dataset. We also give quantitative findings utilizing InceptionReNetv2-based classification to demonstrate the effectiveness of our racial translation. Moreover, we investigate how well the model partitions the latent space into distinct clusters of faces for each ethnic group.

SPOct 17, 2021
ECG-ATK-GAN: Robustness against Adversarial Attacks on ECGs using Conditional Generative Adversarial Networks

Khondker Fariha Hossain, Sharif Amit Kamran, Alireza Tavakkoli et al.

Automating arrhythmia detection from ECG requires a robust and trusted system that retains high accuracy under electrical disturbances. Many machine learning approaches have reached human-level performance in classifying arrhythmia from ECGs. However, these architectures are vulnerable to adversarial attacks, which can misclassify ECG signals by decreasing the model's accuracy. Adversarial attacks are small crafted perturbations injected in the original data which manifest the out-of-distribution shifts in signal to misclassify the correct class. Thus, security concerns arise for false hospitalization and insurance fraud abusing these perturbations. To mitigate this problem, we introduce the first novel Conditional Generative Adversarial Network (GAN), robust against adversarial attacked ECG signals and retaining high accuracy. Our architecture integrates a new class-weighted objective function for adversarial perturbation identification and new blocks for discerning and combining out-of-distribution shifts in signals in the learning process for accurately classifying various arrhythmia types. Furthermore, we benchmark our architecture on six different white and black-box attacks and compare them with other recently proposed arrhythmia classification models on two publicly available ECG arrhythmia datasets. The experiment confirms that our model is more robust against such adversarial attacks for classifying arrhythmia with high accuracy.

CRAug 4, 2021
Semi-supervised Conditional GAN for Simultaneous Generation and Detection of Phishing URLs: A Game theoretic Perspective

Sharif Amit Kamran, Shamik Sengupta, Alireza Tavakkoli

Spear Phishing is a type of cyber-attack where the attacker sends hyperlinks through email on well-researched targets. The objective is to obtain sensitive information by imitating oneself as a trustworthy website. In recent times, deep learning has become the standard for defending against such attacks. However, these architectures were designed with only defense in mind. Moreover, the attacker's perspective and motivation are absent while creating such models. To address this, we need a game-theoretic approach to understand the perspective of the attacker (Hacker) and the defender (Phishing URL detector). We propose a Conditional Generative Adversarial Network with novel training strategy for real-time phishing URL detection. Additionally, we train our architecture in a semi-supervised manner to distinguish between adversarial and real examples, along with detecting malicious and benign URLs. We also design two games between the attacker and defender in training and deployment settings by utilizing the game-theoretic perspective. Our experiments confirm that the proposed architecture surpasses recent state-of-the-art architectures for phishing URLs detection.

LGJul 16, 2021
ECG-Adv-GAN: Detecting ECG Adversarial Examples with Conditional Generative Adversarial Networks

Khondker Fariha Hossain, Sharif Amit Kamran, Alireza Tavakkoli et al.

Electrocardiogram (ECG) acquisition requires an automated system and analysis pipeline for understanding specific rhythm irregularities. Deep neural networks have become a popular technique for tracing ECG signals, outperforming human experts. Despite this, convolutional neural networks are susceptible to adversarial examples that can misclassify ECG signals and decrease the model's precision. Moreover, they do not generalize well on the out-of-distribution dataset. The GAN architecture has been employed in recent works to synthesize adversarial ECG signals to increase existing training data. However, they use a disjointed CNN-based classification architecture to detect arrhythmia. Till now, no versatile architecture has been proposed that can detect adversarial examples and classify arrhythmia simultaneously. To alleviate this, we propose a novel Conditional Generative Adversarial Network to simultaneously generate ECG signals for different categories and detect cardiac abnormalities. Moreover, the model is conditioned on class-specific ECG signals to synthesize realistic adversarial examples. Consequently, we compare our architecture and show how it outperforms other classification models in normal/abnormal ECG signal detection by benchmarking real world and adversarial signals.

IVApr 14, 2021
VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.

In Fluorescein Angiography (FA), an exogenous dye is injected in the bloodstream to image the vascular structure of the retina. The injected dye can cause adverse reactions such as nausea, vomiting, anaphylactic shock, and even death. In contrast, color fundus imaging is a non-invasive technique used for photographing the retina but does not have sufficient fidelity for capturing its vascular structure. The only non-invasive method for capturing retinal vasculature is optical coherence tomography-angiography (OCTA). However, OCTA equipment is quite expensive, and stable imaging is limited to small areas on the retina. In this paper, we propose a novel conditional generative adversarial network (GAN) capable of simultaneously synthesizing FA images from fundus photographs while predicting retinal degeneration. The proposed system has the benefit of addressing the problem of imaging retinal vasculature in a non-invasive manner as well as predicting the existence of retinal abnormalities. We use a semi-supervised approach to train our GAN using multiple weighted losses on different modalities of data. Our experiments validate that the proposed architecture exceeds recent state-of-the-art generative networks for fundus-to-angiography synthesis. Moreover, our vision transformer-based discriminators generalize quite well on out-of-distribution data sets for retinal disease prediction.

IVJan 3, 2021
RV-GAN: Segmenting Retinal Vascular Structure in Fundus Photographs using a Novel Multi-scale Generative Adversarial Network

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.

High fidelity segmentation of both macro and microvascular structure of the retina plays a pivotal role in determining degenerative retinal diseases, yet it is a difficult problem. Due to successive resolution loss in the encoding phase combined with the inability to recover this lost information in the decoding phase, autoencoding based segmentation approaches are limited in their ability to extract retinal microvascular structure. We propose RV-GAN, a new multi-scale generative architecture for accurate retinal vessel segmentation to alleviate this. The proposed architecture uses two generators and two multi-scale autoencoding discriminators for better microvessel localization and segmentation. In order to avoid the loss of fidelity suffered by traditional GAN-based segmentation systems, we introduce a novel weighted feature matching loss. This new loss incorporates and prioritizes features from the discriminator's decoder over the encoder. Doing so combined with the fact that the discriminator's decoder attempts to determine real or fake images at the pixel level better preserves macro and microvascular structure. By combining reconstruction and weighted feature matching loss, the proposed architecture achieves an area under the curve (AUC) of 0.9887, 0.9914, and 0.9887 in pixel-wise segmentation of retinal vasculature from three publicly available datasets, namely DRIVE, CHASE-DB1, and STARE, respectively. Additionally, RV-GAN outperforms other architectures in two additional relevant metrics, mean intersection-over-union (Mean-IOU) and structural similarity measure (SSIM).

ROSep 1, 2020
Control Framework for a Hybrid-steel Bridge Inspection Robot

Hoang-Dung Bui, Son Nguyen, U-H. Billah et al.

Autonomous navigation of steel bridge inspection robots is essential for proper maintenance. The majority of existing robotic solutions for bridge inspection require human intervention to assist in the control and navigation. In this paper, a control system framework has been proposed for a previously designed ARA robot [1], which facilitates autonomous real-time navigation and minimizes human involvement. The mechanical design and control framework of ARA robot enables two different configurations, namely the mobile and inch-worm transformation. In addition, a switching control was developed with 3D point clouds of steel surfaces as the input which allows the robot to switch between mobile and inch-worm transformation. The surface availability algorithm (considers plane, area, and height) of the switching control enables the robot to perform inch-worm jumps autonomously. Themobiletransformationallows the robot to move on continuous steel surfaces and perform visual inspection of steel bridge structures. Practical experiments on actual steel bridge structures highlight the effective performance of ARA robot with the proposed control framework for autonomous navigation during a visual inspection of steel bridges.

IVJul 17, 2020
Attention2AngioGAN: Synthesizing Fluorescein Angiography from Retinal Fundus Images using Generative Adversarial Networks

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.

Fluorescein Angiography (FA) is a technique that employs the designated camera for Fundus photography incorporating excitation and barrier filters. FA also requires fluorescein dye that is injected intravenously, which might cause adverse effects ranging from nausea, vomiting to even fatal anaphylaxis. Currently, no other fast and non-invasive technique exists that can generate FA without coupling with Fundus photography. To eradicate the need for an invasive FA extraction procedure, we introduce an Attention-based Generative network that can synthesize Fluorescein Angiography from Fundus images. The proposed gan incorporates multiple attention based skip connections in generators and comprises novel residual blocks for both generators and discriminators. It utilizes reconstruction, feature-matching, and perceptual loss along with adversarial training to produces realistic Angiograms that is hard for experts to distinguish from real ones. Our experiments confirm that the proposed architecture surpasses recent state-of-the-art generative networks for fundus-to-angio translation task.

IVMay 16, 2020
Improving Robustness using Joint Attention Network For Detecting Retinal Degeneration From Optical Coherence Tomography Images

Sharif Amit Kamran, Alireza Tavakkoli, Stewart Lee Zuckerbrod

Noisy data and the similarity in the ocular appearances caused by different ophthalmic pathologies pose significant challenges for an automated expert system to accurately detect retinal diseases. In addition, the lack of knowledge transferability and the need for unreasonably large datasets limit clinical application of current machine learning systems. To increase robustness, a better understanding of how the retinal subspace deformations lead to various levels of disease severity needs to be utilized for prioritizing disease-specific model details. In this paper we propose the use of disease-specific feature representation as a novel architecture comprised of two joint networks -- one for supervised encoding of disease model and the other for producing attention maps in an unsupervised manner to retain disease specific spatial information. Our experimental results on publicly available datasets show the proposed joint-network significantly improves the accuracy and robustness of state-of-the-art retinal disease classification networks on unseen datasets.

IVMay 11, 2020
Fundus2Angio: A Conditional GAN Architecture for Generating Fluorescein Angiography Images from Retinal Fundus Photography

Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli et al.

Carrying out clinical diagnosis of retinal vascular degeneration using Fluorescein Angiography (FA) is a time consuming process and can pose significant adverse effects on the patient. Angiography requires insertion of a dye that may cause severe adverse effects and can even be fatal. Currently, there are no non-invasive systems capable of generating Fluorescein Angiography images. However, retinal fundus photography is a non-invasive imaging technique that can be completed in a few seconds. In order to eliminate the need for FA, we propose a conditional generative adversarial network (GAN) to translate fundus images to FA images. The proposed GAN consists of a novel residual block capable of generating high quality FA images. These images are important tools in the differential diagnosis of retinal diseases without the need for invasive procedure with possible side effects. Our experiments show that the proposed architecture outperforms other state-of-the-art generative networks. Furthermore, our proposed model achieves better qualitative results indistinguishable from real angiograms.

IVOct 17, 2019
A Parametric Perceptual Deficit Modeling and Diagnostics Framework for Retina Damage using Mixed Reality

Prithul Aniruddha, Nasif Zaman, Alireza Tavakkoli et al.

Age-related Macular Degeneration (AMD) is a progressive visual impairment affecting millions of individuals. Since there is no current treatment for the disease, the only means of improving the lives of individuals suffering from the disease is via assistive technologies. In this paper we propose a novel and effective methodology to accurately generate a parametric model for the perceptual deficit caused by the physiological deterioration of a patient's retina due to AMD. Based on the parameters of the model, a mechanism is developed to simulate the patient's perception as a result of the disease. This simulation can effectively deliver the perceptual impact and its progression to the patient's eye doctor. In addition, we propose a mixed-reality apparatus and interface to allow the patient recover functional vision and to compensate for the perceptual loss caused by the physiological damage. The results obtained by the proposed approach show the superiority of our framework over the state-of-the-art low-vision systems.

IVOct 13, 2019
Optic-Net: A Novel Convolutional Neural Network for Diagnosis of Retinal Diseases from Optical Tomography Images

Sharif Amit Kamran, Sourajit Saha, Ali Shihab Sabbir et al.

Diagnosing different retinal diseases from Spectral Domain Optical Coherence Tomography (SD-OCT) images is a challenging task. Different automated approaches such as image processing, machine learning and deep learning algorithms have been used for early detection and diagnosis of retinal diseases. Unfortunately, these are prone to error and computational inefficiency, which requires further intervention from human experts. In this paper, we propose a novel convolution neural network architecture to successfully distinguish between different degeneration of retinal layers and their underlying causes. The proposed novel architecture outperforms other classification models while addressing the issue of gradient explosion. Our approach reaches near perfect accuracy of 99.8% and 100% for two separately available Retinal SD-OCT data-set respectively. Additionally, our architecture predicts retinal diseases in real time while outperforming human diagnosticians.

CVOct 11, 2019
An Automatic Digital Terrain Generation Technique for Terrestrial Sensing and Virtual Reality Applications

Lee Easson, Alireza Tavakkoli, Jonathan Greenberg

The identification and modeling of the terrain from point cloud data is an important component of Terrestrial Remote Sensing (TRS) applications. The main focus in terrain modeling is capturing details of complex geological features of landforms. Traditional terrain modeling approaches rely on the user to exert control over terrain features. However, relying on the user input to manually develop the digital terrain becomes intractable when considering the amount of data generated by new remote sensing systems capable of producing massive aerial and ground-based point clouds from scanned environments. This article provides a novel terrain modeling technique capable of automatically generating accurate and physically realistic Digital Terrain Models (DTM) from a variety of point cloud data. The proposed method runs efficiently on large-scale point cloud data with real-time performance over large segments of terrestrial landforms. Moreover, generated digital models are designed to effectively render within a Virtual Reality (VR) environment in real time. The paper concludes with an in-depth discussion of possible research directions and outstanding technical and scientific challenges to improve the proposed approach.

ROMar 5, 2019
Lidar-Monocular Visual Odometry with Genetic Algorithm for Parameter Optimization

Adarsh Sehgal, Ashutosh Singandhupe, Hung Manh La et al.

Lidar-Monocular Visual Odometry (LIMO), a odometry estimation algorithm, combines camera and LIght Detection And Ranging sensor (LIDAR) for visual localization by tracking camera features as well as features from LIDAR measurements, and it estimates the motion using Bundle Adjustment based on robust key frames. For rejecting the outliers, LIMO uses semantic labelling and weights of the vegetation landmarks. A drawback of LIMO as well as many other odometry estimation algorithms is that it has many parameters that need to be manually adjusted according to the dynamic changes in the environment in order to decrease the translational errors. In this paper, we present and argue the use of Genetic Algorithm to optimize parameters with reference to LIMO and maximize LIMO's localization and motion estimation performance. We evaluate our approach on the well known KITTI odometry dataset and show that the genetic algorithm helps LIMO to reduce translation error in different datasets.