IVMar 1, 2023
A Deep Neural Architecture for Harmonizing 3-D Input Data Analysis and Decision Making in Medical ImagingDimitrios Kollias, Anastasios Arsenos, Stefanos Kollias
Harmonizing the analysis of data, especially of 3-D image volumes, consisting of different number of slices and annotated per volume, is a significant problem in training and using deep neural networks in various applications, including medical imaging. Moreover, unifying the decision making of the networks over different input datasets is crucial for the generation of rich data-driven knowledge and for trusted usage in the applications. This paper presents a new deep neural architecture, named RACNet, which includes routing and feature alignment steps and effectively handles different input lengths and single annotations of the 3-D image inputs, whilst providing highly accurate decisions. In addition, through latent variable extraction from the trained RACNet, a set of anchors are generated providing further insight on the network's decision making. These can be used to enrich and unify data-driven knowledge extracted from different datasets. An extensive experimental study illustrates the above developments, focusing on COVID-19 diagnosis through analysis of 3-D chest CT scans from databases generated in different countries and medical centers.
IVJun 9, 2022
AI-MIA: COVID-19 Detection & Severity Analysis through Medical ImagingDimitrios Kollias, Anastasios Arsenos, Stefanos Kollias
This paper presents the baseline approach for the organized 2nd Covid-19 Competition, occurring in the framework of the AIMIA Workshop in the European Conference on Computer Vision (ECCV 2022). It presents the COV19-CT-DB database which is annotated for COVID-19 detction, consisting of about 7,700 3-D CT scans. Part of the database consisting of Covid-19 cases is further annotated in terms of four Covid-19 severity conditions. We have split the database and the latter part of it in training, validation and test datasets. The former two datasets are used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. The baseline approach consists of a deep learning approach, based on a CNN-RNN network and report its performance on the COVID19-CT-DB database.
CVMar 1, 2023
MMA-MRNNet: Harnessing Multiple Models of Affect and Dynamic Masked RNN for Precise Facial Expression Intensity EstimationDimitrios Kollias, Andreas Psaroudakis, Anastasios Arsenos et al.
This paper presents MMA-MRNNet, a novel deep learning architecture for dynamic multi-output Facial Expression Intensity Estimation (FEIE) from video data. Traditional approaches to this task often rely on complex 3-D CNNs, which require extensive pre-training and assume that facial expressions are uniformly distributed across all frames of a video. These methods struggle to handle videos of varying lengths, often resorting to ad-hoc strategies that either discard valuable information or introduce bias. MMA-MRNNet addresses these challenges through a two-stage process. First, the Multiple Models of Affect (MMA) extractor component is a Multi-Task Learning CNN that concurrently estimates valence-arousal, recognizes basic facial expressions, and detects action units in each frame. These representations are then processed by a Masked RNN component, which captures temporal dependencies and dynamically updates weights according to the true length of the input video, ensuring that only the most relevant features are used for the final prediction. The proposed unimodal non-ensemble learning MMA-MRNNet was evaluated on the Hume-Reaction dataset and demonstrated significantly superior performance, surpassing state-of-the-art methods by a wide margin, regardless of whether they were unimodal, multimodal, or ensemble approaches. Finally, we demonstrated the effectiveness of the MMA component of our proposed method across multiple in-the-wild datasets, where it consistently outperformed all state-of-the-art methods across various metrics.
IVJul 22, 2024
SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 DetectionDimitrios Kollias, Anastasios Arsenos, James Wingate et al.
This paper presents a new approach for effective segmentation of images that can be integrated into any model and methodology; the paradigm that we choose is classification of medical images (3-D chest CT scans) for Covid-19 detection. Our approach includes a combination of vision-language models that segment the CT scans, which are then fed to a deep neural architecture, named RACNet, for Covid-19 detection. In particular, a novel framework, named SAM2CLIP2SAM, is introduced for segmentation that leverages the strengths of both Segment Anything Model (SAM) and Contrastive Language-Image Pre-Training (CLIP) to accurately segment the right and left lungs in CT scans, subsequently feeding these segmented outputs into RACNet for classification of COVID-19 and non-COVID-19 cases. At first, SAM produces multiple part-based segmentation masks for each slice in the CT scan; then CLIP selects only the masks that are associated with the regions of interest (ROIs), i.e., the right and left lungs; finally SAM is given these ROIs as prompts and generates the final segmentation mask for the lungs. Experiments are presented across two Covid-19 annotated databases which illustrate the improved performance obtained when our method has been used for segmentation of the CT scans.
IVMar 4, 2024
Domain adaptation, Explainability & Fairness in AI for Medical Image Analysis: Diagnosis of COVID-19 based on 3-D Chest CT-scansDimitrios Kollias, Anastasios Arsenos, Stefanos Kollias
The paper presents the DEF-AI-MIA COV19D Competition, which is organized in the framework of the 'Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis (DEF-AI-MIA)' Workshop of the 2024 Computer Vision and Pattern Recognition (CVPR) Conference. The Competition is the 4th in the series, following the first three Competitions held in the framework of ICCV 2021, ECCV 2022 and ICASSP 2023 International Conferences respectively. It includes two Challenges on: i) Covid-19 Detection and ii) Covid-19 Domain Adaptation. The Competition use data from COV19-CT-DB database, which is described in the paper and includes a large number of chest CT scan series. Each chest CT scan series consists of a sequence of 2-D CT slices, the number of which is between 50 and 700. Training, validation and test datasets have been extracted from COV19-CT-DB and provided to the participants in both Challenges. The paper presents the baseline models used in the Challenges and the performance which was obtained respectively.
IVMar 10, 2024
COVID-19 Computer-aided Diagnosis through AI-assisted CT Imaging Analysis: Deploying a Medical AI SystemDemetris Gerogiannis, Anastasios Arsenos, Dimitrios Kollias et al.
Computer-aided diagnosis (CAD) systems stand out as potent aids for physicians in identifying the novel Coronavirus Disease 2019 (COVID-19) through medical imaging modalities. In this paper, we showcase the integration and reliable and fast deployment of a state-of-the-art AI system designed to automatically analyze CT images, offering infection probability for the swift detection of COVID-19. The suggested system, comprising both classification and segmentation components, is anticipated to reduce physicians' detection time and enhance the overall efficiency of COVID-19 detection. We successfully surmounted various challenges, such as data discrepancy and anonymisation, testing the time-effectiveness of the model, and data security, enabling reliable and scalable deployment of the system on both cloud and edge environments. Additionally, our AI system assigns a probability of infection to each 3D CT scan and enhances explainability through anchor set similarity, facilitating timely confirmation and segregation of infected patients by physicians.
CVMay 10, 2024
Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance EstimationVasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos et al.
In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles with efficiency and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset which is the largest (to the best of our knowledge) air-to-air airborne object dataset.
CVMar 12, 2024
Uncertainty-guided Contrastive Learning for Single Source Domain GeneralisationAnastasios Arsenos, Dimitrios Kollias, Evangelos Petrongonas et al.
In the context of single domain generalisation, the objective is for models that have been exclusively trained on data from a single domain to demonstrate strong performance when confronted with various unfamiliar domains. In this paper, we introduce a novel model referred to as Contrastive Uncertainty Domain Generalisation Network (CUDGNet). The key idea is to augment the source capacity in both input and label spaces through the fictitious domain generator and jointly learn the domain invariant representation of each class through contrastive learning. Extensive experiments on two Single Source Domain Generalisation (SSDG) datasets demonstrate the effectiveness of our approach, which surpasses the state-of-the-art single-DG methods by up to $7.08\%$. Our method also provides efficient uncertainty estimation at inference time from a single forward pass through the generator subnetwork.
CVMay 10, 2024
Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object DetectionAnastasios Arsenos, Vasileios Karampinis, Evangelos Petrongonas et al.
The main barrier to achieving fully autonomous flights lies in autonomous aircraft navigation. Managing non-cooperative traffic presents the most important challenge in this problem. The most efficient strategy for handling non-cooperative traffic is based on monocular video processing through deep learning models. This study contributes to the vision-based deep learning aircraft detection and tracking literature by investigating the impact of data corruption arising from environmental and hardware conditions on the effectiveness of these methods. More specifically, we designed $7$ types of common corruptions for camera inputs taking into account real-world flight conditions. By applying these corruptions to the Airborne Object Tracking (AOT) dataset we constructed the first robustness benchmark dataset named AOT-C for air-to-air aerial object detection. The corruptions included in this dataset cover a wide range of challenging conditions such as adverse weather and sensor noise. The second main contribution of this letter is to present an extensive experimental evaluation involving $8$ diverse object detectors to explore the degradation in the performance under escalating levels of corruptions (domain shifts). Based on the evaluation results, the key observations that emerge are the following: 1) One-stage detectors of the YOLO family demonstrate better robustness, 2) Transformer-based and multi-stage detectors like Faster R-CNN are extremely vulnerable to corruptions, 3) Robustness against corruptions is related to the generalization ability of models. The third main contribution is to present that finetuning on our augmented synthetic data results in improvements in the generalisation ability of the object detector in real-world flight experiments.
CVSep 11, 2025
Model-Agnostic Open-Set Air-to-Air Visual Object Detection for Reliable UAV PerceptionSpyridon Loukovitis, Anastasios Arsenos, Vasileios Karampinis et al.
Open-set detection is crucial for robust UAV autonomy in air-to-air object detection under real-world conditions. Traditional closed-set detectors degrade significantly under domain shifts and flight data corruption, posing risks to safety-critical applications. We propose a novel, model-agnostic open-set detection framework designed specifically for embedding-based detectors. The method explicitly handles unknown object rejection while maintaining robustness against corrupted flight data. It estimates semantic uncertainty via entropy modeling in the embedding space and incorporates spectral normalization and temperature scaling to enhance open-set discrimination. We validate our approach on the challenging AOT aerial benchmark and through extensive real-world flight tests. Comprehensive ablation studies demonstrate consistent improvements over baseline methods, achieving up to a 10\% relative AUROC gain compared to standard YOLO-based detectors. Additionally, we show that background rejection further strengthens robustness without compromising detection accuracy, making our solution particularly well-suited for reliable UAV perception in dynamic air-to-air environments.
IVJun 1, 2024
Complex Style Image Transformations for Domain Generalization in Medical ImagesNikolaos Spanos, Anastasios Arsenos, Paraskevi-Antonia Theofilou et al.
The absence of well-structured large datasets in medical computer vision results in decreased performance of automated systems and, especially, of deep learning models. Domain generalization techniques aim to approach unknown domains from a single data source. In this paper we introduce a novel framework, named CompStyle, which leverages style transfer and adversarial training, along with high-level input complexity augmentation to effectively expand the domain space and address unknown distributions. State-of-the-art style transfer methods depend on the existence of subdomains within the source dataset. However, this can lead to an inherent dataset bias in the image creation. Input-level augmentation can provide a solution to this problem by widening the domain space in the source dataset and boost performance on out-of-domain distributions. We provide results from experiments on semantic segmentation on prostate data and corruption robustness on cardiac data which demonstrate the effectiveness of our approach. Our method increases performance in both tasks, without added cost to training time or resources.
IVJun 14, 2021
MIA-COV19D: COVID-19 Detection through 3-D Chest CT Image AnalysisDimitrios Kollias, Anastasios Arsenos, Levon Soukissian et al.
Early and reliable COVID-19 diagnosis based on chest 3-D CT scans can assist medical specialists in vital circumstances. Deep learning methodologies constitute a main approach for chest CT scan analysis and disease prediction. However, large annotated databases are necessary for developing deep learning models that are able to provide COVID-19 diagnosis across various medical environments in different countries. Due to privacy issues, publicly available COVID-19 CT datasets are highly difficult to obtain, which hinders the research and development of AI-enabled diagnosis methods of COVID-19 based on CT scans. In this paper we present the COV19-CT-DB database which is annotated for COVID-19, consisting of about 5,000 3-D CT scans, We have split the database in training, validation and test datasets. The former two datasets can be used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. We also present a deep learning approach, based on a CNN-RNN network and report its performance on the COVID19-CT-DB database.