CV · Nov 8, 2023
Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction
Zacharias Anastasakis, Dimitrios Mallis, Markos Diomataris et al.
We present a novel self-supervised approach for representation learning, particularly for the task of Visual Relationship Detection (VRD). Motivated by the effectiveness of Masked Image Modeling (MIM), we propose Masked Bounding Box Reconstruction (MBBR), a variation of MIM where a percentage of the entities/objects within a scene are masked and subsequently reconstructed based on the unmasked objects. The core idea is that, through object-level masked modeling, the network learns context-aware representations that capture the interaction of objects within a scene and thus are highly predictive of visual object relationships. We extensively evaluate learned representations, both qualitatively and quantitatively, in a few-shot setting and demonstrate the efficacy of MBBR for learning robust visual representations, particularly tailored for VRD. The proposed method is able to surpass state-of-the-art VRD methods on the Predicate Detection (PredDet) evaluation setting, using only a few annotated samples. We make our code available at https://github.com/deeplab-ai/SelfSupervisedVRD.
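The object-level masking idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (that is in the linked repository). The helper names, the 0.5 mask ratio, and the placeholder predictor are all assumptions. The placeholder simply averages the visible objects' features and stands in for the real context network. What the sketch does show is the defining property of masked modeling: the reconstruction loss is computed only on the hidden objects.

```python
import random
from typing import List, Sequence, Set

Vector = List[float]


def sample_mask(num_objects: int, mask_ratio: float, rng: random.Random) -> Set[int]:
    """Choose which objects to hide; at least one object always stays visible as context."""
    if num_objects < 2:
        raise ValueError("need at least two objects: one to mask, one as context")
    k = min(num_objects - 1, max(1, round(mask_ratio * num_objects)))
    return set(rng.sample(range(num_objects), k))


def predict_masked(features: Sequence[Vector], masked: Set[int]) -> Vector:
    """Hypothetical placeholder for the context network: predict a hidden object's
    feature as the mean of the visible objects' features."""
    visible = [f for i, f in enumerate(features) if i not in masked]
    dim = len(features[0])
    return [sum(f[d] for f in visible) / len(visible) for d in range(dim)]


def masked_reconstruction_loss(features: Sequence[Vector], masked: Set[int]) -> float:
    """Mean squared error, computed only over the masked objects."""
    pred = predict_masked(features, masked)
    err = sum((p - t) ** 2 for i in masked for p, t in zip(pred, features[i]))
    return err / (len(masked) * len(pred))


rng = random.Random(0)
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # toy per-box features
hidden = sample_mask(len(objects), 0.5, rng)
print(sorted(hidden), masked_reconstruction_loss(objects, hidden))
```

In the real method the prediction for each masked box would depend on that box's context, not a shared average. The loss structure, however, is the same.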
IV · Mar 1, 2023
A Deep Neural Architecture for Harmonizing 3-D Input Data Analysis and Decision Making in Medical Imaging
Dimitrios Kollias, Anastasios Arsenos, Stefanos Kollias
Harmonizing the analysis of data, especially of 3-D image volumes consisting of different numbers of slices and annotated per volume, is a significant problem in training and using deep neural networks in various applications, including medical imaging. Moreover, unifying the decision making of the networks over different input datasets is crucial for generating rich data-driven knowledge and for trusted usage in applications. This paper presents a new deep neural architecture, named RACNet, which includes routing and feature alignment steps and effectively handles different input lengths and single annotations of the 3-D image inputs, while providing highly accurate decisions. In addition, through latent variable extraction from the trained RACNet, a set of anchors is generated, providing further insight into the network's decision making. These anchors can be used to enrich and unify data-driven knowledge extracted from different datasets. An extensive experimental study illustrates the above developments, focusing on COVID-19 diagnosis through analysis of 3-D chest CT scans from databases generated in different countries and medical centers.
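RACNet's routing and feature alignment steps are not detailed here. As a point of comparison only, the sketch below shows a simpler, commonly used way to bring volumes with different numbers of slices to a fixed length: uniform index resampling. This is not RACNet's mechanism. The function name and target lengths are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_slices(slices: Sequence[T], target_len: int) -> List[T]:
    """Pick `target_len` uniformly spaced slices from a volume of any length,
    repeating slices when the volume is shorter than `target_len`."""
    n = len(slices)
    if n == 0 or target_len < 1:
        raise ValueError("need a non-empty volume and target_len >= 1")
    if target_len == 1:
        return [slices[n // 2]]  # single slice requested: take the middle one
    # Evenly spaced positions from the first to the last slice, inclusive.
    positions = [round(i * (n - 1) / (target_len - 1)) for i in range(target_len)]
    return [slices[j] for j in positions]


print(resample_slices(list(range(3)), 5))  # [0, 0, 1, 2, 2]; round() sends halves to the even integer
```

Uniform resampling drops slices from long scans and duplicates slices from short ones.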
IV · Jun 9, 2022
AI-MIA: COVID-19 Detection & Severity Analysis through Medical Imaging
Dimitrios Kollias, Anastasios Arsenos, Stefanos Kollias
This paper presents the baseline approach for the 2nd Covid-19 Competition, organized in the framework of the AIMIA Workshop at the European Conference on Computer Vision (ECCV 2022). It presents the COV19-CT-DB database, which is annotated for COVID-19 detection and consists of about 7,700 3-D CT scans. The part of the database consisting of Covid-19 cases is further annotated in terms of four Covid-19 severity conditions. We have split the database, as well as this severity-annotated part, into training, validation and test datasets. The former two datasets are used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. The baseline is a deep learning approach based on a CNN-RNN network, and we report its performance on the COV19-CT-DB database.
IV · Jul 22, 2024
SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection
Dimitrios Kollias, Anastasios Arsenos, James Wingate et al.
This paper presents a new approach for effective segmentation of images that can be integrated into any model and methodology; the application we choose to demonstrate it is classification of medical images (3-D chest CT scans) for Covid-19 detection. Our approach includes a combination of vision-language models that segment the CT scans, which are then fed to a deep neural architecture, named RACNet, for Covid-19 detection. In particular, a novel framework, named SAM2CLIP2SAM, is introduced for segmentation. It leverages the strengths of both Segment Anything Model (SAM) and Contrastive Language-Image Pre-Training (CLIP) to accurately segment the right and left lungs in CT scans, subsequently feeding these segmented outputs into RACNet for classification of COVID-19 and non-COVID-19 cases. First, SAM produces multiple part-based segmentation masks for each slice in the CT scan; then CLIP selects only the masks that are associated with the regions of interest (ROIs), i.e., the right and left lungs; finally, SAM is given these ROIs as prompts and generates the final segmentation mask for the lungs. Experiments on two Covid-19 annotated databases illustrate the improved performance obtained when our method is used for segmentation of the CT scans.
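The middle stage, where CLIP keeps only the masks associated with the lungs, amounts to matching each candidate mask against text prompts. This is a minimal sketch under several assumptions:

- The mask and prompt embeddings have already been computed by CLIP's image and text encoders. Toy 3-D vectors replace them here.
- A background prompt is included so non-lung masks have somewhere to go.
- A fixed similarity threshold is used.

The function names and threshold are hypothetical, and the two SAM stages are not shown.

```python
import math
from typing import Dict, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_roi_masks(mask_embeddings: Dict[str, Sequence[float]],
                     prompt_embeddings: Dict[str, Sequence[float]],
                     roi_prompts: Set[str],
                     threshold: float = 0.5) -> Dict[str, str]:
    """Keep a mask only if its best-matching prompt is an ROI prompt and the
    similarity clears `threshold`. Returns {mask_id: matched_prompt}."""
    selected = {}
    for mask_id, m_emb in mask_embeddings.items():
        prompt, sim = max(((p, cosine(m_emb, e)) for p, e in prompt_embeddings.items()),
                          key=lambda pair: pair[1])
        if prompt in roi_prompts and sim >= threshold:
            selected[mask_id] = prompt
    return selected


prompts = {"right lung": [1.0, 0.0, 0.0], "left lung": [0.0, 1.0, 0.0], "background": [0.0, 0.0, 1.0]}
masks = {"m0": [0.9, 0.1, 0.0], "m1": [0.1, 0.95, 0.0], "m2": [0.0, 0.1, 0.9]}
print(select_roi_masks(masks, prompts, {"right lung", "left lung"}))  # {'m0': 'right lung', 'm1': 'left lung'}
```

The selected masks would then serve as prompts for the final SAM pass.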
CV · Oct 14, 2021
HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media
Anargyros Chatzitofis, Leonidas Saroglou, Prodromos Boutis et al.
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets captured with the use of hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision due to the use of intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned scanned and rigged 3D character complements HUMAN4D to enable joint research on time-varying and high-quality dynamic meshes. We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods. For the former, we apply 2D and 3D pose estimation algorithms to both single- and multi-view data cues. For the latter, we benchmark open-source 3D codecs on volumetric data, respecting the constraints of online volumetric video encoding and steady bit-rates. Furthermore, qualitative and quantitative visual comparison between mesh-based volumetric data reconstructed at different qualities showcases the available options with respect to 4D representations. HUMAN4D is introduced to the computer vision and graphics research communities to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD and audio data cues. The dataset and its code are available at https://tofis.github.io/myurls/human4d.
IV · Mar 4, 2024
Domain adaptation, Explainability & Fairness in AI for Medical Image Analysis: Diagnosis of COVID-19 based on 3-D Chest CT-scans
Dimitrios Kollias, Anastasios Arsenos, Stefanos Kollias
The paper presents the DEF-AI-MIA COV19D Competition, which is organized in the framework of the 'Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis (DEF-AI-MIA)' Workshop of the 2024 Computer Vision and Pattern Recognition (CVPR) Conference. The Competition is the 4th in the series, following the first three Competitions held in the framework of the ICCV 2021, ECCV 2022 and ICASSP 2023 International Conferences, respectively. It includes two Challenges: i) Covid-19 Detection and ii) Covid-19 Domain Adaptation. The Competition uses data from the COV19-CT-DB database, which is described in the paper and includes a large number of chest CT scan series. Each chest CT scan series consists of a sequence of 2-D CT slices, numbering between 50 and 700. Training, validation and test datasets have been extracted from COV19-CT-DB and provided to the participants in both Challenges. The paper presents the baseline models used in the Challenges and the performance they achieved.
IV · Mar 10, 2024
COVID-19 Computer-aided Diagnosis through AI-assisted CT Imaging Analysis: Deploying a Medical AI System
Demetris Gerogiannis, Anastasios Arsenos, Dimitrios Kollias et al.
Computer-aided diagnosis (CAD) systems stand out as potent aids for physicians in identifying the novel Coronavirus Disease 2019 (COVID-19) through medical imaging modalities. In this paper, we showcase the integration and the reliable, fast deployment of a state-of-the-art AI system designed to automatically analyze CT images, offering an infection probability for the swift detection of COVID-19. The suggested system, comprising both classification and segmentation components, is anticipated to reduce physicians' detection time and enhance the overall efficiency of COVID-19 detection. We successfully surmounted various challenges, such as data discrepancy and anonymisation, testing the time-effectiveness of the model, and data security, enabling reliable and scalable deployment of the system in both cloud and edge environments. Additionally, our AI system assigns a probability of infection to each 3D CT scan and enhances explainability through anchor set similarity, facilitating timely confirmation and segregation of infected patients by physicians.
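Explainability through anchor set similarity can be pictured as comparing a scan's latent vector with labelled anchor vectors and reporting the closest one. The abstract does not describe how the system builds its anchors. The sketch below assumes, without support from the abstract, that:

- anchors are plain latent vectors grouped by class;
- similarity is cosine similarity.

The function names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two latent vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_anchor(latent: Sequence[float],
                   anchors: Dict[str, List[Sequence[float]]]) -> Tuple[str, float]:
    """Return (class_label, similarity) of the single anchor most similar to `latent`."""
    best_label, best_sim = "", -math.inf
    for label, vectors in anchors.items():
        for anchor in vectors:
            sim = cosine(latent, anchor)
            if sim > best_sim:
                best_label, best_sim = label, sim
    return best_label, best_sim


anchors = {"covid": [[1.0, 0.0], [0.9, 0.1]], "non-covid": [[0.0, 1.0]]}
print(nearest_anchor([0.8, 0.2], anchors))
```

Reporting the matched anchor alongside the probability gives a physician a reference case to compare against.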
CV · May 10, 2024
Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation
Vasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos et al.
In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Efficiently detecting non-cooperative aerial vehicles and accurately estimating collisions are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance of a detected aerial object in real time using only the input of a monocular camera. To train our deep learning components for the object detection, tracking and depth estimation tasks, we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset, which, to the best of our knowledge, is the largest air-to-air airborne object dataset.
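Once the detector has produced a bounding box and the encoder-decoder has produced a per-pixel depth map, the two can be combined into a single distance per object. The abstract does not say how this reduction is done. Taking the median depth inside the box, as below, is an assumption, chosen because it tolerates background pixels that fall inside the box. The box convention (exclusive upper bounds) and the function name are also hypothetical.

```python
import statistics
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; the max bounds are exclusive.
BBox = Tuple[int, int, int, int]


def object_distance(depth_map: List[List[float]], box: BBox) -> float:
    """Estimate an object's range as the median predicted depth inside its box."""
    x0, y0, x1, y1 = box
    values = [depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    if not values:
        raise ValueError("empty bounding box")
    return statistics.median(values)


depth = [[100.0] * 4 for _ in range(4)]          # background far away
depth[1][1] = depth[1][2] = depth[2][1] = 50.0   # three pixels of a nearer aircraft
print(object_distance(depth, (1, 1, 3, 3)))      # 50.0: the one background pixel is ignored
```

A mean over the same box would be pulled toward the background value, which is why a robust statistic is used in this sketch.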
CV · Mar 12, 2024
Uncertainty-guided Contrastive Learning for Single Source Domain Generalisation
Anastasios Arsenos, Dimitrios Kollias, Evangelos Petrongonas et al.
In the context of single domain generalisation, the objective is for models that have been exclusively trained on data from a single domain to demonstrate strong performance when confronted with various unfamiliar domains. In this paper, we introduce a novel model referred to as Contrastive Uncertainty Domain Generalisation Network (CUDGNet). The key idea is to augment the source capacity in both input and label spaces through the fictitious domain generator and jointly learn the domain invariant representation of each class through contrastive learning. Extensive experiments on two Single Source Domain Generalisation (SSDG) datasets demonstrate the effectiveness of our approach, which surpasses the state-of-the-art single-DG methods by up to 7.08%. Our method also provides efficient uncertainty estimation at inference time from a single forward pass through the generator subnetwork.
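The abstract does not give CUDGNet's exact contrastive objective. To illustrate the general idea of pulling same-class representations together regardless of domain, the sketch swaps in the standard supervised contrastive (SupCon) loss of Khosla et al. The fictitious domain generator and the uncertainty estimation are not shown. The temperature of 0.1 is an assumption.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings: Sequence[Sequence[float]],
                                labels: Sequence[int],
                                temperature: float = 0.1) -> float:
    """SupCon loss: for each anchor, average -log softmax over its same-class
    positives, with the softmax taken over all other samples in the batch."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives do not contribute
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        m = max(logits.values())  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(supervised_contrastive_loss(feats, [0, 0, 1, 1]),   # classes already clustered: small loss
      supervised_contrastive_loss(feats, [0, 1, 0, 1]))   # classes mixed: large loss
```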
LG · Apr 26, 2024
Estimating the Robustness Radius for Randomized Smoothing with 100× Sample Efficiency
Emmanouil Seferis, Stefanos Kollias, Chih-Hong Cheng
Randomized smoothing (RS) has successfully been used to improve the robustness of predictions of deep neural networks (DNNs) by adding random noise to create multiple variations of an input and then taking the consensus. To understand whether an RS-enabled DNN is effective in the sampled input domains, it is necessary to sample data points within the operational design domain, obtain the point-wise certificate regarding the robustness radius, and compare it with pre-defined acceptance criteria. Consequently, it is crucial that a point-wise robustness certificate can be obtained cost-effectively for any given data point. This work demonstrates that reducing the number of samples by one or two orders of magnitude still enables the computation of a slightly smaller robustness radius (commonly ~20% radius reduction) with the same confidence. We provide the mathematical foundation explaining this phenomenon and experimentally show promising results on the standard CIFAR-10 and ImageNet datasets.
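The sample-count effect can be seen in the standard certification recipe for Gaussian smoothing (Cohen et al.). There, the certified L2 radius is R = sigma * Phi^-1(p_lower), where p_lower is a lower confidence bound on the top-class probability.

The sketch below computes p_lower with a one-sided Hoeffding bound. This is a simpler and looser substitute for the Clopper-Pearson bound used by Cohen et al. It is not this paper's analysis, and the default alpha is an assumption.

```python
import math
from statistics import NormalDist


def certified_radius(count_top: int, n: int, sigma: float, alpha: float = 0.001) -> float:
    """Certified L2 radius of a Gaussian-smoothed classifier, using a one-sided
    Hoeffding lower bound on the top-class probability (valid w.p. >= 1 - alpha).
    Returns 0.0 (abstain) when the bound does not exceed 1/2."""
    p_hat = count_top / n
    p_lower = p_hat - math.sqrt(math.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return 0.0
    p_lower = min(p_lower, 1 - 1e-12)  # keep the inverse normal CDF finite
    return sigma * NormalDist().inv_cdf(p_lower)


# Same empirical top-class rate (99%), two sample budgets.
print(certified_radius(990, 1_000, sigma=0.25), certified_radius(99_000, 100_000, sigma=0.25))
```

With the same empirical rate, the 1,000-sample estimate yields a smaller radius than the 100,000-sample one because its confidence bound is wider. How much smaller depends on the bound used.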
CV · May 10, 2024
Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object Detection
Anastasios Arsenos, Vasileios Karampinis, Evangelos Petrongonas et al.
The main barrier to achieving fully autonomous flights lies in autonomous aircraft navigation. Managing non-cooperative traffic presents the most important challenge in this problem. The most efficient strategy for handling non-cooperative traffic is based on monocular video processing through deep learning models. This study contributes to the vision-based deep learning aircraft detection and tracking literature by investigating the impact of data corruption arising from environmental and hardware conditions on the effectiveness of these methods. More specifically, we designed 7 types of common corruptions for camera inputs, taking into account real-world flight conditions. By applying these corruptions to the Airborne Object Tracking (AOT) dataset, we constructed the first robustness benchmark dataset for air-to-air aerial object detection, named AOT-C. The corruptions included in this dataset cover a wide range of challenging conditions such as adverse weather and sensor noise. The second main contribution of this letter is an extensive experimental evaluation involving 8 diverse object detectors to explore the degradation in performance under escalating levels of corruption (domain shifts). Based on the evaluation results, the key observations are the following: 1) One-stage detectors of the YOLO family demonstrate better robustness, 2) Transformer-based and multi-stage detectors like Faster R-CNN are extremely vulnerable to corruptions, 3) Robustness against corruptions is related to the generalization ability of models. The third main contribution is to show that fine-tuning on our augmented synthetic data improves the generalisation ability of the object detector in real-world flight experiments.
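Sensor noise is one of the conditions AOT-C covers. Below is a minimal sketch of a severity-graded Gaussian noise corruption on a grayscale frame stored as nested lists. The severity-to-standard-deviation table is invented for illustration and does not reproduce the AOT-C settings. The function name is hypothetical.

```python
import random
from typing import List

# Hypothetical severity -> noise standard deviation, in 0..255 pixel units (illustrative only).
NOISE_STD = {1: 4.0, 2: 8.0, 3: 16.0, 4: 24.0, 5: 32.0}


def gaussian_noise(image: List[List[float]], severity: int, seed: int = 0) -> List[List[float]]:
    """Add zero-mean Gaussian noise whose strength grows with severity, then clip to [0, 255]."""
    rng = random.Random(seed)
    std = NOISE_STD[severity]
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, std))) for px in row] for row in image]


frame = [[128.0] * 8 for _ in range(8)]  # flat mid-gray test frame
mild, harsh = gaussian_noise(frame, 1), gaussian_noise(frame, 5)
```

Evaluating a detector on each severity level in turn gives the kind of escalating-corruption curve described above.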
CV · Nov 27, 2025
Stable-Drift: A Patient-Aware Latent Drift Replay Method for Stabilizing Representations in Continual Learning
Paraskevi-Antonia Theofilou, Anuhya Thota, Stefanos Kollias et al.
When deep learning models are sequentially trained on new data, they tend to abruptly lose performance on previously learned tasks, a critical failure known as catastrophic forgetting. This challenge severely limits the deployment of AI in medical imaging, where models must continually adapt to data from new hospitals without compromising established diagnostic knowledge. To address this, we introduce a latent drift-guided replay method that identifies and replays samples with high representational instability. Specifically, our method quantifies this instability via latent drift, the change in a sample's internal feature representation after naive domain adaptation. To ensure diversity and clinical relevance, we aggregate drift at the patient level: our memory buffer stores, for each patient, the slices exhibiting the greatest multi-layer representation shift. Evaluated on a cross-hospital COVID-19 CT classification task using state-of-the-art CNN and Vision Transformer backbones, our method substantially reduces forgetting compared to naive fine-tuning and random replay. This work highlights latent drift as a practical and interpretable replay signal for advancing robust continual learning in real-world medical settings.
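The buffer-selection rule described above can be sketched as follows. It goes beyond the abstract in three assumed details:

- Drift is measured as the Euclidean distance between a slice's features before and after naive adaptation.
- Drift is summed over layers without weighting.
- A hypothetical `per_patient` parameter fixes how many slices each patient contributes.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# One slice's features: a list of per-layer feature vectors.
Features = Sequence[Sequence[float]]


def latent_drift(before: Features, after: Features) -> float:
    """Sum over layers of the Euclidean shift in a slice's representation."""
    return sum(math.dist(b, a) for b, a in zip(before, after))


def build_replay_buffer(samples: Iterable[Tuple[str, str, Features, Features]],
                        per_patient: int) -> Dict[str, List[str]]:
    """samples: (patient_id, slice_id, features_before, features_after).
    Keeps, for every patient, the `per_patient` slices with the largest drift."""
    by_patient: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for patient, slice_id, before, after in samples:
        by_patient[patient].append((latent_drift(before, after), slice_id))
    buffer = {}
    for patient, scored in by_patient.items():
        scored.sort(key=lambda item: item[0], reverse=True)
        buffer[patient] = [slice_id for _, slice_id in scored[:per_patient]]
    return buffer


samples = [
    ("p1", "s1", [[0.0, 0.0], [0.0]], [[3.0, 4.0], [1.0]]),  # drift 5 + 1 = 6
    ("p1", "s2", [[0.0, 0.0], [0.0]], [[1.0, 0.0], [0.0]]),  # drift 1
    ("p2", "s3", [[1.0, 1.0], [2.0]], [[1.0, 1.0], [2.5]]),  # drift 0.5
]
print(build_replay_buffer(samples, per_patient=1))
```

Selecting per patient, rather than globally, keeps a single highly unstable patient from filling the whole buffer.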
LG · Sep 19, 2025
Randomized Smoothing Meets Vision-Language Models
Emmanouil Seferis, Changshun Wu, Stefanos Kollias et al.
Randomized smoothing (RS) is one of the prominent techniques for ensuring the correctness of machine learning models, since point-wise robustness certificates can be derived analytically. While RS is well understood for classification, its application to generative models is unclear, since their outputs are sequences rather than labels. We resolve this by connecting generative outputs to an oracle classification task and showing that RS can still be enabled: the final response can be classified as a discrete action (e.g., service-robot commands in VLAs), as harmful vs. harmless (content moderation or toxicity detection in VLMs), or even by applying oracles that cluster answers into semantically equivalent ones. Provided that the error rate for the oracle classifier comparison is bounded, we develop the theory that associates the number of samples with the corresponding robustness radius. We further analytically derive improved scaling laws relating the certified radius and accuracy to the number of samples, showing that the earlier result, that 2 to 3 orders of magnitude fewer samples suffice with minimal loss, remains valid even under weaker assumptions. Together, these advances make robustness certification both well-defined and computationally feasible for state-of-the-art VLMs, as validated against recent jailbreak-style adversarial attacks.
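The oracle idea can be illustrated by mapping each sampled response to a discrete label and then certifying the majority label as in classification. The sketch rests on three simplifications:

- The responses are assumed to have already been generated from noise-perturbed inputs.
- The oracle is a toy string rule.
- The oracle error rate is handled by simply subtracting it from a Hoeffding lower bound.

That subtraction is a conservative assumption for illustration, not the paper's derivation. All names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Callable, Iterable, Tuple


def smoothed_oracle_certificate(responses: Iterable[str],
                                oracle: Callable[[str], str],
                                sigma: float,
                                alpha: float = 0.001,
                                oracle_error: float = 0.0) -> Tuple[str, float]:
    """Label each sampled response with `oracle`, take the majority label, and return
    (label, certified L2 radius). The top-label frequency is lower-bounded with
    Hoeffding and then reduced by an assumed bound on the oracle's error rate."""
    labels = [oracle(r) for r in responses]
    n = len(labels)
    label, count = Counter(labels).most_common(1)[0]
    p_lower = count / n - math.sqrt(math.log(1 / alpha) / (2 * n)) - oracle_error
    if p_lower <= 0.5:
        return label, 0.0  # abstain: no certificate
    return label, sigma * NormalDist().inv_cdf(min(p_lower, 1 - 1e-12))


def refusal_oracle(text: str) -> str:
    """Toy oracle: classify a response as a refusal or as compliance."""
    return "refusal" if text.startswith("I can't") else "compliance"


responses = ["I can't help with that."] * 950 + ["Sure, here is how."] * 50
print(smoothed_oracle_certificate(responses, refusal_oracle, sigma=0.5))
```

A less reliable oracle shrinks the certified radius, and past some error rate the certificate is withheld entirely.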
IV · Jun 1, 2024
Complex Style Image Transformations for Domain Generalization in Medical Images
Nikolaos Spanos, Anastasios Arsenos, Paraskevi-Antonia Theofilou et al.
The absence of well-structured large datasets in medical computer vision results in decreased performance of automated systems and, especially, of deep learning models. Domain generalization techniques aim to generalize to unknown domains from a single data source. In this paper we introduce a novel framework, named CompStyle, which leverages style transfer and adversarial training, along with high-level input complexity augmentation, to effectively expand the domain space and address unknown distributions. State-of-the-art style transfer methods depend on the existence of subdomains within the source dataset. However, this can lead to an inherent dataset bias in the image creation. Input-level augmentation can provide a solution to this problem by widening the domain space in the source dataset and boosting performance on out-of-domain distributions. We present experiments on semantic segmentation of prostate data and corruption robustness on cardiac data which demonstrate the effectiveness of our approach. Our method increases performance in both tasks, without added cost to training time or resources.
CV · Oct 14, 2021
DeepMoCap: Deep Optical Motion Capture Using Multiple Depth Sensors and Retro-Reflectors
Anargyros Chatzitofis, Dimitrios Zarpalas, Stefanos Kollias et al.
In this paper, a marker-based, single-person optical motion capture method (DeepMoCap) is proposed using multiple spatio-temporally aligned infrared-depth sensors and retro-reflective straps and patches (reflectors). DeepMoCap explores motion capture by automatically localizing and labeling reflectors on depth images and, subsequently, in 3D space. Introducing a non-parametric representation to encode the temporal correlation among pairs of colorized depthmaps and 3D optical flow frames, a multi-stage Fully Convolutional Network (FCN) architecture is proposed to jointly learn reflector locations and their temporal dependency across sequential frames. The extracted reflector 2D locations are spatially mapped in 3D space, resulting in robust 3D optical data extraction. The subject's motion is efficiently captured by applying a template-based fitting technique on the extracted optical data. Two datasets have been created and made publicly available for evaluation purposes: one comprising multi-view depth and 3D optical flow annotated images (DMC2.5D), and a second consisting of spatio-temporally aligned multi-view depth images along with skeleton, inertial and ground truth MoCap data (DMC3D). The FCN model outperforms its competitors on the DMC2.5D dataset using the 2D Percentage of Correct Keypoints (PCK) metric, while the motion capture outcome is evaluated against RGB-D and inertial data fusion approaches on DMC3D, outperforming the next best method by 4.5% in total 3D PCK accuracy.
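The Percentage of Correct Keypoints metric used for evaluation counts a predicted keypoint as correct when it lies within a distance threshold of its ground-truth position. The sketch below is minimal. It takes the threshold in absolute coordinate units, whereas benchmarks often normalize it (for example by a body or head size), and the abstract does not specify the paper's exact protocol. Because `math.dist` accepts points of any dimension, the same function covers 2D and 3D PCK.

```python
import math
from typing import Sequence

Point = Sequence[float]  # 2D (x, y) or 3D (x, y, z)


def pck(predicted: Sequence[Point], ground_truth: Sequence[Point], threshold: float) -> float:
    """Percentage of keypoints whose prediction lies within `threshold` of ground truth."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need equally sized, non-empty keypoint lists")
    correct = sum(math.dist(p, g) <= threshold for p, g in zip(predicted, ground_truth))
    return 100.0 * correct / len(predicted)


# Errors of 0, 10 and 3 units against a threshold of 5: two of three keypoints are correct.
print(pck([(0, 0), (10, 0), (0, 3)], [(0, 0), (0, 0), (0, 0)], threshold=5))
```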
IV · Jun 14, 2021
MIA-COV19D: COVID-19 Detection through 3-D Chest CT Image Analysis
Dimitrios Kollias, Anastasios Arsenos, Levon Soukissian et al.
Early and reliable COVID-19 diagnosis based on chest 3-D CT scans can assist medical specialists in vital circumstances. Deep learning methodologies constitute a main approach for chest CT scan analysis and disease prediction. However, large annotated databases are necessary for developing deep learning models that are able to provide COVID-19 diagnosis across various medical environments in different countries. Due to privacy issues, publicly available COVID-19 CT datasets are highly difficult to obtain, which hinders the research and development of AI-enabled diagnosis methods of COVID-19 based on CT scans. In this paper we present the COV19-CT-DB database, which is annotated for COVID-19 and consists of about 5,000 3-D CT scans. We have split the database into training, validation and test datasets. The former two datasets can be used for training and validation of machine learning models, while the latter will be used for evaluation of the developed models. We also present a deep learning approach based on a CNN-RNN network and report its performance on the COV19-CT-DB database.
LG · May 1, 2021
AI-enabled Efficient and Safe Food Supply Chain
Ilianna Kollia, Jack Stevenson, Stefanos Kollias
This paper provides a review of an emerging field in the food processing sector, referring to efficient and safe food supply chains, from farm to fork, as enabled by Artificial Intelligence (AI). Recent advances in machine and deep learning are used for effective food production, energy management and food labeling. Appropriate deep neural architectures are adopted and used for this purpose, including Fully Convolutional Networks, Long Short-Term Memories and Recurrent Neural Networks, Auto-Encoders and Attention mechanisms, Latent Variable extraction and clustering, as well as Domain Adaptation. Three experimental studies are presented, illustrating the ability of these AI methodologies to produce state-of-the-art performance across the whole food supply chain. In particular, these concern: (i) predicting plant growth and tomato yield in greenhouses, thus matching food production to market needs and reducing food waste or food unavailability; (ii) optimizing energy consumption across large networks of food retail refrigeration systems, through optimal selection of systems that can be shut down and through prediction of the respective food de-freezing times, during peaks of power demand load; (iii) optical recognition and verification of food consumption expiry dates in automatic inspection of retail packaged food, thus ensuring the safety of food and people's health.
LG · Dec 7, 2020
An autoencoder wavelet based deep neural network with attention mechanism for multistep prediction of plant growth
Bashar Alhnaity, Stefanos Kollias, Georgios Leontidis et al.
Multi-step prediction is considered of major significance for time series analysis in many real-life problems. Existing methods mainly focus on one-step-ahead forecasting, since multi-step forecasting generally fails due to the accumulation of prediction errors. This paper presents a novel approach for predicting plant growth in agriculture, focusing on prediction of plant Stem Diameter Variations (SDV). The proposed approach consists of three main steps. First, wavelet decomposition is applied to the original data, so as to facilitate model fitting and reduce noise in them. Then an encoder-decoder framework is developed using Long Short-Term Memory (LSTM) and used for appropriate feature extraction from the data. Finally, a recurrent neural network including LSTM and an attention mechanism is proposed for modelling long-term dependencies in the time series data. Experimental results show that the proposed approach performs well and significantly outperforms existing models in terms of error criteria such as RMSE, MAE and MAPE.
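The first step, wavelet decomposition for noise reduction, can be illustrated with a single-level Haar transform and soft thresholding of the detail coefficients. The abstract does not name the wavelet, the decomposition depth, or the thresholding rule, so all three are assumptions here. The LSTM encoder-decoder and attention stages are not shown.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform; len(x) must be even."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Exact inverse of haar_dwt."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def denoise(x: Sequence[float], threshold: float) -> List[float]:
    """Soft-threshold the detail (high-frequency) coefficients, then reconstruct."""
    approx, detail = haar_dwt(x)
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    return haar_idwt(approx, shrunk)


sdv = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]  # toy stem-diameter series
print(denoise(sdv, threshold=0.1))
```

The smoothed series, or its coefficients, would then be the input to the learned forecasting stages.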
CV · Dec 1, 2020
A compact sequence encoding scheme for online human activity recognition in HRI applications
Georgios Tsatiris, Kostas Karpouzis, Stefanos Kollias
Human activity recognition and analysis has always been one of the most active areas of pattern recognition and machine intelligence, with applications in various fields, including but not limited to exertion games, surveillance, sports analytics and healthcare. Especially in Human-Robot Interaction, human activity understanding plays a crucial role as household robotic assistants are a trend of the near future. However, state-of-the-art infrastructures that can support complex machine intelligence tasks are not always available, and may not be available to the average consumer, as robotic hardware is expensive. In this paper we propose a novel action sequence encoding scheme which efficiently transforms spatio-temporal action sequences into compact representations, using Mahalanobis distance-based shape features and the Radon transform. This representation can be used as input for a lightweight convolutional neural network. Experiments show that the proposed pipeline, when based on state-of-the-art human pose estimation techniques, can provide a robust end-to-end online action recognition scheme, deployable on hardware lacking extreme computing capabilities.
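The abstract names Mahalanobis distance-based shape features without giving their exact form. One plausible per-frame version, assumed here for illustration, measures each 2D joint's Mahalanobis distance from the pose centroid under that pose's own covariance. Such features are unchanged when the whole pose is translated, scaled, or rotated. Stacking them over time and applying the Radon transform is not shown, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mahalanobis_shape_features(joints: Sequence[Tuple[float, float]]) -> List[float]:
    """Distance of every 2D joint from the pose centroid, measured with the inverse
    of the pose's own (population) covariance matrix."""
    n = len(joints)
    mx = sum(x for x, _ in joints) / n
    my = sum(y for _, y in joints) / n
    sxx = sum((x - mx) ** 2 for x, _ in joints) / n
    syy = sum((y - my) ** 2 for _, y in joints) / n
    sxy = sum((x - mx) * (y - my) for x, y in joints) / n
    det = sxx * syy - sxy ** 2
    if det <= 1e-12:
        raise ValueError("degenerate pose: joints are (nearly) collinear")
    # Inverse of the 2x2 covariance [[sxx, sxy], [sxy, syy]].
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    feats = []
    for x, y in joints:
        dx, dy = x - mx, y - my
        feats.append(math.sqrt(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy))
    return feats


pose = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]  # toy 4-joint "pose"
print(mahalanobis_shape_features(pose))                      # every joint at distance sqrt(2)
```

One such vector per frame, stacked over time, would form the 2-D array to which the Radon transform is applied.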
LG · Jan 28, 2020
Multi-Source Deep Domain Adaptation for Quality Control in Retail Food Packaging
Mamatha Thota, Stefanos Kollias, Mark Swainson et al.
Retail food packaging contains information which informs choice and can be vital to consumer health, including product name, ingredients list, nutritional information, allergens, preparation guidelines, pack weight, storage and shelf life information (use-by / best before dates). The presence and accuracy of such information is critical to ensure a detailed understanding of the product and to reduce the potential for health risks. Consequently, erroneous or illegible labeling has the potential to be highly detrimental to consumers and many other stakeholders in the supply chain. In this paper, a multi-source deep learning-based domain adaptation system is proposed and tested to identify and verify the presence and legibility of use-by date information from food packaging photos taken as part of the validation process as the products pass along the food production line. This was achieved by improving the generalization of the techniques, using multi-source datasets to extract domain-invariant representations for all domains and aligning the distributions of all pairs of source and target domains in a common feature space, along with the class boundaries. The proposed system performed very well in the conducted experiments, automating the verification process and reducing labeling errors that could otherwise threaten public health and contravene legal requirements for food packaging information and accuracy. Comprehensive experiments on our food packaging datasets demonstrate that the proposed multi-source deep domain adaptation method significantly improves the classification accuracy and therefore has great potential for application and beneficial impact in food manufacturing control systems.
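Aligning the distribution of every source domain with the target in a shared feature space is often done with Maximum Mean Discrepancy (MMD). The abstract does not name the alignment criterion. The Gaussian-kernel MMD below, its bandwidth, and the simple averaging over sources are therefore assumptions. The class-boundary alignment is not shown.

```python
import math
from typing import Dict, Sequence

Vec = Sequence[float]


def gaussian_kernel(a: Vec, b: Vec, bandwidth: float) -> float:
    return math.exp(-math.dist(a, b) ** 2 / (2 * bandwidth ** 2))


def mmd2(xs: Sequence[Vec], ys: Sequence[Vec], bandwidth: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two feature samples."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def multi_source_alignment_loss(sources: Dict[str, Sequence[Vec]], target: Sequence[Vec],
                                bandwidth: float = 1.0) -> float:
    """Average squared MMD between the target features and each source domain."""
    return sum(mmd2(s, target, bandwidth) for s in sources.values()) / len(sources)


target = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]                 # toy target-domain features
near = [[0.1, 0.0], [1.1, 0.0], [0.1, 1.0]]                   # source close to the target
far = [[x + 3.0, y] for x, y in target]                       # source far from the target
print(multi_source_alignment_loss({"line_a": near, "line_b": far}, target))
```

During training, a term like this would be minimized alongside the classification loss so that features from every production line converge.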
LG · Nov 25, 2019
A Unified Deep Learning Approach for Prediction of Parkinson's Disease
James Wingate, Ilianna Kollia, Luc Bidaut et al.
The paper presents a novel approach, based on deep learning, for diagnosis of Parkinson's disease through medical imaging. The approach includes analysis and use of the knowledge extracted by Deep Convolutional and Recurrent Neural Networks (DNNs) when trained with medical images, such as Magnetic Resonance Images and DaTscans. Internal representations of the trained DNNs constitute the extracted knowledge which is used in a transfer learning and domain adaptation manner, so as to create a unified framework for prediction of Parkinson's across different medical environments. A large experimental study is presented illustrating the ability of the proposed approach to effectively predict Parkinson's, using different medical image sets from real environments.
LG · Jul 1, 2019
Using Deep Learning to Predict Plant Growth and Yield in Greenhouse Environments
Bashar Alhnaity, Simon Pearson, Georgios Leontidis et al.
Effective plant growth and yield prediction is an essential task for greenhouse growers and for agriculture in general. Developing models which can effectively model growth and yield can help growers improve the environmental control for better production, match supply and market demand and lower costs. Recent developments in Machine Learning (ML) and, in particular, Deep Learning (DL) can provide powerful new analytical tools. The proposed study utilises ML and DL techniques to predict yield and plant growth variation across two different scenarios, tomato yield forecasting and Ficus benjamina stem growth, in controlled greenhouse environments. We deploy a new deep recurrent neural network (RNN), using the Long Short-Term Memory (LSTM) neuron model, in the prediction formulations. Both the previous yield, growth and stem diameter values and the microclimate conditions are used by the RNN architecture to model the targeted growth parameters. A comparative study is presented against ML methods, such as support vector regression and random forest regression, using the mean square error criterion to evaluate the performance achieved by the different methods. Very promising results are presented, based on data obtained from two greenhouses, in Belgium and the UK, in the framework of the EU Interreg SMARTGREEN project (2017-2021).
LG · May 27, 2019
Capsule Routing via Variational Bayes
Fabio De Sousa Ribeiro, Georgios Leontidis, Stefanos Kollias
Capsule networks are a recently proposed type of neural network shown to outperform alternatives in challenging shape recognition tasks. In capsule networks, scalar neurons are replaced with capsule vectors or matrices, whose entries represent different properties of objects. The relationships between objects and their parts are learned via trainable viewpoint-invariant transformation matrices, and the presence of a given object is decided by the level of agreement among votes from its parts. This interaction occurs between capsule layers and is a process called routing-by-agreement. In this paper, we propose a new capsule routing algorithm derived from Variational Bayes for fitting a mixture of transforming Gaussians, and show that it is possible to transform our capsule network into a Capsule-VAE. Our Bayesian approach addresses some of the inherent weaknesses of MLE-based models, such as variance collapse, by modelling uncertainty over capsule pose parameters. We outperform the state of the art on smallNORB using 50% fewer capsules than previously reported, achieve competitive performance on CIFAR-10, Fashion-MNIST and SVHN, and demonstrate significant improvement in MNIST to affNIST generalisation over previous works.
NE · Mar 6, 2019
A Scalable Test Suite for Continuous Dynamic Multiobjective Optimisation
Shouyong Jiang, Marcus Kaiser, Shengxiang Yang et al.
Dynamic multiobjective optimisation has gained increasing attention in recent years. Test problems are of great importance for facilitating the development of advanced algorithms that can handle dynamic environments well. However, many existing dynamic multiobjective test problems have not been rigorously constructed and analysed, which may introduce unexpected bias when they are used for algorithmic analysis. In this paper, some of these biases are identified after a review of widely used test problems. They include poor scalability of objectives and, more importantly, a problematic overemphasis of static properties rather than dynamics, which makes it difficult to draw accurate conclusions about the strengths and weaknesses of the algorithms studied. We then highlight a diverse set of dynamics and features that a good test suite should have. We further develop a scalable continuous test suite that includes a number of dynamics or features that have rarely been considered in the literature but frequently occur in real life. Empirical studies demonstrate that the proposed test suite is more challenging for the dynamic multiobjective optimisation algorithms found in the literature. The test suite can also test algorithms in ways that existing test suites cannot.
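The following Python code implements FDA1 (Farina, Deb and Amato, 2004), a classic and widely used benchmark, to show concretely what a dynamic multiobjective test problem looks like. It is included only as an illustration and is not one of the problems from the proposed suite. The default values `n_t = 10` (severity of change) and `tau_t = 5` (frequency of change) are common choices assumed here.

```python
import math
from typing import Sequence, Tuple


def fda1(x: Sequence[float], tau: int, n_t: int = 10, tau_t: int = 5) -> Tuple[float, float]:
    """Evaluate the FDA1 dynamic bi-objective test problem.

    x[0] lies in [0, 1] and x[1:] in [-1, 1]; tau is the generation counter,
    n_t controls the severity and tau_t the frequency of environmental change.
    """
    t = math.floor(tau / tau_t) / n_t        # discrete time, changes every tau_t generations
    G = math.sin(0.5 * math.pi * t)          # moving optimal value of x[1:]
    f1 = x[0]
    g = 1.0 + sum((xi - G) ** 2 for xi in x[1:])
    h = 1.0 - math.sqrt(f1 / g)
    return f1, g * h


print(fda1([0.25, 0.0, 0.0], tau=0))  # (0.25, 0.5): on the Pareto-optimal set at t = 0
print(fda1([0.25, 0.0, 0.0], tau=5))  # same x, but the optimum has moved, so f2 > 0.5
```

In FDA1, only the Pareto-optimal set moves with G(t). The Pareto front, f2 = 1 - sqrt(f1), stays fixed, so the problem exercises just one kind of dynamics.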
LG · Jan 23, 2019
Predicting Parkinson's Disease using Latent Information extracted from Deep Neural Networks
Ilianna Kollia, Andreas-Georgios Stafylopatis, Stefanos Kollias
This paper presents a new method for the medical diagnosis of neurodegenerative diseases, such as Parkinson's, by extracting and using latent information from trained deep convolutional or convolutional-recurrent neural networks (DNNs). In particular, our approach combines transfer learning, k-means clustering and k-Nearest Neighbour classification of the representations learned by the deep neural networks to provide enriched prediction of the disease based on MRI and/or DaT Scan data. A new loss function is introduced and used in training the DNNs to adapt the learned representations between data from different medical environments. Results are presented using a recently published database of Parkinson's-related information, which was generated and evaluated in a hospital environment.
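The clustering-plus-nearest-neighbour step can be sketched as follows, assuming the latent representations have already been extracted from a trained network. Several elements are illustrative assumptions: the 2-D toy vectors, the farthest-point initialisation, the majority-vote labelling of centroids, the class names, and all helper names. The paper's actual pipeline, including its new adaptation loss, is not reproduced here.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster where it was
        centroids = updated
    return centroids, assign


def label_centroids(assign: Sequence[int], labels: Sequence[str], k: int) -> List[Optional[str]]:
    """Give each centroid the majority label of its training members."""
    out: List[Optional[str]] = []
    for j in range(k):
        votes = Counter(lab for a, lab in zip(assign, labels) if a == j)
        out.append(votes.most_common(1)[0][0] if votes else None)
    return out


def knn_predict(query: Point, refs: Sequence[Point], ref_labels: Sequence[Optional[str]], k: int = 1):
    """Majority label among the k reference vectors nearest to the query."""
    nearest = sorted(range(len(refs)), key=lambda i: math.dist(query, refs[i]))[:k]
    return Counter(ref_labels[i] for i in nearest).most_common(1)[0][0]


# Toy latent vectors and hypothetical labels (not real patient data).
latents = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]
labels = ["control", "control", "control", "PD", "PD", "PD"]
centroids, assign = kmeans(latents, k=2)
centroid_labels = label_centroids(assign, labels, k=2)
print(knn_predict((4.9, 5.0), centroids, centroid_labels, k=1))  # PD
```

In this sketch, the centroids act as compact prototypes of the learned representations. A new subject's representation is classified by its nearest prototypes rather than by the network's output layer.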
CV · Nov 26, 2018
Deep Bayesian Self-Training
Fabio De Sousa Ribeiro, Francesco Caliva, Mark Swainson et al.
Supervised Deep Learning has been highly successful in recent years, achieving state-of-the-art results in most tasks. However, as such methods are increasingly adopted in industrial applications, the requirement for large amounts of annotated data is often a challenge. In most real-world problems, manual annotation is practically intractable due to time and labour constraints, so automated and adaptive data annotation systems are highly sought after. In this paper, we propose (i) a Deep Bayesian Self-Training methodology for automatic data annotation, which leverages predictive uncertainty estimates obtained using variational inference and modern Neural Network architectures, and (ii) a practical adaptation procedure for handling high label variability between different dataset distributions through clustering of Neural Network latent variable representations. An experimental study on both public and private datasets illustrates the superior performance of the proposed approach over standard Self-Training baselines, highlighting the importance of predictive uncertainty estimates in safety-critical domains.
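Monte Carlo dropout is one common way to obtain the predictive uncertainty estimates described above. It approximates variational inference by keeping dropout active at test time and averaging several stochastic forward passes. The sketch below assumes that technique, although the paper's exact estimator and acceptance rule may differ. It accepts a pseudo-label only when the predictive entropy of the averaged softmax outputs falls below a hypothetical threshold `max_entropy`. The helper names and the toy probabilities are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple


def predictive_stats(mc_probs: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Average T stochastic softmax outputs; return (mean probs, predictive entropy)."""
    n_passes, n_classes = len(mc_probs), len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    entropy = -sum(m * math.log(m) for m in mean if m > 0.0)
    return mean, entropy


def pseudo_label(mc_probs: Sequence[Sequence[float]], max_entropy: float) -> Optional[int]:
    """Return the predicted class if the model is confident enough, else None."""
    mean, entropy = predictive_stats(mc_probs)
    if entropy <= max_entropy:
        return max(range(len(mean)), key=mean.__getitem__)
    return None  # too uncertain: leave the sample unlabelled for now


# Toy outputs of T stochastic forward passes for two unlabelled samples.
confident = [[0.9, 0.1]] * 5             # the passes agree
uncertain = [[0.9, 0.1], [0.1, 0.9]]     # the passes disagree
print(pseudo_label(confident, 0.4), pseudo_label(uncertain, 0.4))  # 0 None
```

In a self-training loop, only the accepted samples would be added to the training set before retraining. This is where uncertainty estimates help to keep confidently wrong labels from accumulating.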
LG · Jul 26, 2018
Towards a Deep Unified Framework for Nuclear Reactor Perturbation Analysis
Fabio De Sousa Ribeiro, Francesco Caliva, Dionysios Chionis et al.
In this paper, we take the first steps towards a novel unified framework for the analysis of perturbations in both the time and frequency domains. Identifying the type and source of such perturbations is fundamental for monitoring reactor cores and guaranteeing safety while running at nominal conditions. A 3D Convolutional Neural Network (3D-CNN) was employed to analyse perturbations occurring in the frequency domain, such as an absorber of variable strength or a propagating perturbation. Recurrent neural networks (RNNs), specifically Long Short-Term Memory (LSTM) networks, were used to study signal sequences related to perturbations induced in the time domain, including vibrations of fuel assemblies and fluctuations of thermal-hydraulic parameters at the inlet of the reactor coolant loops. Representations of dimension 512 were extracted from the 3D-CNN and LSTM architectures and used as input to a fused multi-sigmoid classification layer to recognise the perturbation type. If the perturbation is in the frequency domain, a separate fully-connected layer uses these representations to regress the coordinates of its source. The results showed that the perturbation type can be recognised with high accuracy in all cases, and that sources in frequency-domain scenarios can be localised with high precision.
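The following pure-Python sketch outlines the fused classification and regression heads described above. It rests on several assumptions not stated in the abstract:
- The two 512-dimensional representations are fused by concatenation.
- "Multi-sigmoid" means one independent sigmoid score per perturbation type.
- The regression head reads the same fused vector and outputs three source coordinates.

The number of perturbation types, the set of frequency-domain type indices, the random weights and all helper names are hypothetical. Trained 3D-CNN and LSTM feature extractors are assumed and replaced here by random vectors.

```python
import math
import random
from typing import List, Optional, Sequence, Set, Tuple

FEATURE_DIM = 512  # per-branch representation size stated in the abstract


def linear(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def analyse(cnn_feat, lstm_feat, cls_w, cls_b, reg_w, reg_b,
            freq_classes: Set[int], threshold: float = 0.5) -> Tuple[List[float], Optional[List[float]]]:
    """Score each perturbation type; regress a source only for frequency-domain types."""
    fused = list(cnn_feat) + list(lstm_feat)                   # assumed fusion: 1024-d concatenation
    probs = [sigmoid(z) for z in linear(fused, cls_w, cls_b)]  # independent per-type probabilities
    source = None
    if any(probs[c] >= threshold for c in freq_classes):
        source = linear(fused, reg_w, reg_b)                   # hypothetical (x, y, z) coordinates
    return probs, source


rng = random.Random(0)
n_types = 4  # hypothetical number of perturbation types
cnn = [rng.random() for _ in range(FEATURE_DIM)]    # stand-in for 3D-CNN features
lstm = [rng.random() for _ in range(FEATURE_DIM)]   # stand-in for LSTM features
cls_w = [[rng.uniform(-0.01, 0.01) for _ in range(2 * FEATURE_DIM)] for _ in range(n_types)]
reg_w = [[rng.uniform(-0.01, 0.01) for _ in range(2 * FEATURE_DIM)] for _ in range(3)]
probs, source = analyse(cnn, lstm, cls_w, [0.0] * n_types, reg_w, [0.0] * 3, freq_classes={0, 1})
print([round(p, 3) for p in probs], source)
```

Independent sigmoids, unlike a softmax, do not force the type scores to sum to one. Gating the regression head on the frequency-domain scores mirrors the abstract's rule that sources are localised only for frequency-domain perturbations.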