LGJan 3, 2023Code
WLD-Reg: A Data-dependent Within-layer Diversity RegularizerFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
Neural networks are composed of multiple layers arranged in a hierarchical structure jointly trained with a gradient-based optimization, where the errors are back-propagated from the last layer back to the first one. At each optimization step, neurons at a given layer receive feedback from neurons belonging to higher layers of the hierarchy. In this paper, we propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer. To this end, we measure the pairwise similarity between the outputs of the neurons and use it to model the layer's overall diversity. We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks. The code is publically available at \url{https://github.com/firasl/AAAI-23-WLD-Reg}
CVNov 24, 2022
1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge ResultsBenjamin Kiefer, Matej Kristan, Janez Perš et al.
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
CVSep 22, 2022
Efficient CNN with uncorrelated Bag of Features poolingFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
Despite the superior performance of CNN, deploying them on low computational power devices is still limited as they are typically computationally expensive. One key cause of the high complexity is the connection between the convolution layers and the fully connected layers, which typically requires a high number of parameters. To alleviate this issue, Bag of Features (BoF) pooling has been recently proposed. BoF learns a dictionary, that is used to compile a histogram representation of the input. In this paper, we propose an approach that builds on top of BoF pooling to boost its efficiency by ensuring that the items of the learned dictionary are non-redundant. We propose an additional loss term, based on the pair-wise correlation of the items of the dictionary, which complements the standard loss to explicitly regularize the model to learn a more diverse and rich dictionary. The proposed strategy yields an efficient variant of BoF and further boosts its performance, without any additional parameters.
LGJun 2, 2023
On Feature Diversity in Energy-based ModelsFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
Energy-based learning is a powerful learning paradigm that encapsulates various discriminative and generative approaches. An energy-based model (EBM) is typically formed of inner-model(s) that learn a combination of the different features to generate an energy mapping for each input configuration. In this paper, we focus on the diversity of the produced feature set. We extend the probably approximately correct (PAC) theory of EBMs and analyze the effect of redundancy reduction on the performance of EBMs. We derive generalization bounds for various learning contexts, i.e., regression, classification, and implicit regression, with different energy functions and we show that indeed reducing redundancy of the feature set can consistently decrease the gap between the true and empirical expectation of the energy and boosts the performance of the model.
CVNov 10, 2022
Computer Vision on X-ray Data in Industrial Production and Security Applications: A Comprehensive SurveyMehdi Rafiei, Jenni Raitoharju, Alexandros Iosifidis
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years, it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images and several machine learning-based object (anomaly) detection, classification, and segmentation methods have been recently employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of the studies. This survey reviews the recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications and covers the applications, techniques, evaluation metrics, datasets, and performance comparison of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
LGSep 25, 2023
Convolutional autoencoder-based multimodal one-class classificationFiras Laakom, Fahad Sohrab, Jenni Raitoharju et al.
One-class classification refers to approaches of learning using data from a single class only. In this paper, we propose a deep learning one-class classification method suitable for multimodal data, which relies on two convolutional autoencoders jointly trained to reconstruct the positive input data while obtaining the data representations in the latent space as compact as possible. During inference, the distance of the latent representation of an input to the origin can be used as an anomaly score. Experimental results using a multimodal macroinvertebrate image classification dataset show that the proposed multimodal method yields better results as compared to the unimodal approach. Furthermore, study the effect of different input image sizes, and we investigate how recently proposed feature diversity regularizers affect the performance of our approach. We show that such regularizers improve performance.
LGMay 1, 2022
Generalized Reference Kernel for One-class ClassificationJenni Raitoharju, Alexandros Iosifidis
In this paper, we formulate a new generalized reference kernel hoping to improve the original base kernel using a set of reference vectors. Depending on the selected reference vectors, our formulation shows similarities to approximate kernels, random mappings, and Non-linear Projection Trick. Focusing on small-scale one-class classification, our analysis and experimental results show that the new formulation provides approaches to regularize, adjust the rank, and incorporate additional information into the kernel itself, leading to improved one-class classification accuracy.
LGAug 21, 2024Code
Linear-time One-Class Classification with Repeated Element-wise FoldingJenni Raitoharju
This paper proposes an easy-to-use method for one-class classification: Repeated Element-wise Folding (REF). The algorithm consists of repeatedly standardizing and applying an element-wise folding operation on the one-class training data. Equivalent mappings are performed on unknown test items and the classification prediction is based on the item's distance to the origin of the final distribution. As all the included operations have linear time complexity, the proposed algorithm provides a linear-time alternative for the commonly used computationally much more demanding approaches. Furthermore, REF can avoid the challenges of hyperparameter setting in one-class classification by providing robust default settings. The experiments show that the proposed method can produce similar classification performance or even outperform the more complex algorithms on various benchmark datasets. Matlab codes for REF are publicly available at https://github.com/JenniRaitoharju/REF.
CVApr 5, 2022
Automatic Image Content Extraction: Operationalizing Machine Learning in Humanistic Photographic Studies of Large Visual ArchivesAnssi Männistö, Mert Seker, Alexandros Iosifidis et al.
Applying machine learning tools to digitized image archives has a potential to revolutionize quantitative research of visual studies in humanities and social sciences. The ability to process a hundredfold greater number of photos than has been traditionally possible and to analyze them with an extensive set of variables will contribute to deeper insight into the material. Overall, these changes will help to shift the workflow from simple manual tasks to more demanding stages. In this paper, we introduce Automatic Image Content Extraction (AICE) framework for machine learning-based search and analysis of large image archives. We developed the framework in a multidisciplinary research project as framework for future photographic studies by reformulating and expanding the traditional visual content analysis methodologies to be compatible with the current and emerging state-of-the-art machine learning tools and to cover the novel machine learning opportunities for automatic content analysis. The proposed framework can be applied in several domains in humanities and social sciences, and it can be adjusted and scaled into various research settings. We also provide information on the current state of different machine learning techniques and show that there are already various publicly available methods that are suitable to a wide-scale of visual content analysis tasks.
CVSep 26, 2022
Habitat classification from satellite observations with sparse annotationsMikko Impiö, Pekka Härmä, Anna Tammilehto et al.
Remote sensing benefits habitat conservation by making monitoring of large areas easier compared to field surveying especially if the remote sensed data can be automatically analyzed. An important aspect of monitoring is classifying and mapping habitat types present in the monitored area. Automatic classification is a difficult task, as classes have fine-grained differences and their distributions are long-tailed and unbalanced. Usually training data used for automatic land cover classification relies on fully annotated segmentation maps, annotated from remote sensed imagery to a fairly high-level taxonomy, i.e., classes such as forest, farmland, or urban area. A challenge with automatic habitat classification is that reliable data annotation requires field-surveys. Therefore, full segmentation maps are expensive to produce, and training data is often sparse, point-like, and limited to areas accessible by foot. Methods for utilizing these limited data more efficiently are needed. We address these problems by proposing a method for habitat classification and mapping, and apply this method to classify the entire northern Finnish Lapland area into Natura2000 classes. The method is characterized by using finely-grained, sparse, single-pixel annotations collected from the field, combined with large amounts of unannotated data to produce segmentation maps. Supervised, unsupervised and semi-supervised methods are compared, and the benefits of transfer learning from a larger out-of-domain dataset are demonstrated. We propose a \ac{CNN} biased towards center pixel classification ensembled with a random forest classifier, that produces higher quality classifications than the models themselves alone. We show that cropping augmentations, test-time augmentation and semi-supervised learning can help classification even further.
CVJun 27, 2024Code
Improving Taxonomic Image-based Out-of-distribution Detection With DNA BarcodesMikko Impiö, Jenni Raitoharju
Image-based species identification could help scaling biodiversity monitoring to a global scale. Many challenges still need to be solved in order to implement these systems in real-world applications. A reliable image-based monitoring system must detect out-of-distribution (OOD) classes it has not been presented before. This is challenging especially with fine-grained classes. Emerging environmental monitoring techniques, DNA metabarcoding and eDNA, can help by providing information on OOD classes that are present in a sample. In this paper, we study if DNA barcodes can also support in finding the outlier images based on the outlier DNA sequence's similarity to the seen classes. We propose a re-ordering approach that can be easily applied on any pre-trained models and existing OOD detection methods. We experimentally show that the proposed approach improves taxonomic OOD detection compared to all common baselines. We also show that the method works thanks to a correlation between visual similarity and DNA barcode proximity. The code and data are available at https://github.com/mikkoim/dnaimg-ood.
CVDec 20, 2024Code
Efficient Curation of Invertebrate Image Datasets Using Feature Embeddings and Automatic Size ComparisonMikko Impiö, Philipp M. Rehsen, Jenni Raitoharju
The amount of image datasets collected for environmental monitoring purposes has increased in the past years as computer vision assisted methods have gained interest. Computer vision applications rely on high-quality datasets, making data curation important. However, data curation is often done ad-hoc and the methods used are rarely published. We present a method for curating large-scale image datasets of invertebrates that contain multiple images of the same taxa and/or specimens and have relatively uniform background in the images. Our approach is based on extracting feature embeddings with pretrained deep neural networks, and using these embeddings to find visually most distinct images by comparing their embeddings to the group prototype embedding. Also, we show that a simple area-based size comparison approach is able to find a lot of common erroneous images, such as images containing detached body parts and misclassified samples. In addition to the method, we propose using novel metrics for evaluating human-in-the-loop outlier detection methods. The implementations of the proposed curation methods, as well as a benchmark dataset containing annotated erroneous images, are publicly available in https://github.com/mikkoim/taxonomist-studio.
CVMar 6
Computer vision-based estimation of invertebrate biomassMikko Impiö, Philipp M. Rehsen, Jarrett Blair et al.
The ability to estimate invertebrate biomass using only images could help scaling up quantitative biodiversity monitoring efforts. Computer vision-based methods have the potential to omit the manual, time-consuming, and destructive process of dry weighing specimens. We present two approaches for dry mass estimation that do not require additional manual effort apart from imaging the specimens: fitting a linear model with novel predictors, automatically calculated by an imaging device, and training a family of end-to-end deep neural networks for the task, using single-view, multi-view, and metadata-aware architectures. We propose using area and sinking speed as predictors. These can be calculated with BIODISCOVER, which is a dual-camera system that captures image sequences of specimens sinking in an ethanol column. For this study, we collected a large dataset of dry mass measurement and image sequence pairs to train and evaluate models. We show that our methods can estimate specimen dry mass even with complex and visually diverse specimen morphologies. Combined with automatic taxonomic classification, our approach is an accurate method for group-level dry mass estimation, with a median percentage error of 10-20% for individuals. We highlight the importance of choosing appropriate evaluation metrics, and encourage using both percentage errors and absolute errors as metrics, because they measure different properties. We also explore different optimization losses, data augmentation methods, and model architectures for training deep-learning models.
LGJun 17, 2025
Generalized Reference Kernel With Negative Samples For Support Vector One-class ClassificationJenni Raitoharju
This paper focuses on small-scale one-class classification with some negative samples available. We propose Generalized Reference Kernel with Negative Samples (GRKneg) for One-class Support Vector Machine (OC-SVM). We study different ways to select/generate the reference vectors and recommend an approach for the problem at hand. It is worth noting that the proposed method does not use any labels in the model optimization but uses the original OC-SVM implementation. Only the kernel used in the process is improved using the negative data. We compare our method with the standard OC-SVM and with the binary Support Vector Machine (SVM) using different amounts of negative samples. Our approach consistently outperforms the standard OC-SVM using Radial Basis Function kernel. When there are plenty of negative samples, the binary SVM outperforms the one-class approaches as expected, but we show that for the lowest numbers of negative samples the proposed approach clearly outperforms the binary SVM.
CVMay 28, 2025
AquaMonitor: A multimodal multi-view image sequence dataset for real-life aquatic invertebrate biodiversity monitoringMikko Impiö, Philipp M. Rehsen, Tiina Laamanen et al.
This paper presents the AquaMonitor dataset, the first large computer vision dataset of aquatic invertebrates collected during routine environmental monitoring. While several large species identification datasets exist, they are rarely collected using standardized collection protocols, and none focus on aquatic invertebrates, which are particularly laborious to collect. For AquaMonitor, we imaged all specimens from two years of monitoring whenever imaging was possible given practical limitations. The dataset enables the evaluation of automated identification methods for real-life monitoring purposes using a realistically challenging and unbiased setup. The dataset has 2.7M images from 43,189 specimens, DNA sequences for 1358 specimens, and dry mass and size measurements for 1494 specimens, making it also one of the largest biological multi-view and multimodal datasets to date. We define three benchmark tasks and provide strong baselines for these: 1) Monitoring benchmark, reflecting real-life deployment challenges such as open-set recognition, distribution shift, and extreme class imbalance, 2) Classification benchmark, which follows a standard fine-grained visual categorization setup, and 3) Few-shot benchmark, which targets classes with only few training examples from very fine-grained categories. Advancements on the Monitoring benchmark can directly translate to improvement of aquatic biodiversity monitoring, which is an important component of regular legislative water quality assessment in many countries.
LGFeb 9, 2022
Non-Linear Spectral Dimensionality Reduction Under UncertaintyFiras Laakom, Jenni Raitoharju, Nikolaos Passalis et al.
In this paper, we consider the problem of non-linear dimensionality reduction under uncertainty, both from a theoretical and algorithmic perspectives. Since real-world data usually contain measurements with uncertainties and artifacts, the input space in the proposed framework consists of probability distributions to model the uncertainties associated with each sample. We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches, e.g., KPCA, MDA/KMFA, to receive as inputs the probability distributions instead of the original data. We show that the proposed NGEU formulation exhibits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework. Empirical results on different datasets show the effectiveness of the proposed framework.
CVFeb 9, 2022
Reducing Redundancy in the Bottleneck Representation of the AutoencodersFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
Autoencoders are a type of unsupervised neural networks, which can be used to solve various tasks, e.g., dimensionality reduction, image compression, and image denoising. An AE has two goals: (i) compress the original input to a low-dimensional space at the bottleneck of the network topology using an encoder, (ii) reconstruct the input from the representation at the bottleneck using a decoder. Both encoder and decoder are optimized jointly by minimizing a distortion-based loss which implicitly forces the model to keep only those variations of input data that are required to reconstruct the and to reduce redundancies. In this paper, we propose a scheme to explicitly penalize feature redundancies in the bottleneck representation. To this end, we propose an additional loss term, based on the pair-wise correlation of the neurons, which complements the standard reconstruction loss forcing the encoder to learn a more diverse and richer representation of the input. We tested our approach across different tasks: dimensionality reduction using three different dataset, image compression using the MNIST dataset, and image denoising using fashion MNIST. The experimental results show that the proposed loss leads consistently to superior performance compared to the standard AE loss.
CVNov 10, 2021
Learning to ignore: rethinking attention in CNNsFiras Laakom, Kateryna Chumachenko, Jenni Raitoharju et al.
Recently, there has been an increasing interest in applying attention mechanisms in Convolutional Neural Networks (CNNs) to solve computer vision tasks. Most of these methods learn to explicitly identify and highlight relevant parts of the scene and pass the attended image to further layers of the network. In this paper, we argue that such an approach might not be optimal. Arguably, explicitly learning which parts of the image are relevant is typically harder than learning which parts of the image are less relevant and, thus, should be ignored. In fact, in vision domain, there are many easy-to-identify patterns of irrelevant features. For example, image regions close to the borders are less likely to contain useful information for a classification task. Based on this idea, we propose to reformulate the attention mechanism in CNNs to learn to ignore instead of learning to attend. Specifically, we propose to explicitly learn irrelevant information in the scene and suppress it in the produced representation, keeping only important attributes. This implicit attention scheme can be incorporated into any existing attention mechanism. In this work, we validate this idea using two recent attention methods Squeeze and Excitation (SE) block and Convolutional Block Attention Module (CBAM). Experimental results on different datasets and model architectures show that learning to ignore, i.e., implicit attention, yields superior performance compared to the standard approaches.
CVJun 16, 2021
Automatic Main Character Recognition for Photographic StudiesMert Seker, Anssi Männistö, Alexandros Iosifidis et al.
Main characters in images are the most important humans that catch the viewer's attention upon first look, and they are emphasized by properties such as size, position, color saturation, and sharpness of focus. Identifying the main character in images plays an important role in traditional photographic studies and media analysis, but the task is performed manually and can be slow and laborious. Furthermore, selection of main characters can be sometimes subjective. In this paper, we analyze the feasibility of solving the main character recognition needed for photographic studies automatically and propose a method for identifying the main characters. The proposed method uses machine learning based human pose estimation along with traditional computer vision approaches for this task. We approach the task as a binary classification problem where each detected human is classified either as a main character or not. To evaluate both the subjectivity of the task and the performance of our method, we collected a dataset of 300 varying images from multiple sources and asked five people, a photographic researcher and four other persons, to annotate the main characters. Our analysis showed a relatively high agreement between different annotators. The proposed method achieved a promising F1 score of 0.83 on the full image set and 0.96 on a subset evaluated as most clear and important cases by the photographic researcher.
LGJun 10, 2021
Learning distinct features helps, provablyFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss. We measure the diversity by the average $L_2$-distance between the hidden-layer features and theoretically investigate how learning non-redundant distinct features affects the performance of the network. To do so, we derive novel generalization bounds depending on feature diversity based on Rademacher complexity for such networks. Our analysis proves that more distinct features at the network's units within the hidden layer lead to better generalization. We also show how to extend our results to deeper networks and different losses.
LGApr 29, 2021
Graph-Embedded Subspace Support Vector Data DescriptionFahad Sohrab, Alexandros Iosifidis, Moncef Gabbouj et al.
In this paper, we propose a novel subspace learning framework for one-class classification. The proposed framework presents the problem in the form of graph embedding. It includes the previously proposed subspace one-class techniques as its special cases and provides further insight on what these techniques actually optimize. The framework allows to incorporate other meaningful optimization goals via the graph preserving criterion and reveals a spectral solution and a spectral regression-based solution as alternatives to the previously used gradient-based technique. We combine the subspace learning framework iteratively with Support Vector Data Description applied in the subspace to formulate Graph-Embedded Subspace Support Vector Data Description. We experimentally analyzed the performance of newly proposed different variants. We demonstrate improved performance against the baselines and the recently proposed subspace learning methods for one-class classification.
CRApr 12, 2021
Multi-level reversible encryption for ECG signals using compressive sensingMikko Impiö, Mehmet Yamaç, Jenni Raitoharju
Privacy concerns in healthcare have gained interest recently via GDPR, with a rising need for privacy-preserving data collection methods that keep personal information hidden in otherwise usable data. Sometimes data needs to be encrypted for several authentication levels, where a semi-authorized user gains access to data stripped of personal or sensitive information, while a fully-authorized user can recover the full signal. In this paper, we propose a compressive sensing based multi-level encryption to ECG signals to mask possible heartbeat anomalies from semi-authorized users, while preserving the beat structure for heart rate monitoring. Masking is performed both in time and frequency domains. Masking effectiveness is validated using 1D convolutional neural networks for heartbeat anomaly classification, while masked signal usefulness is validated comparing heartbeat detection accuracy between masked and recovered signals. The proposed multi-level encryption method can decrease classification accuracy of heartbeat anomalies by up to 50%, while maintaining a fairly high R-peak detection accuracy.
CVMar 11, 2021
Automatic Social Distance Estimation From Images: Performance Evaluation, Test Benchmark, and AlgorithmMert Seker, Anssi Männistö, Alexandros Iosifidis et al.
The COVID-19 virus has caused a global pandemic since March 2020. The World Health Organization (WHO) has provided guidelines on how to reduce the spread of the virus and one of the most important measures is social distancing. Maintaining a minimum of one meter distance from other people is strongly suggested to reduce the risk of infection. This has created a strong interest in monitoring the social distances either as a safety measure or to study how the measures have affected human behavior and country-wise differences in this. The need for automatic social distance estimation algorithms is evident, but there is no suitable test benchmark for such algorithms. Collecting images with measured ground-truth pair-wise distances between all the people using different camera settings is cumbersome. Furthermore, performance evaluation for social distance estimation algorithms is not straightforward and there is no widely accepted evaluation protocol. In this paper, we provide a dataset of varying images with measured pair-wise social distances under different camera positionings and focal length values. We suggest a performance evaluation protocol and provide a benchmark to easily evaluate social distance estimation algorithms. We also propose a method for automatic social distance estimation. Our method takes advantage of object detection and human pose estimation. It can be applied on any single image as long as focal length and sensor size information are known. The results on our benchmark are encouraging with 92% human detection rate and only 28.9% average error in distance estimation among the detected people.
CVFeb 9, 2021
Ensembling object detectors for image and video data analysisKateryna Chumachenko, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data. We further extend it to video data by proposing a two-stage tracking-based scheme for detection refinement. The proposed method can be used as a standalone approach for improving object detection performance, or as a part of a framework for faster bounding box annotation in unseen datasets, assuming that the objects of interest are those present in some common public datasets.
SDNov 23, 2020
Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational LayerMohammad Soltanian, Junaid Malik, Jenni Raitoharju et al.
Automatic classification of speech commands has revolutionized human computer interactions in robotic applications. However, employed recognition models usually follow the methodology of deep learning with complicated networks which are memory and energy hungry. So, there is a need to either squeeze these complicated models or use more efficient light-weight models in order to be able to implement the resulting classifiers on embedded devices. In this paper, we pick the second approach and propose a network layer to enhance the speech command recognition capability of a lightweight network and demonstrate the result via experiments. The employed method borrows the ideas of Taylor expansion and quadratic forms to construct a better representation of features in both input and hidden layers. This richer representation results in recognition accuracy improvement as shown by extensive experiments on Google speech commands (GSC) and synthetic speech commands (SSC) datasets.
LGSep 1, 2020
Graph Embedding with Data UncertaintyFiras Laakom, Jenni Raitoharju, Nikolaos Passalis et al.
spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines. The main aim is to learn a meaningful low dimensional embedding of the data. However, most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty. Thus, learning directly from raw data can be misleading and can negatively impact the accuracy. In this paper, we propose to model artifacts in training data using probability distributions; each data point is represented by a Gaussian distribution centered at the original data point and having a variance modeling its uncertainty. We reformulate the Graph Embedding framework to make it suitable for learning from distributions and we study as special cases the Linear Discriminant Analysis and the Marginal Fisher Analysis techniques. Furthermore, we propose two schemes for modeling data uncertainty based on pair-wise distances in an unsupervised and a supervised contexts.
ROAug 28, 2020
Collaborative Multi-Robot Systems for Search and Rescue: Coordination and PerceptionJorge Peña Queralta, Jussi Taipalmaa, Bilge Can Pullinen et al.
Autonomous or teleoperated robots have been playing increasingly important roles in civil applications in recent years. Across the different civil domains where robots can support human operators, one of the areas where they can have more impact is in search and rescue (SAR) operations. In particular, multi-robot systems have the potential to significantly improve the efficiency of SAR personnel with faster search of victims, initial assessment and mapping of the environment, real-time monitoring and surveillance of SAR operations, or establishing emergency communication networks, among other possibilities. SAR operations encompass a wide variety of environments and situations, and therefore heterogeneous and collaborative multi-robot systems can provide the most advantages. In this paper, we review and analyze the existing approaches to multi-robot SAR support, from an algorithmic perspective and putting an emphasis on the methods enabling collaboration among the robots as well as advanced perception through machine vision and multi-agent active perception. Furthermore, we put these algorithms in the context of the different challenges and constraints that various types of robots (ground, aerial, surface or underwater) encounter in different SAR environments (maritime, urban, wilderness or other post-disaster scenarios). This is, to the best of our knowledge, the first review considering heterogeneous SAR robots across different environments, while giving two complimentary points of view: control mechanisms and machine perception. Based on our review of the state-of-the-art, we discuss the main open research questions, and outline our insights on the current approaches that have potential to improve the real-world performance of multi-robot SAR systems.
CVJul 20, 2020
Monte Carlo Dropout Ensembles for Robust Illumination EstimationFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
Computational color constancy is a preprocessing step used in many camera systems. The main aim is to discount the effect of the illumination on the colors in the scene and restore the original colors of the objects. Recently, several deep learning-based approaches have been proposed to solve this problem and they often led to state-of-the-art performance in terms of average errors. However, for extreme samples, these methods fail and lead to high errors. In this paper, we address this limitation by proposing to aggregate different deep learning methods according to their output uncertainty. We estimate the relative uncertainty of each approach using Monte Carlo dropout and the final illumination estimate is obtained as the sum of the different model estimates weighted by the log-inverse of their corresponding uncertainties. The proposed framework leads to state-of-the-art performance on INTEL-TAU dataset.
ROMay 7, 2020
AutoSOS: Towards Multi-UAV Systems Supporting Maritime Search and Rescue with Lightweight AI and Edge ComputingJorge Peña Queralta, Jenni Raitoharju, Tuan Nguyen Gia et al.
Rescue vessels are the main actors in maritime safety and rescue operations. At the same time, aerial drones bring a significant advantage into this scenario. This paper presents the research directions of the AutoSOS project, where we work in the development of an autonomous multi-robot search and rescue assistance platform capable of sensor fusion and object detection in embedded devices using novel lightweight AI models. The platform is meant to perform reconnaissance missions for initial assessment of the environment using novel adaptive deep learning algorithms that efficiently use the available sensors and computational resources on drones and rescue vessel. When drones find potential objects, they will send their sensor data to the vessel to verity the findings with increased accuracy. The actual rescue and treatment operation are left as the responsibility of the rescue personnel. The drones will autonomously reconfigure their spatial distribution to enable multi-hop communication, when a direct connection between a drone transmitting information and the vessel is unavailable.
CVMay 6, 2020
Probabilistic Color ConstancyFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we propose a novel unsupervised color constancy method, called Probabilistic Color Constancy (PCC). We define a framework for estimating the illumination of a scene by weighting the contribution of different image regions using a graph-based representation of the image. To estimate the weight of each (super-)pixel, we rely on two assumptions: (Super-)pixels with similar colors contribute similarly and darker (super-)pixels contribute less. The resulting system has one global optimum solution. The proposed method achieves competitive performance, compared to the state-of-the-art, on INTEL-TAU dataset.
LGApr 8, 2020
Saliency-based Weighted Multi-label Linear Discriminant AnalysisLei Xu, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we propose a new variant of Linear Discriminant Analysis (LDA) to solve multi-label classification tasks. The proposed method is based on a probabilistic model for defining the weights of individual samples in a weighted multi-label LDA approach. Linear Discriminant Analysis is a classical statistical machine learning method, which aims to find a linear data transformation increasing class discrimination in an optimal discriminant subspace. Traditional LDA sets assumptions related to Gaussian class distributions and single-label data annotations. To employ the LDA technique in multi-label classification problems, we exploit intuitions coming from a probabilistic interpretation of class saliency to redefine the between-class and within-class scatter matrices. The saliency-based weights obtained based on various kinds of affinity encoding prior information are used to reveal the probability of each instance to be salient for each of its classes in the multi-label problem at hand. The proposed Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
LGMar 25, 2020
Not all domains are equally complex: Adaptive Multi-Domain LearningAli Senhaji, Jenni Raitoharju, Moncef Gabbouj et al.
Deep learning approaches are highly specialized and require training separate models for different tasks. Multi-domain learning looks at ways to learn a multitude of different tasks, each coming from a different domain, at once. The most common approach in multi-domain learning is to form a domain agnostic model, the parameters of which are shared among all domains, and learn a small number of extra domain-specific parameters for each individual new domain. However, different domains come with different levels of difficulty; parameterizing the models of all domains using an augmented version of the domain agnostic model leads to unnecessarily inefficient solutions, especially for easy to solve tasks. We propose an adaptive parameterization approach to deep neural networks for multi-domain learning. The proposed approach performs on par with the original approach while reducing by far the number of parameters, leading to efficient multi-domain learning solutions.
LGMar 20, 2020
Ellipsoidal Subspace Support Vector Data DescriptionFahad Sohrab, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we propose a novel method for transforming data into a low-dimensional space optimized for one-class classification. The proposed method iteratively transforms data into a new subspace optimized for ellipsoidal encapsulation of target class data. We provide both linear and non-linear formulations for the proposed method. The method takes into account the covariance of the data in the subspace; hence, it yields a more generalized solution as compared to Subspace Support Vector Data Description for a hypersphere. We propose different regularization terms expressing the class variance in the projected space. We compare the results with classic and recently proposed one-class classification methods and achieve better results in the majority of cases. The proposed method is also noticed to converge much faster than recently proposed Subspace Support Vector Data Description.
CVFeb 12, 2020
Boosting rare benthic macroinvertebrates taxa identification with one-class classificationFahad Sohrab, Jenni Raitoharju
Insect monitoring is crucial for understanding the consequences of rapid ecological changes, but taxa identification currently requires tedious manual expert work and cannot be scaled-up efficiently. Deep convolutional neural networks (CNNs), provide a viable way to significantly increase the biomonitoring volumes. However, taxa abundances are typically very imbalanced and the amounts of training images for the rarest classes are simply too low for deep CNNs. As a result, the samples from the rare classes are often completely missed, while detecting them has biological importance. In this paper, we propose combining the trained deep CNN with one-class classifiers to improve the rare species identification. One-class classification models are traditionally trained with much fewer samples and they can provide a mechanism to indicate samples potentially belonging to the rare classes for human inspection. Our experiments confirm that the proposed approach may indeed support moving towards partial automation of the taxa identification task.
LGFeb 11, 2020
Incremental Fast Subclass Discriminant AnalysisKateryna Chumachenko, Jenni Raitoharju, Moncef Gabbouj et al.
This paper proposes an incremental solution to Fast Subclass Discriminant Analysis (fastSDA). We present an exact and an approximate linear solution, along with an approximate kernelized variant. Extensive experiments on eight image datasets with different incremental batch sizes show the superiority of the proposed approach in terms of training time and accuracy being equal or close to fastSDA solution and outperforming other methods.
CVFeb 5, 2020
Automatic image-based identification and biomass estimation of invertebratesJohanna Ärje, Claus Melvad, Mads Rosenhøj Jeppesen et al.
Understanding how biological communities respond to environmental changes is a key challenge in ecology and ecosystem management. The apparent decline of insect populations necessitates more biomonitoring but the time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed. In turn, this affects the scale of efforts to map invertebrate diversity altogether. Given recent advances in computer vision, we propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology. We describe a robot-enabled image-based identification machine, which can automate the process of invertebrate identification, biomass estimation and sample sorting. We use the imaging device to generate a comprehensive image database of terrestrial arthropod species. We use this database to test the classification accuracy i.e. how well the species identity of a specimen can be predicted from images taken by the machine. We also test sensitivity of the classification accuracy to the camera settings (aperture and exposure time) in order to move forward with the best possible image quality. We use state-of-the-art Resnet-50 and InceptionV3 CNNs for the classification task. The results for the initial dataset are very promising ($\overline{ACC}=0.980$). The system is general and can easily be used for other groups of invertebrates as well. As such, our results pave the way for generating more data on spatial and temporal variation in invertebrate abundance, diversity and biomass.
IVOct 23, 2019
INTEL-TAU: A Color Constancy DatasetFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we describe a new large dataset for illumination estimation. This dataset, called INTEL-TAU, contains 7022 images in total, which makes it the largest available high-resolution dataset for illumination estimation research. The variety of scenes captured using three different camera models, namely Canon 5DSR, Nikon D810, and Sony IMX135, makes the dataset appropriate for evaluating the camera and scene invariance of the different illumination estimation techniques. Privacy masking is done for sensitive information, e.g., faces. Thus, the dataset is coherent with the new General Data Protection Regulation (GDPR). Furthermore, the effect of color shading for mobile images can be evaluated with INTEL-TAU dataset, as both corrected and uncorrected versions of the raw data are provided. Furthermore, this paper benchmarks several color constancy approaches on the proposed dataset.
NEAug 19, 2019
Neural Architecture Search by Estimation of Network Structure DistributionsAnton Muravev, Jenni Raitoharju, Moncef Gabbouj
The influence of deep learning is continuously expanding across different domains, and its new applications are ubiquitous. The question of neural network design thus increases in importance, as traditional empirical approaches are reaching their limits. Manual design of network architectures from scratch relies heavily on trial and error, while using existing pretrained models can introduce redundancies or vulnerabilities. Automated neural architecture design is able to overcome these problems, but the most successful algorithms operate on significantly constrained design spaces, assuming the target network to consist of identical repeating blocks. While such approach allows for faster search, it does so at the cost of expressivity. We instead propose an alternative probabilistic representation of a whole neural network structure under the assumption of independence between layer types. Our matrix of probabilities is equivalent to the population of models, but allows for discovery of structural irregularities, while being simple to interpret and analyze. We construct an architecture search algorithm, inspired by the estimation of distribution algorithms, to take advantage of this representation. The probability matrix is tuned towards generating high-performance models by repeatedly sampling the architectures and evaluating the corresponding networks, while gradually increasing the model depth. Our algorithm is shown to discover non-regular models which cannot be expressed via blocks, but are competitive both in accuracy and computational cost, while not utilizing complex dataflows or advanced training techniques, as well as remaining conceptually simple and highly extensible.
LGAug 13, 2019
Null Space Analysis for Class-Specific Discriminant LearningJenni Raitoharju, Alexandros Iosifidis
In this paper, we carry out null space analysis for Class-Specific Discriminant Analysis (CSDA) and formulate a number of solutions based on the analysis. We analyze both theoretically and experimentally the significance of each algorithmic step. The innate subspace dimensionality resulting from the proposed solutions is typically quite high and we discuss how the need for further dimensionality reduction changes the situation. Experimental evaluation of the proposed solutions shows that the straightforward extension of null space analysis approaches to the class-specific setting can outperform the standard CSDA method. Furthermore, by exploiting a recently proposed out-of-class scatter definition encoding the multi-modality of the negative class naturally appearing in class-specific problems, null space projections can lead to a performance comparable to or outperforming the most recent CSDA methods.
CRJun 20, 2019
Reversible Privacy Preservation using Multi-level Encryption and Compressive SensingMehmet Yamac, Mete Ahishali, Nikolaos Passalis et al.
Security monitoring via ubiquitous cameras and their more extended in intelligent buildings stand to gain from advances in signal processing and machine learning. While these innovative and ground-breaking applications can be considered as a boon, at the same time they raise significant privacy concerns. In fact, recent GDPR (General Data Protection Regulation) legislation has highlighted and become an incentive for privacy-preserving solutions. Typical privacy-preserving video monitoring schemes address these concerns by either anonymizing the sensitive data. However, these approaches suffer from some limitations, since they are usually non-reversible, do not provide multiple levels of decryption and computationally costly. In this paper, we provide a novel privacy-preserving method, which is reversible, supports de-identification at multiple privacy levels, and can efficiently perform data acquisition, encryption and data hiding by combining multi-level encryption with compressive sensing. The effectiveness of the proposed approach in protecting the identity of the users has been validated using the goodness of reconstruction quality and strong anonymization of the faces.
CVJun 11, 2019
Bag of Color Features For Color ConstancyFiras Laakom, Nikolaos Passalis, Jenni Raitoharju et al.
In this paper, we propose a novel color constancy approach, called Bag of Color Features (BoCF), building upon Bag-of-Features pooling. The proposed method substantially reduces the number of parameters needed for illumination estimation. At the same time, the proposed method is consistent with the color constancy assumption stating that global spatial information is not relevant for illumination estimation and local information ( edges, etc.) is sufficient. Furthermore, BoCF is consistent with color constancy statistical approaches and can be interpreted as a learning-based generalization of many statistical approaches. To further improve the illumination estimation accuracy, we propose a novel attention mechanism for the BoCF model with two variants based on self-attention. BoCF approach and its variants achieve competitive, compared to the state of the art, results while requiring much fewer parameters on three benchmark datasets: ColorChecker RECommended, INTEL-TUT version 2, and NUS8.
CVJun 4, 2019
Color Constancy Convolutional AutoencoderFiras Laakom, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we study the importance of pre-training for the generalization capability in the color constancy problem. We propose two novel approaches based on convolutional autoencoders: an unsupervised pre-training algorithm using a fine-tuned encoder and a semi-supervised pre-training algorithm using a novel composite-loss function. This enables us to solve the data scarcity problem and achieve competitive, to the state-of-the-art, results while requiring much fewer parameters on ColorChecker RECommended dataset. We further study the over-fitting phenomenon on the recently introduced version of INTEL-TUT Dataset for Camera Invariant Color Constancy Research, which has both field and non-field scenes acquired by three different camera models.
LGMay 2, 2019
Speed-up and multi-view extensions to Subclass Discriminant AnalysisKateryna Chumachenko, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we propose a speed-up approach for subclass discriminant analysis and formulate a novel efficient multi-view solution to it. The speed-up approach is developed based on graph embedding and spectral regression approaches that involve eigendecomposition of the corresponding Laplacian matrix and regression to its eigenvectors. We show that by exploiting the structure of the between-class Laplacian matrix, the eigendecomposition step can be substituted with a much faster process. Furthermore, we formulate a novel criterion for multi-view subclass discriminant analysis and show that an efficient solution for it can be obtained in a similar to the single-view manner. We evaluate the proposed methods on nine single-view and nine multi-view datasets and compare them with related existing approaches. Experimental results show that the proposed solutions achieve competitive performance, often outperforming the existing methods. At the same time, they significantly decrease the training time.
CVApr 22, 2019
Machine Learning Based Analysis of Finnish World War II PhotographersKateryna Chumachenko, Anssi Männistö, Alexandros Iosifidis et al.
In this paper, we demonstrate the benefits of using state-of-the-art machine learning methods in the analysis of historical photo archives. Specifically, we analyze prominent Finnish World War II photographers, who have captured high numbers of photographs in the publicly available Finnish Wartime Photograph Archive, which contains 160,000 photographs from Finnish Winter, Continuation, and Lapland Wars captures in 1939-1945. We were able to find some special characteristics for different photographers in terms of their typical photo content and framing (e.g., close-ups vs. overall shots, number of people). Furthermore, we managed to train a neural network that can successfully recognize the photographer from some of the photos, which shows that such photos are indeed characteristic for certain photographers. We further analyzed the similarities and differences between the photographers using the features extracted from the photographer classifier network. We make our annotations and analysis pipeline publicly available, in an effort to introduce this new research problem to the machine learning and computer vision communities and facilitate future research in historical and societal studies over the photo archives.
LGApr 16, 2019
Multimodal Subspace Support Vector Data DescriptionFahad Sohrab, Jenni Raitoharju, Alexandros Iosifidis et al.
In this paper, we propose a novel method for projecting data from multiple modalities to a new subspace optimized for one-class classification. The proposed method iteratively transforms the data from the original feature space of each modality to a new common feature space along with finding a joint compact description of data coming from all the modalities. For data in each modality, we define a separate transformation to map the data from the corresponding feature space to the new optimized subspace by exploiting the available information from the class of interest only. We also propose different regularization strategies for the proposed method and provide both linear and non-linear formulations. The proposed Multimodal Subspace Support Vector Data Description outperforms all the competing methods using data from a single modality or fusing data from all modalities in four out of five datasets.
CVFeb 12, 2018
Subspace Support Vector Data DescriptionFahad Sohrab, Jenni Raitoharju, Moncef Gabbouj et al.
This paper proposes a novel method for solving one-class classification problems. The proposed approach, namely Subspace Support Vector Data Description, maps the data to a subspace that is optimized for one-class classification. In that feature space, the optimal hypersphere enclosing the target class is then determined. The method iteratively optimizes the data mapping along with data description in order to define a compact class representation in a low-dimensional feature space. We provide both linear and non-linear mappings for the proposed method. Experiments on 14 publicly available datasets indicate that the proposed Subspace Support Vector Data Description provides better performance compared to baselines and other recently proposed one-class classification methods.
MLAug 23, 2017
Human experts vs. machines in taxa recognitionJohanna Ärje, Jenni Raitoharju, Alexandros Iosifidis et al.
The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of taxonomic hierarchy in detail. We used support vector machine classifier as a benchmark. Our results revealed that human experts using actual specimens yield the lowest classification error ($\overline{CE}=6.1\%$). However, a much faster, automated approach using deep Convolutional Neural Nets comes close to human accuracy ($\overline{CE}=11.4\%$) when a typical flat classification approach is used. Contrary to previous findings in the literature, we find that for machines following a typical flat classification approach commonly used in machine learning performs better than forcing machines to adopt a hierarchical, local per parent node approach used by human taxonomic experts ($\overline{CE}=13.8\%$). Finally, we publicly share our unique dataset to serve as a public benchmark dataset in this field.