Alexandre X. Falcão

h-index: 49

31 papers

648 citations

Novelty: 48%

AI Score: 42

Ranked #60,146 of 194,257 authors (top 31%) · #20,726 in CV (top 35%)

31 Papers

29.6 · LG · Apr 22, 2022 · Code

Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Sarthak Pati, Ujjwal Baid, Brandon Edwards et al.

Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and a 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
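As a rough illustration of what "only sharing numerical model updates" can look like, the sketch below merges per-site parameter vectors with a sample-size-weighted average. The abstract does not state which aggregation rule the study used; the weighted mean (in the style of federated averaging) and the name `federated_average` are assumptions made for this example only.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Merge per-site parameter vectors into one consensus vector.

    Assumption: each site contributes in proportion to its number of
    local training samples. No raw data leaves a site, only the vectors.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need exactly one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(site_weights[0])
    merged = [0.0] * dim
    for weights, n_samples in zip(site_weights, site_sizes):
        if len(weights) != dim:
            raise ValueError("all sites must share the same parameter layout")
        for i, w in enumerate(weights):
            merged[i] += w * n_samples / total
    return merged
```

For instance, two hypothetical sites with 1 and 3 samples and parameters `[1.0, 2.0]` and `[3.0, 4.0]` would yield a consensus that sits three quarters of the way toward the larger site.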

11.3 · CV · Sep 27, 2024 · Code

A comprehensive review and new taxonomy on superpixel segmentation

I. B. Barcelos, F. de C. Belém, L. de M. João et al.

Superpixel segmentation consists of partitioning images into regions composed of similar and connected pixels. Its methods have been widely used in many computer vision applications since they reduce the workload, remove redundant information, and preserve regions with meaningful features. Due to the rapid progress in this area, the literature has failed to keep up with more recent works in its comparisons and to categorize the methods according to all existing strategies. This work fills this gap by presenting a comprehensive review with a new taxonomy for superpixel segmentation, in which methods are classified according to their processing steps and processing levels of image features. We revisit the recent and popular literature according to our taxonomy and evaluate 20 strategies based on nine criteria: connectivity, compactness, delineation, control over the number of superpixels, color homogeneity, robustness, running time, stability, and visual quality. Our experiments show the trends of each approach in pixel clustering and discuss individual trade-offs. Finally, we provide a new benchmark for superpixel assessment, available at https://github.com/IMScience-PPGINF-PucMinas/superpixel-benchmark.

7.3 · CV · Apr 7, 2022 · Code

Efficient Multiscale Object-based Superpixel Framework

Felipe Belém, Benjamin Perret, Jean Cousty et al.

Superpixel segmentation can be used as an intermediary step in many applications, often to improve object delineation and reduce computer workload. However, classical methods do not incorporate information about the desired object. Deep-learning-based approaches consider object information, but their delineation performance depends on data annotation. Additionally, the computational time of object-based methods is usually much higher than desired. In this work, we propose a novel superpixel framework, named Superpixels through Iterative CLEarcutting (SICLE), which exploits object information and can generate a multiscale segmentation on the fly. SICLE starts off from seed oversampling and repeats optimal connectivity-based superpixel delineation and object-based seed removal until a desired number of superpixels is reached. It generalizes recent superpixel methods, surpassing them and other state-of-the-art approaches in efficiency and effectiveness according to multiple delineation metrics.

2.0 · LG · Feb 6, 2023

Linking data separation, visual separation, and classifier performance using pseudo-labeling by contrastive learning

Bárbara Caroline Benato, Alexandre Xavier Falcão, Alexandru-Cristian Telea

The lack of supervised data is an issue when training deep neural networks (DNNs), mainly when considering medical and biological data where supervision is expensive. Recently, Embedded Pseudo-Labeling (EPL) addressed this problem by using a non-linear projection (t-SNE) from a feature space of the DNN to a 2D space, followed by semi-supervised label propagation using a connectivity-based method (OPFSemi). We argue that the performance of the final classifier depends on the data separation present in the latent space and visual separation present in the projection. We address this by first proposing to use contrastive learning to produce the latent space for EPL by two methods (SimCLR and SupCon) and by their combination, and secondly by showing, via an extensive set of experiments, the aforementioned correlations between data separation, visual separation, and classifier performance. We demonstrate our results by the classification of five real-world challenging image datasets of human intestinal parasites with only 1% supervised samples.

5.3 · IV · Jun 26, 2023

Building Flyweight FLIM-based CNNs with Adaptive Decoding for Object Detection

Leonardo de Melo Joao, Azael de Melo e Sousa, Bianca Martins dos Santos et al.

State-of-the-art (SOTA) object detection methods have succeeded in several applications at the price of relying on heavyweight neural networks, which makes them inefficient and unviable for many applications with computational resource constraints. This work presents a method to build a Convolutional Neural Network (CNN) layer by layer for object detection from user-drawn markers on discriminative regions of representative images. We address the detection of Schistosoma mansoni eggs in microscopy images of fecal samples, and the detection of ships in satellite images as application examples. We could create a flyweight CNN without backpropagation from very few input images. Our method explores a recent methodology, Feature Learning from Image Markers (FLIM), to build convolutional feature extractors (encoders) from marker pixels. We extend FLIM to include a single-layer adaptive decoder, whose weights vary with the input image -- a concept never explored in CNNs. Our CNN weighs thousands of times less than SOTA object detectors, being suitable for CPU execution and showing superior or equivalent performance to three methods in five measures.

2.6 · CV · Jan 12, 2021 · Code

Rethinking Interactive Image Segmentation: Feature Space Annotation

Jordão Bragantini, Alexandre X Falcão, Laurent Najman

Despite the progress of interactive image segmentation methods, high-quality pixel-level annotation is still time-consuming and laborious - a bottleneck for several deep learning applications. We take a step back to propose interactive and simultaneous segment annotation from multiple images guided by feature space projection. This strategy is in stark contrast to existing interactive segmentation methodologies, which perform annotation in the image domain. We show that feature space annotation achieves competitive results with state-of-the-art methods in foreground segmentation datasets: iCoSeg, DAVIS, and Rooftop. Moreover, in the semantic segmentation context, it achieves 91.5% accuracy in the Cityscapes dataset, being 74.75 times faster than the original annotation procedure. Further, our contribution sheds light on a novel direction for interactive image annotation that can be integrated with existing methodologies. The supplementary material presents video demonstrations. Code available at https://github.com/LIDS-UNICAMP/rethinking-interactive-image-segmentation.

1.5 · CV · Feb 24

FLIM Networks with Bag of Feature Points

João Deltregia Martinelli, Marcelo Luis Rodrigues Filho, Felipe Crispim da Rocha Salvagnini et al.

Convolutional networks require extensive image annotation, which can be costly and time-consuming. Feature Learning from Image Markers (FLIM) tackles this challenge by estimating encoder filters (i.e., kernel weights) from user-drawn markers on discriminative regions of a few representative images without traditional optimization. Such an encoder combined with an adaptive decoder comprises a FLIM network fully trained without backpropagation. Prior research has demonstrated their effectiveness in Salient Object Detection (SOD), being significantly lighter than existing lightweight models. This study revisits FLIM SOD and introduces FLIM-Bag of Feature Points (FLIM-BoFP), a considerably faster filter estimation method. The previous approach, FLIM-Cluster, derives filters through patch clustering at each encoder's block, leading to computational overhead and reduced control over filter locations. FLIM-BoFP streamlines this process by performing a single clustering at the input block, creating a bag of feature points, and defining filters directly from mapped feature points across all blocks. The paper evaluates the benefits in efficiency, effectiveness, and generalization of FLIM-BoFP compared to FLIM-Cluster and other state-of-the-art baselines for parasite detection in optical microscopy images.
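The patch-clustering idea that the abstract attributes to FLIM-Cluster can be sketched roughly as follows: patches centred at marker pixels are flattened, clustered, and each cluster centre is read as one convolutional kernel. This is a simplified illustration, not the authors' implementation. Plain k-means with deterministic initialisation, zero padding, a single-channel image, and the names `extract_patch`, `kmeans`, and `estimate_filters` are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def extract_patch(image: Image, row: int, col: int, radius: int) -> List[float]:
    """Flatten the (2*radius+1)^2 patch centred at (row, col), zero-padded."""
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < height and 0 <= c < width
            patch.append(image[r][c] if inside else 0.0)
    return patch


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 10) -> List[List[float]]:
    """Lloyd's k-means; the first k points seed the centres (deterministic)."""
    centres = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])),
            )
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centre if a cluster ends up empty
                centres[j] = [sum(col) / len(group) for col in zip(*group)]
    return centres


def estimate_filters(
    image: Image,
    markers: Sequence[Tuple[int, int]],
    radius: int,
    n_filters: int,
) -> List[List[float]]:
    """Return n_filters flattened kernels, one per cluster of marker patches."""
    patches = [extract_patch(image, r, c, radius) for r, c in markers]
    if len(patches) < n_filters:
        raise ValueError("need at least as many marker pixels as filters")
    return kmeans(patches, n_filters)
```

No gradient or loss appears anywhere in the sketch, which is the sense in which such an encoder is obtained without backpropagation.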

3.6 · CV · Apr 25, 2025

Co-Training with Active Contrastive Learning and Meta-Pseudo-Labeling on 2D Projections for Deep Semi-Supervised Learning

David Aparco-Cardenas, Jancarlo F. Gomes, Alexandre X. Falcão et al.

A major challenge that prevents the training of deep learning (DL) models is the limited availability of accurately labeled data. This shortcoming is highlighted in areas where data annotation becomes a time-consuming and error-prone task. In this regard, semi-supervised learning (SSL) tackles this challenge by capitalizing on scarce labeled and abundant unlabeled data; however, state-of-the-art (SoTA) methods typically depend on pre-trained features and large validation sets to learn effective representations for classification tasks. In addition, the reduced set of labeled data is often randomly sampled, neglecting the selection of more informative samples. Here, we present active-DeepFA, a method that effectively combines contrastive learning (CL), teacher-student-based meta-pseudo-labeling, and active learning (AL) to train non-pretrained CNN architectures for image classification in scenarios of scarcity of labeled and abundance of unlabeled data. It integrates DeepFA into a co-training setup that implements two cooperative networks to mitigate confirmation bias from pseudo-labels. The method starts with a reduced set of labeled samples by warming up the networks with supervised CL. Afterward and at regular epoch intervals, label propagation is performed on the 2D projections of the networks' deep features. Next, the most reliable pseudo-labels are exchanged between networks in a cross-training fashion, while the most meaningful samples are annotated and added into the labeled set. The networks independently minimize an objective loss function comprising supervised contrastive, supervised and semi-supervised loss components, enhancing the representations towards image classification. Our approach is evaluated on three challenging biological image datasets using only 5% of labeled samples, improving baselines and outperforming six other SoTA methods. In addition, it reduces annotation effort by achieving comparable results to those of its counterparts with only 3% of labeled data.

6.2 · CV · Apr 29, 2025 · Code

FLIM-based Salient Object Detection Networks with Adaptive Decoders

Gilson Junior Soares, Matheus Abrantes Cerqueira, Jancarlo F. Gomes et al.

Salient Object Detection (SOD) methods can locate objects that stand out in an image, assign higher values to their pixels in a saliency map, and binarize the map, outputting a predicted segmentation mask. A recent tendency is to investigate pre-trained lightweight models rather than deep neural networks in SOD tasks, coping with applications under limited computational resources. In this context, we have investigated lightweight networks using a methodology named Feature Learning from Image Markers (FLIM), which assumes that the encoder's kernels can be estimated from marker pixels on discriminative regions of a few representative images. This work proposes flyweight networks, hundreds of times lighter than lightweight models, for SOD by combining a FLIM encoder with an adaptive decoder, whose weights are estimated for each input image by a given heuristic function. Such FLIM networks are trained from three to four representative images only and without backpropagation, making the models suitable for applications under labeled data constraints as well. We study five adaptive decoders; two of them are introduced here. Unlike the previous ones, which rely on one neuron per pixel with shared weights, the heuristic functions of the new adaptive decoders estimate the weights of each neuron per pixel. We compare FLIM models with adaptive decoders for two challenging SOD tasks with three lightweight networks from the state-of-the-art, two FLIM networks with decoders trained by backpropagation, and one FLIM network whose labeled markers define the decoder's weights. The experiments demonstrate the advantages of the proposed networks over the baselines, revealing the importance of further investigating such methods in new applications.

6.2 · CV · Apr 15, 2025

Multi-level Cellular Automata for FLIM networks

Felipe Crispim Salvagnini, Jancarlo F. Gomes, Cid A. N. Santos et al.

The necessity of abundant annotated data and complex network architectures presents a significant challenge in deep-learning Salient Object Detection (deep SOD) and across the broader deep-learning landscape. This challenge is particularly acute in medical applications in developing countries with limited computational resources. Combining modern and classical techniques offers a path to maintaining competitive performance while enabling practical applications. Feature Learning from Image Markers (FLIM) methodology empowers experts to design convolutional encoders through user-drawn markers, with filters learned directly from these annotations. Recent findings demonstrate that coupling a FLIM encoder with an adaptive decoder creates a flyweight network suitable for SOD, requiring significantly fewer parameters than lightweight models and eliminating the need for backpropagation. Cellular Automata (CA) methods have proven successful in data-scarce scenarios but require proper initialization -- typically through user input, priors, or randomness. We propose a practical intersection of these approaches: using FLIM networks to initialize CA states with expert knowledge without requiring user interaction for each image. By decoding features from each level of a FLIM network, we can initialize multiple CAs simultaneously, creating a multi-level framework. Our method leverages the hierarchical knowledge encoded across different network layers, merging multiple saliency maps into a high-quality final output that functions as a CA ensemble. Benchmarks across two challenging medical datasets demonstrate the competitiveness of our multi-level CA approach compared to established models in the deep SOD literature.

3.6 · CV · Apr 15, 2025

Flyweight FLIM Networks for Salient Object Detection in Biomedical Images

Leonardo M. Joao, Jancarlo F. Gomes, Silvio J. F. Guimaraes et al.

Salient Object Detection (SOD) with deep learning often requires substantial computational resources and large annotated datasets, making it impractical for resource-constrained applications. Lightweight models address computational demands but typically struggle in complex and scarce labeled-data scenarios. Feature Learning from Image Markers (FLIM) learns an encoder's convolutional kernels among image patches extracted from discriminative regions marked on a few representative images, dismissing large annotated datasets, pretraining, and backpropagation. Such a methodology exploits information redundancy commonly found in biomedical image applications. This study presents methods to learn dilated-separable convolutional kernels and multi-dilation layers without backpropagation for FLIM networks. It also proposes a novel network simplification method to reduce kernel redundancy and encoder size. By combining a FLIM encoder with an adaptive decoder, a concept recently introduced to estimate a pointwise convolution per image, this study presents very efficient (named flyweight) SOD models for biomedical images. Experimental results in challenging datasets demonstrate superior efficiency and effectiveness to lightweight models. By requiring significantly fewer parameters and floating-point operations, the results show competitive effectiveness to heavyweight models. These advances highlight the potential of FLIM networks for data-limited and resource-constrained applications with information redundancy.

6.5 · CV · Jun 5, 2024

Interactive Image Selection and Training for Brain Tumor Segmentation Network

Matheus A. Cerqueira, Flávia Sprenger, Bernardo C. A. Teixeira et al.

Medical image segmentation is a relevant problem in which deep learning is a leading approach. However, the necessity of a high volume of fully annotated images for training massive models can be a problem, especially for applications whose images present a great diversity, such as brain tumors, which can occur in different sizes and shapes. In contrast, a recent methodology, Feature Learning from Image Markers (FLIM), has involved an expert in the learning loop, producing small networks that require few images to train the convolutional layers. In this work, we employ an interactive method for image selection and training based on FLIM, exploring the user's knowledge. The results demonstrated that with our methodology, we could choose a small set of images to train the encoder of a U-shaped network, obtaining performance equal to manual selection and even surpassing the same U-shaped network trained with backpropagation and all training images.

7.6 · CV · Mar 19, 2024

Building Brain Tumor Segmentation Networks with User-Assisted Filter Estimation and Selection

Matheus A. Cerqueira, Flávia Sprenger, Bernardo C. A. Teixeira et al.

Brain tumor image segmentation is a challenging research topic in which deep-learning models have presented the best results. However, the traditional way of training those models from many pre-annotated images leaves several unanswered questions. Hence, methodologies such as Feature Learning from Image Markers (FLIM) have involved an expert in the learning loop to reduce human effort in data annotation and build models sufficiently deep for a given problem. FLIM has been successfully used to create encoders, estimating the filters of all convolutional layers from patches centered at marker voxels. In this work, we present Multi-Step (MS) FLIM - a user-assisted approach to estimating and selecting the most relevant filters from multiple FLIM executions. MS-FLIM is used only for the first convolutional layer, and the results already indicate improvement over FLIM. For evaluation, we build a simple U-shaped encoder-decoder network, named sU-Net, for glioblastoma segmentation using T1Gd and FLAIR MRI scans, varying the encoder's training method among FLIM, MS-FLIM, and the backpropagation algorithm. Also, we compared these sU-Nets with two State-Of-The-Art (SOTA) deep-learning models using two datasets. The results show that the sU-Net based on MS-FLIM outperforms the other training methods and achieves effectiveness within the standard deviations of the SOTA models.

3.6 · IV · Feb 7, 2024

Self-calibrated convolution towards glioma segmentation

Felipe C. R. Salvagnini, Gerson O. Barbosa, Alexandre X. Falcao et al.

Accurate brain tumor segmentation in the early stages of the disease is crucial for the treatment's effectiveness, avoiding exhaustive visual inspection of a qualified specialist on 3D MR brain images of multiple protocols (e.g., T1, T2, T2-FLAIR, T1-Gd). Several networks exist for glioma segmentation, with nnU-Net being one of the best. In this work, we evaluate self-calibrated convolutions in different parts of the nnU-Net network to demonstrate that self-calibrated modules in skip connections can significantly improve the enhanced-tumor and tumor-core segmentation accuracy while preserving the whole-tumor segmentation accuracy.

1.4 · CV · Dec 1, 2021

Iterative Saliency Enhancement using Superpixel Similarity

Leonardo de Melo Joao, Alexandre Xavier Falcao

Salient Object Detection (SOD) has several applications in image analysis. The methods have evolved from image-intrinsic to object-inspired (deep-learning-based) models. When a model fails, however, there is no alternative to enhance its saliency map. We fill this gap by introducing a hybrid approach, named Iterative Saliency Enhancement over Superpixel Similarity (ISESS), that iteratively generates enhanced saliency maps by executing two operations alternately: object-based superpixel segmentation and superpixel-based saliency estimation -- cycling operations never exploited. ISESS estimates seeds for superpixel delineation from a given saliency map and defines superpixel queries in the foreground and background. A new saliency map results from color similarities between queries and superpixels at each iteration. The process repeats and, after a given number of iterations, the generated saliency maps are combined into one by cellular automata. Finally, the resulting map is merged with the initial one by the maximum between their average values per superpixel. We demonstrate that our hybrid model can consistently outperform three state-of-the-art deep-learning-based methods on five image datasets.
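The final merging step described above (taking, per superpixel, the maximum of the two maps' average values) could look roughly like the sketch below. It reflects one reading of the abstract: each pixel receives the larger of the two per-superpixel means. The flat-list image layout and the name `merge_by_superpixel_max` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def merge_by_superpixel_max(
    labels: Sequence[Hashable],
    map_a: Sequence[float],
    map_b: Sequence[float],
) -> List[float]:
    """Merge two saliency maps defined over the same flattened pixel grid.

    labels[i] is the superpixel id of pixel i. For every superpixel, the
    mean saliency is computed in each map and the larger mean is assigned
    to all of its pixels.
    """
    if not (len(labels) == len(map_a) == len(map_b)):
        raise ValueError("labels and both maps must have the same length")

    sum_a = defaultdict(float)
    sum_b = defaultdict(float)
    count = defaultdict(int)
    for label, a, b in zip(labels, map_a, map_b):
        sum_a[label] += a
        sum_b[label] += b
        count[label] += 1

    merged_value = {
        label: max(sum_a[label] / n, sum_b[label] / n)
        for label, n in count.items()
    }
    return [merged_value[label] for label in labels]
```

Under this reading, a superpixel that the initial map scored well but the iterated map missed keeps its high saliency, and vice versa.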

6.1 · IV · Nov 16, 2021

CNN Filter Learning from Drawn Markers for the Detection of Suggestive Signs of COVID-19 in CT Images

Azael M. Sousa, Fabiano Reis, Rachel Zerbini et al.

Early detection of COVID-19 is vital to control its spread. Deep learning methods have been presented to detect suggestive signs of COVID-19 from chest CT images. However, due to the novelty of the disease, annotated volumetric data are scarce. Here we propose a method that does not require either large annotated datasets or backpropagation to estimate the filters of a convolutional neural network (CNN). For a few CT images, the user draws markers at representative normal and abnormal regions. The method generates a feature extractor composed of a sequence of convolutional layers, whose kernels are specialized in enhancing regions similar to the marked ones, and the decision layer of our CNN is a support vector machine. As we have no control over the CT image acquisition, we also propose an intensity standardization approach. Our method can achieve mean accuracy and kappa values of 0.97 and 0.93, respectively, on a dataset with 117 CT images extracted from different sites, surpassing its counterpart in all scenarios.

5.5 · LG · Sep 6, 2021

Iterative Pseudo-Labeling with Deep Feature Annotation and Confidence-Based Sampling

Barbara C Benato, Alexandru C Telea, Alexandre X Falcão

Training deep neural networks is challenging when large and annotated datasets are unavailable. Extensive manual annotation of data samples is time-consuming, expensive, and error-prone, notably when it needs to be done by experts. To address this issue, increased attention has been devoted to techniques that propagate uncertain labels (also called pseudo labels) to large amounts of unsupervised samples and use them for training the model. However, these techniques still need hundreds of supervised samples per class in the training set and a validation set with extra supervised samples to tune the model. We improve a recent iterative pseudo-labeling technique, Deep Feature Annotation (DeepFA), by selecting the most confident unsupervised samples to iteratively train a deep neural network. Our confidence-based sampling strategy relies on only dozens of annotated training samples per class with no validation set, considerably reducing user effort in data annotation. We first ascertain the best configuration for the baseline -- a self-trained deep neural network -- and then evaluate our confidence DeepFA for different confidence thresholds. Experiments on six datasets show that DeepFA already outperforms the self-trained baseline, but confidence DeepFA can considerably outperform the original DeepFA and the baseline.
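The core of a confidence-based sampling step can be sketched in a few lines: only pseudo-labeled samples whose confidence reaches a threshold are kept for the next training round. The abstract does not say how confidence is computed, so the sketch assumes one scalar per sample in [0, 1]; the name `select_confident` is hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def select_confident(
    pseudo_labels: Sequence[Hashable],
    confidences: Sequence[float],
    threshold: float,
) -> List[Tuple[int, Hashable]]:
    """Return (sample index, pseudo-label) pairs with confidence >= threshold.

    Assumption: confidences[i] is a score in [0, 1] attached to the
    pseudo-label of sample i by the label-propagation step.
    """
    if len(pseudo_labels) != len(confidences):
        raise ValueError("one confidence value is required per pseudo-label")
    return [
        (index, label)
        for index, (label, conf) in enumerate(zip(pseudo_labels, confidences))
        if conf >= threshold
    ]
```

Raising the threshold trades the number of training samples for label reliability, which is why the abstract reports results across several thresholds.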

4.5 · AI · Feb 18, 2021

Hierarchical Learning Using Deep Optimum-Path Forest

Luis C. S. Afonso, Clayton R. Pereira, Silke A. T. Weber et al.

Bag-of-Visual Words (BoVW) and deep learning techniques have been widely used in several domains, which include computer-assisted medical diagnoses. In this work, we are interested in developing tools for the automatic identification of Parkinson's disease using machine learning and the concept of BoVW. The proposed approach concerns a hierarchical-based learning technique to design visual dictionaries through the Deep Optimum-Path Forest classifier. The proposed method was evaluated in six datasets derived from data collected from individuals when performing handwriting exams. Experimental results showed the potential of the technique, with robust achievements.

5.6 · CV · Jan 17, 2021

Intestinal Parasites Classification Using Deep Belief Networks

Mateus Roder, Leandro A. Passos, Luiz Carlos Felix Ribeiro et al.

Currently, approximately 4 billion people are infected by intestinal parasites worldwide. Diseases caused by such infections constitute a public health problem in most tropical countries, leading to physical and mental disorders, and even death in children and immunodeficient individuals. Although subject to high error rates, human visual inspection is still in charge of the vast majority of clinical diagnoses. In the past years, some works addressed intelligent computer-aided intestinal parasites classification, but they usually suffer from misclassification due to similarities between parasites and fecal impurities. In this paper, we introduce Deep Belief Networks to the context of automatic intestinal parasites classification. Experiments conducted over three datasets composed of eggs, larvae, and protozoa provided promising results, even considering unbalanced classes and also fecal impurities.

1.6 · LG · Jan 12, 2021

Convolutional Neural Network Simplification with Progressive Retraining

D. Osaku, J. F. Gomes, A. X. Falcão

Kernel pruning methods have been proposed to speed up, simplify, and improve explanation of convolutional neural network (CNN) models. However, the effectiveness of a simplified model is often below the original one. In this letter, we present new methods based on objective and subjective relevance criteria for kernel elimination in a layer-by-layer fashion. During the process, a CNN model is retrained only when the current layer is entirely simplified, by adjusting the weights from the next layer to the first one and preserving weights of subsequent layers not involved in the process. We call this strategy progressive retraining, unlike kernel pruning methods that usually retrain the entire model after each simplification action -- e.g., the elimination of one or a few kernels. Our subjective relevance criterion exploits the ability of humans in recognizing visual patterns and improves the designer's understanding of the simplification process. The combination of suitable relevance criteria and progressive retraining shows that our methods can increase effectiveness with considerable model simplification. We also demonstrate that our methods can provide better results than two popular ones and another one from the state-of-the-art using four challenging image datasets.

2.6 · CV · Jan 7, 2021

Automated Diagnosis of Intestinal Parasites: A new hybrid approach and its benefits

D. Osaku, C. F. Cuba, Celso T. N. Suzuki et al.

Intestinal parasites are responsible for several diseases in human beings. In order to eliminate the error-prone visual analysis of optical microscopy slides, we have investigated automated, fast, and low-cost systems for the diagnosis of human intestinal parasites. In this work, we present a hybrid approach that combines the opinion of two decision-making systems with complementary properties: (DS1) a simpler system based on very fast handcrafted image feature extraction and support vector machine classification and (DS2) a more complex system based on a deep neural network, VGG-16, for image feature extraction and classification. DS1 is much faster than DS2, but it is less accurate than DS2. Fortunately, the errors of DS1 are not the same as those of DS2. During training, we use a validation set to learn the probabilities of misclassification by DS1 on each class based on its confidence values. When DS1 quickly classifies all images from a microscopy slide, the method selects a number of images with higher chances of misclassification for characterization and reclassification by DS2. Our hybrid system can improve the overall effectiveness without compromising efficiency, being suitable for the clinical routine -- a strategy that might be suitable for other real applications. As demonstrated on large datasets, the proposed system can achieve, on average, 94.9%, 87.8%, and 92.5% of Cohen's Kappa on helminth eggs, helminth larvae, and protozoa cysts, respectively.
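The two-stage routing described above can be sketched as follows: the fast system labels everything, an error-probability estimate ranks its outputs, and only the riskiest few are handed to the slow system. This is a minimal sketch under assumptions: the callables `ds1`, `ds2`, and `error_prob` and the fixed `budget` are hypothetical stand-ins, and the abstract does not specify how many images are reclassified.

```python
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

ImageT = TypeVar("ImageT")


def hybrid_classify(
    images: Sequence[ImageT],
    ds1: Callable[[ImageT], Tuple[Hashable, float]],
    ds2: Callable[[ImageT], Hashable],
    error_prob: Callable[[Hashable, float], float],
    budget: int,
) -> List[Hashable]:
    """Classify all images with ds1, then re-classify the riskiest with ds2.

    ds1 returns (label, confidence); error_prob maps that pair to an
    estimated misclassification probability (learned on a validation set
    in the abstract, supplied by the caller here); budget caps how many
    images the slower ds2 may process.
    """
    first_pass = [ds1(image) for image in images]
    risk = [error_prob(label, conf) for label, conf in first_pass]

    # Indices of the images most likely to have been misclassified by ds1.
    riskiest = sorted(range(len(images)), key=lambda i: risk[i], reverse=True)
    riskiest = riskiest[: max(budget, 0)]

    labels = [label for label, _ in first_pass]
    for i in riskiest:
        labels[i] = ds2(images[i])
    return labels
```

Because the slow system runs on at most `budget` images, the total cost stays close to that of the fast system alone, which matches the efficiency argument in the abstract.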

2.3 · CV · Dec 15, 2020

Convolutional Neural Networks from Image Markers

Barbara C. Benato, Italos E. de Souza, Felipe L. Galvão et al.

A technique named Feature Learning from Image Markers (FLIM) was recently proposed to estimate convolutional filters, with no backpropagation, from strokes drawn by a user on very few images (e.g., 1-3) per class, and demonstrated for coconut-tree image classification. This paper extends FLIM for fully connected layers and demonstrates it on different image classification problems. The work evaluates marker selection from multiple users and the impact of adding a fully connected layer. The results show that FLIM-based convolutional neural networks can outperform the same architecture trained from scratch by backpropagation.

7.2 · CV · Aug 8, 2020

Learning CNN filters from user-drawn image markers for coconut-tree image classification

Italos Estilon de Souza, Alexandre Xavier Falcão

Identifying species of trees in aerial images is essential for land-use classification, plantation monitoring, and impact assessment of natural disasters. The manual identification of trees in aerial images is tedious, costly, and error-prone, so automatic classification methods are necessary. Convolutional Neural Network (CNN) models have succeeded in image classification applications from different domains. However, CNN models usually require intensive manual annotation to create large training sets. One may conceptually divide a CNN into convolutional layers for feature extraction and fully connected layers for feature space reduction and classification. We present a method that needs a minimal set of user-selected images to train the CNN's feature extractor, reducing the number of required images to train the fully connected layers. The method learns the filters of each convolutional layer from user-drawn markers in image regions that discriminate classes, allowing better user control and understanding of the training process. It does not rely on optimization based on backpropagation, and we demonstrate its advantages on the binary classification of coconut-tree aerial images against one of the most popular CNN models.

5.0 · CV · Aug 2, 2020

Semi-supervised deep learning based on label propagation in a 2D embedded space

Barbara Caroline Benato, Jancarlo Ferreira Gomes, Alexandru Cristian Telea et al.

While convolutional neural networks need large labeled sets for training images, expert human supervision of such datasets can be very laborious. Proposed solutions propagate labels from a small set of supervised images to a large set of unsupervised ones to obtain sufficient truly-and-artificially labeled samples to train a deep neural network model. Yet, such solutions need many supervised images for validation. We present a loop in which a deep neural network (VGG-16) is trained from a set with more correctly labeled samples along iterations, created by using t-SNE to project the features of its last max-pooling layer into a 2D embedded space in which labels are propagated using the Optimum-Path Forest semi-supervised classifier. As the labeled set improves along iterations, it improves the features of the neural network. We show that this can significantly improve classification results on test data (using only 1% to 5% of supervised samples) of three private challenging datasets and two public ones.
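The structure of the loop described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: nearest-labeled-neighbor propagation stands in for the Optimum-Path Forest semi-supervised classifier, and the network training and t-SNE projection are passed in as the hypothetical callables `train` and `embed_2d`.

```python
import math

def propagate_labels(points_2d, labels):
    """Give each unlabeled point (label None) the label of its nearest labeled point.

    This nearest-neighbor rule is a stand-in for the Optimum-Path Forest
    semi-supervised classifier named in the abstract.
    """
    seeds = [(p, l) for p, l in zip(points_2d, labels) if l is not None]
    result = []
    for p, l in zip(points_2d, labels):
        if l is not None:
            result.append(l)
        else:
            result.append(min(seeds, key=lambda s: math.dist(p, s[0]))[1])
    return result

def self_training_loop(samples, labels, train, embed_2d, iterations=3):
    """Alternate between training on the current labels and re-propagating them.

    train(samples, labels) -> function mapping a sample to a feature vector
    embed_2d(features)     -> list of 2D points, one per feature vector
    Both callables are hypothetical placeholders for the network and the projection.
    """
    current = list(labels)
    for _ in range(iterations):
        known = [(s, l) for s, l in zip(samples, current) if l is not None]
        featurize = train([s for s, _ in known], [l for _, l in known])
        points = embed_2d([featurize(s) for s in samples])
        # Supervised labels stay fixed; pseudo-labels are re-estimated each round.
        current = propagate_labels(points, labels)
    return current
```

After the first iteration every sample carries a label, so later rounds train on the full set while only the original supervised samples act as propagation seeds.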

7.9LGJul 27, 2020

Semi-Automatic Data Annotation guided by Feature Space Projection

Barbara Caroline Benato, Jancarlo Ferreira Gomes, Alexandru Cristian Telea et al.

Data annotation using visual inspection (supervision) of each training sample can be laborious. Interactive solutions alleviate this by helping experts propagate labels from a few supervised samples to unlabeled ones based solely on the visual analysis of their feature space projection (with no further sample supervision). We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation. We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities, a large and diverse dataset that makes classification very hard. We evaluate two approaches for semi-supervised learning from the latent and projection spaces, to choose the one that best reduces user annotation effort and also increases classification accuracy on unseen data. Our results demonstrate the added value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.

1.2CVJun 30, 2020

ITSELF: Iterative Saliency Estimation fLexible Framework

Leonardo de Melo Joao, Felipe de Castro Belem, Alexandre Xavier Falcao

Saliency object detection estimates the objects that most stand out in an image. The available unsupervised saliency estimators rely on a pre-determined set of assumptions of how humans perceive saliency to create discriminating features. By fixing the pre-selected assumptions as an integral part of their models, these methods cannot be easily extended for specific settings and different image domains. We therefore propose a superpixel-based ITerative Saliency Estimation fLexible Framework (ITSELF) that allows any user-defined assumptions to be added to the model when required. Thanks to recent advancements in superpixel segmentation algorithms, saliency maps can be used to improve superpixel delineation. By combining a saliency-based superpixel algorithm with a superpixel-based saliency estimator, we propose a novel saliency/superpixel self-improving loop to iteratively enhance saliency maps. We compare ITSELF to two state-of-the-art saliency estimators on five metrics and six datasets, four of which are composed of natural images, and two of biomedical images. Experiments show that our approach is more robust than the compared methods, presenting competitive results on natural-image datasets and outperforming them on biomedical-image datasets.

10.2CVJan 24, 2019

Correcting rural building annotations in OpenStreetMap using convolutional neural networks

John E. Vargas-Muñoz, Sylvain Lobry, Alexandre X. Falcão et al.

Rural building mapping is paramount to support demographic studies and plan actions in response to crises that affect those areas. Rural building annotations exist in OpenStreetMap (OSM), but their quality and quantity are not sufficient for training models that can create accurate rural building maps. The problems with these annotations essentially fall into three categories: (i) most commonly, many annotations are geometrically misaligned with the updated imagery; (ii) some annotations do not correspond to buildings in the images (they are misannotations or the buildings have been destroyed); and (iii) some annotations are missing for buildings in the images (the buildings were never annotated or were built between subsequent image acquisitions). First, we propose a method based on Markov Random Field (MRF) to align the buildings with their annotations. The method maximizes the correlation between annotations and a building probability map while enforcing that nearby buildings have similar alignment vectors. Second, the annotations with no evidence in the building probability map are removed. Third, we present a method to detect non-annotated buildings with predefined shapes and add their annotation. The proposed methodology shows considerable improvement in accuracy of the OSM annotations for two regions of Tanzania and Zimbabwe, being more accurate than state-of-the-art baselines.
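The alignment step can be illustrated with a toy energy minimization. The sketch below is an assumption-laden simplification: the data term is the summed building probability under the shifted footprint (a proxy for the correlation mentioned above), the smoothness term is the squared difference between neighboring shift vectors, and the optimizer is Iterated Conditional Modes. The abstract specifies none of these details, and the names `unary`, `align`, and `lam` are hypothetical.

```python
def unary(prob, footprint, shift):
    """Data term: negative summed building probability under the shifted footprint."""
    dy, dx = shift
    h, w = len(prob), len(prob[0])
    return -sum(
        prob[y + dy][x + dx]
        for y, x in footprint
        if 0 <= y + dy < h and 0 <= x + dx < w
    )

def align(prob, footprints, neighbors, candidates, lam=0.5, sweeps=5):
    """Pick one shift per annotation with Iterated Conditional Modes (assumed optimizer).

    prob       : 2D list, building probability per pixel
    footprints : list of pixel lists [(y, x), ...], one per annotation
    neighbors  : neighbors[i] lists the indices of annotations near annotation i
    candidates : allowed shift vectors (dy, dx)
    lam        : weight of the smoothness term between neighboring shifts
    """
    shifts = [(0, 0)] * len(footprints)
    for _ in range(sweeps):
        for i, footprint in enumerate(footprints):
            def energy(s):
                smooth = sum(
                    (s[0] - shifts[j][0]) ** 2 + (s[1] - shifts[j][1]) ** 2
                    for j in neighbors[i]
                )
                return unary(prob, footprint, s) + lam * smooth
            shifts[i] = min(candidates, key=energy)
    return shifts
```

Raising `lam` makes nearby annotations move together even where the probability map is weak, while `lam = 0` aligns each annotation independently.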

3.1CVOct 26, 2017

SEGMENT3D: A Web-based Application for Collaborative Segmentation of 3D images used in the Shoot Apical Meristem

Thiago V. Spina, Johannes Stegmaier, Alexandre X. Falcão et al.

The quantitative analysis of 3D confocal microscopy images of the shoot apical meristem helps in understanding the growth process of some plants. Cell segmentation in these images is crucial for computational plant analysis and many automated methods have been proposed. However, variations in signal intensity across the image reduce the effectiveness of those approaches with no easy way for user correction. We propose a web-based collaborative 3D image segmentation application, SEGMENT3D, to leverage automatic segmentation results. The image is divided into 3D tiles that can be either segmented interactively from scratch or corrected from a pre-existing segmentation. Individual segmentation results per tile are then automatically merged via consensus analysis and then stitched to complete the segmentation for the entire image stack. SEGMENT3D is a comprehensive application that can be applied to other 3D imaging modalities and general objects. It also provides an easy way to create supervised data to advance segmentation using machine learning models.
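The abstract does not specify how the consensus analysis works. As a rough sketch only, assuming per-voxel majority voting over binary masks that several users produced for the same tile, the merge could look like this (the function name `consensus` and the flat-list tile layout are hypothetical):

```python
from collections import Counter

def consensus(segmentations):
    """Merge several segmentations of one tile by per-voxel majority vote.

    segmentations: list of flat label lists of equal length, one per user.
    Majority voting is an assumed consensus rule, and it only makes sense when
    labels mean the same thing for every user (e.g. 0 = background, 1 = cell).
    """
    merged = []
    for votes in zip(*segmentations):
        merged.append(Counter(votes).most_common(1)[0][0])
    return merged
```

For example, three users voting `[1, 0, 1, 1]`, `[1, 1, 0, 1]`, and `[0, 0, 1, 1]` on a four-voxel tile produce the merged mask `[1, 0, 1, 1]`. An odd number of users avoids ties on binary masks.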

5.0CVOct 18, 2017

Cell Segmentation in 3D Confocal Images using Supervoxel Merge-Forests with CNN-based Hypothesis Selection

Johannes Stegmaier, Thiago V. Spina, Alexandre X. Falcão et al.

Automated segmentation approaches are crucial to quantitatively analyze large-scale 3D microscopy images. Particularly in deep tissue regions, automatic methods still fail to provide error-free segmentations. To improve the segmentation quality throughout imaged samples, we present a new supervoxel-based 3D segmentation approach that outperforms current methods and reduces the manual correction effort. The algorithm consists of gentle preprocessing and a conservative supervoxel generation method followed by supervoxel agglomeration based on local signal properties and a postprocessing step to fix under-segmentation errors using a Convolutional Neural Network. We validate the functionality of the algorithm on manually labeled 3D confocal images of the plant Arabidopsis thaliana and compare the results to a state-of-the-art meristem segmentation algorithm.

3.0CVJun 10, 2016

FOMTrace: Interactive Video Segmentation By Image Graphs and Fuzzy Object Models

Thiago Vallin Spina, Alexandre Xavier Falcão

Common users have changed from mere consumers to active producers of multimedia data content. Video editing plays an important role in this scenario, calling for simple segmentation tools that can handle fast-moving and deformable video objects with possible occlusions, color similarities with the background, among other challenges. We present an interactive video segmentation method, named FOMTrace, which addresses the problem in an effective and efficient way. From a user-provided object mask in a first frame, the method performs semi-automatic video segmentation on a spatiotemporal superpixel-graph, and then estimates a Fuzzy Object Model (FOM), which refines segmentation of the second frame by constraining delineation on a pixel-graph within a region where the object's boundary is expected to be. The user can correct/accept the refined object mask in the second frame, which is then similarly used to improve the spatiotemporal video segmentation of the remaining frames. Both steps are repeated alternately, within interactive response times, until the segmentation refinement of the final frame is accepted by the user. Extensive experiments demonstrate FOMTrace's ability for tracing objects in comparison with state-of-the-art approaches for interactive video segmentation, supervised, and unsupervised object tracking.

5.5CVMay 29, 2013

Video Human Segmentation using Fuzzy Object Models and its Application to Body Pose Estimation of Toddlers for Behavior Studies

Thiago V. Spina, Mariano Tepper, Amy Esler et al.

Video object segmentation is a challenging problem due to the presence of deformable, connected, and articulated objects, intra- and inter-object occlusions, object motion, and poor lighting. Some of these challenges call for object models that can locate a desired object and separate it from its surrounding background, even when both share similar colors and textures. In this work, we extend a fuzzy object model, named cloud system model (CSM), to handle video segmentation, and evaluate it for body pose estimation of toddlers at risk of autism. CSM has been successfully used to model the parts of the brain (cerebrum, left and right brain hemispheres, and cerebellum) in order to automatically locate and separate them from each other, the connected brain stem, and the background in 3D MR-images. In our case, the objects are articulated parts (2D projections) of the human body, which can deform, cause self-occlusions, and move along the video. The proposed CSM extension handles articulation by connecting the individual clouds, body parts, of the system using a 2D stickman model. The stickman representation naturally allows us to extract 2D body pose measures of arm asymmetry patterns during unsupported gait of toddlers, a possible behavioral marker of autism. The results show that our method can provide insightful knowledge to assist the specialist's observations during real in-clinic assessments.