Wesley Nunes Gonçalves

h-index: 37

19 papers

1,311 citations

Novelty: 41%

AI Score: 33

Ranked #121,561 of 194,257 authors (top 63%); #40,408 in CV (top 68%)

19 Papers

28.4 | CV | Jun 29, 2023 | Code

The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot

Lucas Prado Osco, Qiusheng Wu, Eduardo Lopes de Lemos et al.

Segmentation is an essential step for remote sensing image processing. This study aims to advance the application of the Segment Anything Model (SAM), an innovative image segmentation model by Meta AI, in the field of remote sensing image analysis. SAM is known for its exceptional generalization capabilities and zero-shot learning, making it a promising approach to processing aerial and orbital images from diverse geographical contexts. Our exploration involved testing SAM across multi-scale datasets using various input prompts, such as bounding boxes, individual points, and text descriptors. To enhance the model's performance, we implemented a novel automated technique that combines a text-prompt-derived general example with one-shot training. This adjustment resulted in an improvement in accuracy, underscoring SAM's potential for deployment in remote sensing imagery and reducing the need for manual annotation. Despite the limitations encountered with lower spatial resolution images, SAM exhibits promising adaptability to remote sensing data analysis. We recommend future research to enhance the model's proficiency through integration with supplementary fine-tuning techniques and other networks. Furthermore, we provide the open-source code of our modifications on online repositories, encouraging further and broader adaptations of SAM to the remote sensing domain.

9.1 | CV | Mar 8, 2023 | Code

RADAM: Texture Recognition through Randomized Aggregated Encoding of Deep Activation Maps

Leonardo Scabini, Kallil M. Zielinski, Lucas C. Ribas et al.

Texture analysis is a classical yet challenging task in computer vision for which deep neural networks are actively being applied. Most approaches are based on building feature aggregation modules around a pre-trained backbone and then fine-tuning the new architecture on specific texture recognition tasks. Here we propose a new method named Random encoding of Aggregated Deep Activation Maps (RADAM) which extracts rich texture representations without ever changing the backbone. The technique consists of encoding the output at different depths of a pre-trained deep convolutional network using a Randomized Autoencoder (RAE). The RAE is trained locally to each image using a closed-form solution, and its decoder weights are used to compose a 1-dimensional texture representation that is fed into a linear SVM. This means that no fine-tuning or backpropagation is needed. We explore RADAM on several texture benchmarks and achieve state-of-the-art results with different computational budgets. Our results suggest that pre-trained backbones may not require additional fine-tuning for texture recognition if their learned representations are better encoded.
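
The closed-form training step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic randomized autoencoder: fixed random encoder weights, a sigmoid hidden layer, and a decoder obtained by ordinary least squares. The function name, the hidden size `q`, the random stand-in features, and the choice to flatten the decoder weights into a vector are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def randomized_autoencoder(X, q, seed=0):
    """Fit a decoder in closed form for a random, untrained encoder.

    X: (n, d) matrix, e.g. n spatial positions of a d-channel activation map.
    q: number of hidden units (hypothetical parameter).
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], q))   # random encoder weights, never trained
    H = 1.0 / (1.0 + np.exp(-(X @ W)))         # hidden activations, shape (n, q)
    # Decoder B minimizes ||H @ B - X||; solved by least squares, no backpropagation.
    B, *_ = np.linalg.lstsq(H, X, rcond=None)
    return B, H

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 8))   # stand-in for real backbone activations
B, H = randomized_autoencoder(X, q=4)
descriptor = B.ravel()             # one possible way to get a 1-D representation
```

The resulting `descriptor` would then be the kind of fixed-length vector a linear SVM could consume; how the paper aggregates decoder weights across depths is not specified in the abstract.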

10.4 | CV | Apr 25, 2023

The Potential of Visual ChatGPT For Remote Sensing

Lucas Prado Osco, Eduardo Lopes de Lemos, Wesley Nunes Gonçalves et al.

Recent advancements in Natural Language Processing (NLP), particularly in Large Language Models (LLMs), associated with deep learning-based computer vision techniques, have shown substantial potential for automating a variety of tasks. One notable model is Visual ChatGPT, which combines ChatGPT's LLM capabilities with visual computation to enable effective image analysis. The model's ability to process images based on textual inputs can revolutionize diverse fields. However, its application in the remote sensing domain remains unexplored. This is the first paper to examine the potential of Visual ChatGPT, a cutting-edge LLM founded on the GPT architecture, to tackle aspects of image processing related to the remote sensing domain. Among its current capabilities, Visual ChatGPT can generate textual descriptions of images, perform Canny edge and straight line detection, and conduct image segmentation. These offer valuable insights into image content and facilitate the interpretation and extraction of information. By exploring the applicability of these techniques within publicly available datasets of satellite images, we demonstrate the current model's limitations in dealing with remote sensing images, highlighting its challenges and future prospects. Although still in early development, we believe that the combination of LLMs and visual models holds significant potential to transform remote sensing image processing, creating accessible and practical application opportunities in the field.

3.6 | CV | May 21, 2025

Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation

Alessandro dos Santos Ferreira, Ana Paula Marques Ramos, José Marcato Junior et al.

Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Mapping and monitoring these green spaces are crucial for urban planning and conservation, yet accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. While deep learning architectures have shown promise in addressing these challenges, their effectiveness remains strongly dependent on the availability of large and manually labeled datasets, which are often expensive and difficult to obtain in sufficient quantity. In this work, we propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images. Our proposed pipeline enhances low-resolution imagery while preserving semantic content, enabling effective tree segmentation without requiring large volumes of manually annotated data. Leveraging models such as pix2pix, Real-ESRGAN, Latent Diffusion, and Stable Diffusion, we generate realistic and structurally consistent synthetic samples that expand the training dataset and unify scale across domains. This approach not only improves the robustness of segmentation models across different acquisition conditions but also provides a scalable and replicable solution for remote sensing scenarios with scarce annotation resources. Experimental results demonstrated an improvement of over 50% in IoU for low-resolution images, highlighting the effectiveness of our method compared to traditional pipelines.
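
For readers unfamiliar with the IoU metric quoted in this abstract, here is a minimal sketch of how Intersection over Union is commonly computed for binary segmentation masks. The function name and the toy masks are illustrative, and returning 1.0 for two empty masks is a convention assumed here, not something stated in the paper.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union between two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    inter = np.logical_and(pred, target).sum()
    return float(inter / union)

# Toy masks: 2 pixels overlap, 4 pixels are covered by at least one mask.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
score = iou(pred, target)
```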

3.9 | CV | May 4, 2023

MTLSegFormer: Multi-task Learning with Transformers for Semantic Segmentation in Precision Agriculture

Diogo Nunes Goncalves, Jose Marcato Junior, Pedro Zamboni et al.

Multi-task learning has proven to be effective in improving the performance of correlated tasks. Most of the existing methods use a backbone to extract initial features with independent branches for each task, and the exchange of information between the branches usually occurs through the concatenation or sum of the feature maps of the branches. However, this type of information exchange does not directly consider the local characteristics of the image nor the level of importance or correlation between the tasks. In this paper, we propose a semantic segmentation method, MTLSegFormer, which combines multi-task learning and attention mechanisms. After the backbone feature extraction, two feature maps are learned for each task. The first map is proposed to learn features related to its task, while the second map is obtained by applying learned visual attention to locally re-weigh the feature maps of the other tasks. In this way, weights are assigned to local regions of the image of other tasks that have greater importance for the specific task. Finally, the two maps are combined and used to solve a task. We tested the performance in two challenging problems with correlated tasks and observed a significant improvement in accuracy, mainly in tasks with high dependence on the others.

9.4 | CV | Feb 8, 2021

Semantic Segmentation with Labeling Uncertainty and Class Imbalance

Patrik Olã Bressan, José Marcato Junior, José Augusto Correa Martins et al.

Recently, methods based on Convolutional Neural Networks (CNNs) have achieved impressive success in semantic segmentation tasks. However, challenges such as class imbalance and uncertainty in the pixel-labeling process have not been fully addressed. As such, we present a new approach that calculates a weight for each pixel considering its class and uncertainty during the labeling process. The pixel-wise weights are used during training to increase or decrease the importance of the pixels. Experimental results show that the proposed approach leads to significant improvements in three challenging segmentation tasks in comparison to baseline methods. It also proved to be less sensitive to noise. The approach presented here may be used within a wide range of semantic segmentation methods to improve their robustness.
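
The idea of pixel-wise weights can be sketched as a weighted cross-entropy. The weighting formula below (inverse class frequency multiplied by one minus a per-pixel uncertainty in [0, 1]) is a hypothetical choice made for illustration; the abstract does not give the authors' actual formula, and both function names are assumptions.

```python
import numpy as np

def pixel_weights(labels, uncertainty, n_classes):
    """Hypothetical weighting: rare classes up, uncertain pixels down."""
    freq = np.bincount(labels.ravel(), minlength=n_classes) / labels.size
    class_w = 1.0 / np.maximum(freq, 1e-12)      # inverse class frequency
    return class_w[labels] * (1.0 - uncertainty)

def weighted_pixel_loss(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs: (H, W, C) per-pixel class probabilities; labels: (H, W) ints;
    weights: (H, W) non-negative pixel weights.
    """
    rows, cols = np.indices(labels.shape)
    p_true = probs[rows, cols, labels]            # probability of the labeled class
    nll = -np.log(np.clip(p_true, 1e-12, 1.0))
    return float((weights * nll).sum() / weights.sum())

labels = np.array([[0, 0], [0, 1]])
uncertainty = np.array([[0.0, 0.0], [0.5, 0.0]])
probs = np.full((2, 2, 2), 0.5)                   # an uninformative two-class predictor
weights = pixel_weights(labels, uncertainty, n_classes=2)
loss = weighted_pixel_loss(probs, labels, weights)
```

In this toy case the single class-1 pixel receives the largest weight, and the uncertain class-0 pixel the smallest, which is the kind of re-balancing the abstract describes.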

1.4 | CV | Feb 8, 2021

Counting and Locating High-Density Objects Using Convolutional Neural Network

Mauro dos Santos de Arruda, Lucas Prado Osco, Plabiany Rodrigo Acosta et al.

This paper presents a Convolutional Neural Network (CNN) approach for counting and locating objects in high-density imagery. To the best of our knowledge, this is the first object counting and locating method based on feature map enhancement and Multi-Stage Refinement of the confidence map. The proposed method was evaluated on two counting datasets: tree and car. For the tree dataset, our method returned a mean absolute error (MAE) of 2.05, a root-mean-squared error (RMSE) of 2.87 and a coefficient of determination (R$^2$) of 0.986. For the car datasets (CARPK and PUCPR+), our method was superior to state-of-the-art methods. On these datasets, our approach achieved an MAE of 4.45 and 3.16, an RMSE of 6.18 and 4.39, and an R$^2$ of 0.975 and 0.999, respectively. The proposed method is suitable for dealing with high object density, returning state-of-the-art performance for counting and locating objects.
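
The three counting metrics reported above can be computed as in the following sketch. It assumes the common definition R² = 1 - SS_res / SS_tot; the paper may use a different variant (for example, a squared correlation coefficient), and the per-image counts below are made-up toy values, not data from the paper.

```python
import numpy as np

def counting_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for predicted vs. reference object counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(mae), float(rmse), float(r2)

# Toy counts for three image patches (illustrative only).
mae, rmse, r2 = counting_metrics([10, 20, 30], [12, 18, 33])
```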

2.6 | CV | Feb 5, 2021

A Deep Learning Approach Based on Graphs to Detect Plantation Lines

Diogo Nunes Gonçalves, Mauro dos Santos de Arruda, Hemerson Pistori et al.

Deep learning-based networks are among the most prominent methods to learn linear patterns and extract this type of information from diverse imagery conditions. Here, we propose a deep learning approach based on graphs to detect plantation lines in UAV-based RGB imagery presenting a challenging scenario containing spaced plants. The first module of our method extracts a feature map through the backbone, which consists of the initial layers of VGG16. This feature map is used as input to the Knowledge Estimation Module (KEM), organized in three concatenated branches for detecting 1) the plant positions, 2) the plantation lines, and 3) the displacement vectors between the plants. A graph model is then built in which each plant position in the image is a vertex and edges are formed between pairs of vertices (i.e., plants). Finally, an edge is classified as pertaining to a certain plantation line based on three probabilities (higher than 0.5): i) one from the visual features obtained from the backbone; ii) the chance that the edge pixels belong to a line, from the KEM step; and iii) the alignment of the displacement vectors with the edge, also from KEM. Experiments were conducted in corn plantations with different growth stages and patterns using aerial RGB imagery. A total of 564 patches of 256 x 256 pixels were used and randomly divided into training, validation, and testing sets in a proportion of 60%, 20%, and 20%, respectively. The proposed method was compared against state-of-the-art deep learning methods and achieved superior performance by a significant margin, returning precision, recall, and F1-score of 98.7%, 91.9%, and 95.1%, respectively. This approach is useful for extracting lines with spaced plantation patterns and could be implemented in scenarios where plantation gaps occur, generating lines with few to no interruptions.
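
The edge classification rule can be sketched as a simple predicate. The abstract does not say exactly how the three probabilities are combined, so requiring all three to exceed 0.5 is an assumption here, and the type, field names, plant identifiers, and probability values are hypothetical.

```python
from typing import NamedTuple

class EdgeScores(NamedTuple):
    p_visual: float   # from backbone visual features
    p_line: float     # chance that edge pixels lie on a line (KEM)
    p_align: float    # alignment of displacement vectors with the edge (KEM)

def belongs_to_line(scores, threshold=0.5):
    """Assumed rule: keep the edge only if every probability exceeds the threshold."""
    return all(p > threshold for p in scores)

# Hypothetical candidate edges between detected plants.
candidates = {
    ("plant_a", "plant_b"): EdgeScores(0.92, 0.81, 0.77),
    ("plant_b", "plant_c"): EdgeScores(0.88, 0.40, 0.95),  # fails the line test
}
kept = [edge for edge, s in candidates.items() if belongs_to_line(s)]
```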

16.6 | CV | Jan 22, 2021

A Review on Deep Learning in UAV Remote Sensing

Lucas Prado Osco, José Marcato Junior, Ana Paula Marques Ramos et al.

Deep Neural Networks (DNNs) learn representations from data with impressive capability and have brought important breakthroughs in processing images, time series, natural language, audio, video, and many other data types. In the remote sensing field, surveys and literature reviews specifically involving applications of DNN algorithms have been conducted in an attempt to summarize the amount of information produced in its subfields. Recently, Unmanned Aerial Vehicle (UAV) based applications have dominated aerial sensing research. However, a literature review that combines both the "deep learning" and "UAV remote sensing" themes has not yet been conducted. The motivation for our work was to present a comprehensive review of the fundamentals of Deep Learning (DL) applied to UAV-based imagery. We focused mainly on describing classification and regression techniques used in recent applications with UAV-acquired data. For that, a total of 232 papers published in international scientific journal databases were examined. We gathered the published material and evaluated its characteristics regarding the application, sensor, and technique used. We relate how DL presents promising results and has the potential for processing tasks associated with UAV-based image data. Lastly, we project future perspectives, commenting on prominent DL paths to be explored in the UAV remote sensing field. Our review offers an accessible approach to introduce, comment on, and summarize the state of the art in UAV-based image applications with DNN algorithms in diverse subfields of remote sensing, grouping them into environmental, urban, and agricultural contexts.

8.5 | CV | Dec 31, 2020

A CNN Approach to Simultaneously Count Plants and Detect Plantation-Rows from UAV Imagery

Lucas Prado Osco, Mauro dos Santos de Arruda, Diogo Nunes Gonçalves et al.

In this paper, we propose a novel deep learning method based on a Convolutional Neural Network (CNN) that simultaneously detects and geolocates plantation-rows while counting their plants in highly dense plantation configurations. The experimental setup was evaluated in a cornfield with different growth stages and in a citrus orchard. Both datasets characterize different plant density scenarios, locations, types of crops, sensors, and dates. A two-branch architecture was implemented in our CNN method, where the information obtained within the plantation-row branch is passed to the plant detection branch and fed back to the row branch; both are then refined by a Multi-Stage Refinement method. In the corn plantation datasets (with both growth phases, young and mature), our approach returned a mean absolute error (MAE) of 6.224 plants per image patch, a mean relative error (MRE) of 0.1038, precision and recall values of 0.856 and 0.905, respectively, and an F-measure equal to 0.876. These results were superior to the results from other deep networks (HRNet, Faster R-CNN, and RetinaNet) evaluated with the same task and dataset. For the plantation-row detection, our approach returned precision, recall, and F-measure scores of 0.913, 0.941, and 0.925, respectively. To test the robustness of our model with a different type of agriculture, we performed the same task in the citrus orchard dataset. It returned an MAE equal to 1.409 citrus-trees per patch, MRE of 0.0615, precision of 0.922, recall of 0.911, and F-measure of 0.965. For citrus plantation-row detection, our approach resulted in precision, recall, and F-measure scores equal to 0.965, 0.970, and 0.964, respectively. The proposed method achieved state-of-the-art performance for counting and geolocating plants and plant-rows in UAV images from different types of crops.

0.9 | CV | Jun 27, 2018

Dynamic texture analysis with diffusion in networks

Lucas C. Ribas, Wesley N. Goncalves, Odemir M. Bruno

Dynamic texture is a field of research that has gained considerable interest from the computer vision community due to the explosive growth of multimedia databases. In addition, dynamic texture is present in a wide range of videos, which makes it very important in video-based expert systems such as medical systems, traffic monitoring systems, and forest fire detection systems, among others. In this paper, a new method for dynamic texture characterization based on diffusion in directed networks is proposed. The dynamic texture is modeled as a directed network. The method consists of analyzing the dynamics of this network after a series of graph cut transformations based on the edge weights. For each network transformation, the activity of each vertex is estimated. The activity is the relative frequency with which a vertex is visited by random walks in balance. Then, a texture descriptor is constructed by concatenating the activity histograms. The main contributions of this paper are the use of directed network modeling and diffusion in networks for dynamic texture characterization. These tend to provide better performance in dynamic texture classification. Experiments with rotation and interference of the motion pattern were conducted in order to demonstrate the robustness of the method. The proposed approach is compared to other dynamic texture methods on two well-known dynamic texture databases and on traffic condition classification, and outperforms them in most cases.
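
The notion of vertex activity can be illustrated with a minimal sketch that estimates the long-run visit frequency of a random walk on a small weighted directed graph by power iteration. This assumes that "random walks in balance" corresponds to the stationary distribution, that every vertex has at least one outgoing edge, and that the graph is strongly connected and aperiodic. The function name and the toy graph are hypothetical; the paper's graph construction and cut transformations are not reproduced here.

```python
import numpy as np

def activity(weights, n_iter=10_000, tol=1e-12):
    """Stationary visit frequencies of a random walk on a directed graph.

    weights[i, j] > 0 means an edge i -> j; rows must not be all zero.
    """
    W = np.asarray(weights, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    pi = np.full(len(W), 1.0 / len(W))       # start from a uniform distribution
    for _ in range(n_iter):
        nxt = pi @ P                         # one diffusion step
        converged = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if converged:
            break
    return pi

# Toy directed graph with 3 vertices (illustrative only).
W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]])
pi = activity(W)   # vertex 0 is visited most often in this graph
```

A descriptor in the spirit of the abstract would histogram such activities for each thresholded version of the network and concatenate the histograms.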

4.6 | CV | Apr 2, 2018

Multilayer Complex Network Descriptors for Color-Texture Characterization

Leonardo F S Scabini, Rayner H M Condori, Wesley N Gonçalves et al.

A new method based on complex networks is proposed for color-texture analysis. The proposal consists of modeling the image as a multilayer complex network where each color channel is a layer, and each pixel (in each color channel) is represented as a network vertex. The network dynamic evolution is assessed using a set of modeling parameters (radii and thresholds), and new characterization techniques are introduced to capture information regarding within- and between-color-channel spatial interaction. An automatic and adaptive approach for threshold selection is also proposed. We conduct classification experiments on 5 well-known datasets: Vistex, Usptex, Outex13, CURet and MBT. Results are compared with various methods from the literature, including deep convolutional neural networks with pre-trained architectures. The proposed method presented the highest overall performance over the 5 datasets, with a mean accuracy of 97.7 against 97.0 achieved by the ResNet convolutional neural network with 50 layers.

1.1 | CV | Nov 25, 2016

Texture analysis using deterministic partially self-avoiding walk with thresholds

Lucas Correia Ribas, Wesley Nunes Gonçalves, Odemir Martinez Bruno

In this paper, we propose a new texture analysis method using the deterministic partially self-avoiding walk performed on maps modified with thresholds. In this method, two pixels of the map are neighbors if the Euclidean distance between them is less than $\sqrt{2}$ and the weight (the difference between their intensities) is less than a given threshold. The maps obtained by using different thresholds highlight several properties of the image that are extracted by the deterministic walk. To compose the feature vector, deterministic walks are performed with different thresholds and their statistics are concatenated. Thus, this approach can be considered a multi-scale analysis. We validate our method on the Brodatz database, a well-known public image database widely used by texture analysis methods. Experimental results indicate that the proposed method achieves good texture discrimination, outperforming traditional texture methods.

2.1 | CV | Sep 26, 2016

BioLeaf: a professional mobile application to measure foliar damage caused by insect herbivory

Bruno Machado, Jonatan Orue, Mauro Arruda et al.

Soybean is one of the ten largest crops in the world, accounting for billions of dollars in business every year. This crop suffers from insect herbivory that costs producers millions. Hence, constant monitoring of the crop foliar damage is necessary to guide the application of insecticides. However, current methods to measure foliar damage are expensive and dependent on laboratory facilities, in some cases relying on complex devices. To cope with these shortcomings, we introduce an image processing methodology to measure the foliar damage in soybean leaves. We developed a non-destructive imaging method based on two techniques, Otsu segmentation and Bezier curves, to estimate the foliar loss in leaves with or without border damage. We instantiate our methodology in a mobile application named BioLeaf, which is freely distributed for smartphone users. We experimented with real-world leaves collected from a soybean crop in Brazil. Our results demonstrated that BioLeaf achieves foliar damage quantification with precision comparable to that of human specialists. With these results, our proposal might assist soybean producers, reducing the time to measure foliar damage, reducing analytical costs, and defining a commodity application that is applicable not only to soy, but also to different crops such as cotton, bean, potato, coffee, and vegetables.
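
The Otsu segmentation step named in the abstract can be illustrated with a minimal NumPy sketch of the standard algorithm: choose the gray level that maximizes the between-class variance of the histogram. The function name and the toy image are illustrative; this is not the BioLeaf code, and the Bezier-curve border reconstruction is not shown.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t (0..255) maximizing between-class variance.

    gray must be an unsigned 8-bit image; pixels <= t form one class.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the lower class
    mu = np.cumsum(prob * np.arange(256))      # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                  # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

# Toy image: dark half (50) and bright half (200).
gray = np.full((4, 4), 50, dtype=np.uint8)
gray[:, 2:] = 200
t = otsu_threshold(gray)
mask = gray > t                                # bright class
```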

2.3 | DATA-AN | Nov 21, 2013

Texture descriptor combining fractal dimension and artificial crawlers

Wesley Nunes Gonçalves, Bruno Brandoli Machado, Odemir Martinez Bruno

Texture is an important visual attribute used to describe images. There are many methods available for texture analysis. However, they do not capture the richness of detail of the image surface. In this paper, we propose a new method to describe textures using the artificial crawler model. This model assumes that each agent can interact with the environment and with other agents. Since this swarm system alone does not achieve good discrimination, we developed a new method that increases the discriminatory power of artificial crawlers by combining them with fractal dimension theory. Here, we estimated the fractal dimension by the Bouligand-Minkowski method due to its precision in quantifying structural properties of images. We validate our method on two texture datasets and the experimental results reveal that our method leads to highly discriminative textural features. The results indicate that our method can be used in different texture applications.

3.3 | CE | May 10, 2013

Multi-q Pattern Classification of Polarization Curves

Ricardo Fabbri, Ivan N. Bastos, Francisco D. Moura Neto et al.

Several experimental measurements are expressed in the form of one-dimensional profiles, for which there is a scarcity of methodologies able to classify the pertinence of a given result to a specific group. The polarization curves that evaluate the corrosion kinetics of electrodes in corrosive media are an application where the behavior is chiefly analyzed from profiles. Polarization curves are indeed a classic method to determine the global kinetics of metallic electrodes, but the strong nonlinearity from different metals and alloys can overlap and the discrimination becomes a challenging problem. Moreover, even finding a typical curve from replicated tests requires subjective judgement. In this paper we used the so-called multi-q approach based on the Tsallis statistics in a classification engine to separate multiple polarization curve profiles of two stainless steels. We collected 48 experimental polarization curves in aqueous chloride medium of two stainless steel types, with different resistance against localized corrosion. Multi-q pattern analysis was then carried out on a wide potential range, from cathodic up to anodic regions. An excellent classification rate was obtained, at a success rate of 90%, 80%, and 83% for low (cathodic), high (anodic), and both potential ranges, respectively, using only 2% of the original profile data. These results show the potential of the proposed approach towards efficient, robust, systematic and automatic classification of highly non-linear profile curves.

1.5 | CV | Jan 19, 2012

Image decomposition with anisotropic diffusion applied to leaf-texture analysis

Bruno Brandoli Machado, Wesley Nunes Gonçalves, Odemir Martinez Bruno

Texture analysis is an important field of investigation that has received a great deal of interest from the computer vision community. In this paper, we propose a novel approach for texture modeling based on a partial differential equation (PDE). Each image $f$ is decomposed into a family of derived sub-images. $f$ is split into the $u$ component, obtained with anisotropic diffusion, and the $v$ component, which is calculated as the difference between the original image and the $u$ component. After enhancing the texture attribute $v$ of the image, Gabor features are computed as descriptors. We validate the proposed approach on two texture datasets with high variability. We also evaluate our approach on an important real-world application: leaf-texture analysis. Experimental results indicate that our approach can be used to produce higher classification rates and can be successfully employed for different texture applications.
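
The decomposition $f = u + v$ can be sketched with a basic Perona-Malik scheme, which is one common form of anisotropic diffusion. Whether the authors used this exact scheme is an assumption, as are the function name, the exponential conduction function, and the parameter values `n_iter`, `kappa`, and `step`.

```python
import numpy as np

def anisotropic_diffusion(f, n_iter=10, kappa=20.0, step=0.2):
    """Perona-Malik diffusion with zero-flux borders; step <= 0.25 for stability."""
    u = np.asarray(f, dtype=float).copy()
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")          # replicate borders (no flux across them)
        diffs = (
            p[:-2, 1:-1] - u,                  # north neighbor minus center
            p[2:, 1:-1] - u,                   # south
            p[1:-1, :-2] - u,                  # west
            p[1:-1, 2:] - u,                   # east
        )
        # Conduction is small across strong edges, so edges are preserved.
        u = u + step * sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
    return u

# Toy image: a step edge plus a fine checkerboard texture.
rows, cols = np.indices((8, 8))
f = np.where(cols >= 4, 100.0, 0.0) + 3.0 * ((rows + cols) % 2)
u = anisotropic_diffusion(f)   # smooth, edge-preserving component
v = f - u                      # residual texture component
```

In the pipeline described by the abstract, Gabor features would then be computed on the enhanced `v` component.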

7.7 | CV | Jan 17, 2012

Spatiotemporal Gabor filters: a new method for dynamic texture recognition

Wesley Nunes Gonçalves, Bruno Brandoli Machado, Odemir Martinez Bruno

This paper presents a new method for dynamic texture recognition based on spatiotemporal Gabor filters. Dynamic textures have emerged as a new field of investigation that extends the concept of self-similarity of texture images to the spatiotemporal domain. To model a dynamic texture, we convolve the sequence of images with a bank of spatiotemporal Gabor filters. For each response, a feature vector is built by calculating the energy statistic. As far as the authors know, this paper is the first to report an effective method for dynamic texture recognition using spatiotemporal Gabor filters. We evaluate the proposed method on two challenging databases and the experimental results indicate that the proposed method is a robust approach for dynamic texture recognition.

2.8 | CV | Jan 15, 2012

Automatic system for counting cells with elliptical shape

Wesley Nunes Gonçalves, Odemir Martinez Bruno

This paper presents a new method for automatic quantification of ellipse-like cells in images, an important and challenging problem that has been studied by the computer vision community. The proposed method can be described by two main steps. Initially, image segmentation based on the k-means algorithm is performed to separate different types of cells from the background. Then, a robust and efficient strategy is applied to the blob contour to split touching cells. Due to the contour processing, the method achieves excellent detection results compared to manual detection performed by specialists.
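
The first step, k-means segmentation, can be sketched on pixel intensities alone. The abstract does not say which features the authors cluster, so using grayscale intensity is an assumption, as are the function name and the deterministic initialization of the centers; the contour-based splitting of touching cells is not shown.

```python
import numpy as np

def kmeans_intensity(gray, k=2, n_iter=20):
    """Cluster pixel intensities with plain k-means; returns (label image, centers)."""
    x = np.asarray(gray, dtype=float).ravel()
    centers = np.linspace(x.min(), x.max(), k)      # deterministic initialization
    for _ in range(n_iter):
        # Assign each pixel to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its pixels (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return labels.reshape(np.shape(gray)), centers

# Toy image: dark background (20) with one bright blob (200).
gray = np.full((6, 6), 20, dtype=np.uint8)
gray[2:4, 2:5] = 200
labels, centers = kmeans_intensity(gray, k=2)
```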