Henry Medeiros

19papers

575citations

Novelty48%

AI Score31

Ranked #144,197 of 205,806 authors (top 70%)#44,688 in CV (top 76%)

19 Papers

CVSep 10, 2022

Self-supervised Learning for Panoptic Segmentation of Multiple Fruit Flower Species

Abubakar Siddique, Amy Tabb, Henry Medeiros

Convolutional neural networks trained using manually generated labels are commonly used for semantic or instance segmentation. In precision agriculture, automated flower detection methods use supervised models and post-processing techniques that may not perform consistently as the appearance of the flowers and the data acquisition conditions vary. We propose a self-supervised learning strategy to enhance the sensitivity of segmentation models to different flower species using automatically generated pseudo-labels. We employ a data augmentation and refinement approach to improve the accuracy of the model predictions. The augmented semantic predictions are then converted to panoptic pseudo-labels to iteratively train a multi-task model. The self-supervised model predictions can be refined with existing post-processing approaches to further improve their accuracy. An evaluation on a multi-species fruit tree flower dataset demonstrates that our method outperforms state-of-the-art models without computationally expensive post-processing steps, providing a new baseline for flower detection applications.

CVDec 31, 2022

Tracking Passengers and Baggage Items using Multiple Overhead Cameras at Security Checkpoints

Abubakar Siddique, Henry Medeiros

We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing a test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically-transformed images as inputs to a Convolutional Neural Network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multi-view trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performances on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to $42\%$ without increasing the inference time of the model. Our multi-camera association method achieves up to $89\%$ multi-object tracking accuracy with an average computation time of less than $15$ ms.

CVMar 15, 2019Code

Multi-camera calibration with pattern rigs, including for non-overlapping cameras: CALICO

Amy Tabb, Henry Medeiros, Mitchell J. Feldmann et al.

This paper describes CALICO, a method for multi-camera calibration suitable for challenging contexts: stationary and mobile multi-camera systems, cameras without overlapping fields of view, and non-synchronized cameras. Recent approaches are roughly divided into infrastructure- and pattern-based. Infrastructure-based approaches use the scene's features to calibrate, while pattern-based approaches use calibration patterns. Infrastructure-based approaches are not suitable for stationary camera systems, and pattern-based approaches may constrain camera placement because shared fields of view or extremely large patterns are required. CALICO is a pattern-based approach, where the multi-calibration problem is formulated using rigidity constraints between patterns and cameras. We use a {\it pattern rig}: several patterns rigidly attached to each other or some structure. We express the calibration problem as that of algebraic and reprojection error minimization problems. Simulated and real experiments demonstrate the method in a variety of settings. CALICO compared favorably to Kalibr. Mean reconstruction accuracy error was $\le 0.71$ mm for real camera rigs, and $\le 1.11$ for simulated camera rigs. Code and data releases are available at \cite{tabb_amy_2019_3520866} and \url{https://github.com/amy-tabb/calico}.

CVFeb 18, 2019Code

FreeLabel: A Publicly Available Annotation Tool based on Freehand Traces

Philipe A. Dias, Zhou Shen, Amy Tabb et al.

Large-scale annotation of image segmentation datasets is often prohibitively expensive, as it usually requires a huge number of worker hours to obtain high-quality results. Abundant and reliable data has been, however, crucial for the advances on image understanding tasks achieved by deep learning models. In this paper, we introduce FreeLabel, an intuitive open-source web interface that allows users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. The efficacy of FreeLabel is quantitatively demonstrated by experimental results on the PASCAL dataset as well as on a dataset from the agricultural domain. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced or private annotation and has a modular structure that can be easily adapted for any image dataset.

CVAug 17, 2024

Multi-Camera Multi-Person Association using Transformer-Based Dense Pixel Correspondence Estimation and Detection-Based Masking

Daniel Kathein, Byron Hernandez, Henry Medeiros

Multi-camera Association (MCA) is the task of identifying objects and individuals across camera views and is an active research topic, given its numerous applications across robotics, surveillance, and agriculture. We investigate a novel multi-camera multi-target association algorithm based on dense pixel correspondence estimation with a Transformer-based architecture and underlying detection-based masking. After the algorithm generates a set of corresponding keypoints and their respective confidence levels between every pair of detections in the camera views are computed, an affinity matrix is determined containing the probabilities of matches between each pair. Finally, the Hungarian algorithm is applied to generate an optimal assignment matrix with all the predicted associations between the camera views. Our method is evaluated on the WILDTRACK Seven-Camera HD Dataset, a high-resolution dataset containing footage of walking pedestrians as well as precise annotations and camera calibrations. Our results conclude that the algorithm performs exceptionally well associating pedestrians on camera pairs that are positioned close to each other and observe the scene from similar perspectives. On camera pairs with orientations that are drastically different in distance or angle, there is still significant room for improvement.

CVJul 7, 2021

Deep Convolutional Correlation Iterative Particle Filter for Visual Tracking

Reza Jalil Mozhdehi, Henry Medeiros

This work proposes a novel framework for visual tracking based on the integration of an iterative particle filter, a deep convolutional neural network, and a correlation filter. The iterative particle filter enables the particles to correct themselves and converge to the correct target position. We employ a novel strategy to assess the likelihood of the particles after the iterations by applying K-means clustering. Our approach ensures a consistent support for the posterior distribution. Thus, we do not need to perform resampling at every video frame, improving the utilization of prior distribution information. Experimental results on two different benchmark datasets show that our tracker performs favorably against state-of-the-art methods.

CVJul 15, 2020

Tracking Passengers and Baggage Items Using Multiple Overhead Cameras at Security Checkpoints

Abubakar Siddique, Henry Medeiros

We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a self-supervised learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing a test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically transformed images as inputs to a convolutional neural network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multiview trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performances on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to 42% without increasing the inference time of the model. Our multicamera association method achieves up to 89% multiobject tracking accuracy with an average computation time of less than 15 ms.

CVJul 14, 2020

Unsupervised Spatio-temporal Latent Feature Clustering for Multiple-object Tracking and Segmentation

Abubakar Siddique, Reza Jalil Mozhdehi, Henry Medeiros

Assigning consistent temporal identifiers to multiple moving objects in a video sequence is a challenging problem. A solution to that problem would have immediate ramifications in multiple object tracking and segmentation problems. We propose a strategy that treats the temporal identification task as a spatio-temporal clustering problem. We propose an unsupervised learning approach using a convolutional and fully connected autoencoder, which we call deep heterogeneous autoencoder, to learn discriminative features from segmentation masks and detection bounding boxes. We extract masks and their corresponding bounding boxes from a pretrained instance segmentation network and train the autoencoders jointly using task-dependent uncertainty weights to generate common latent features. We then construct constraints graphs that encourage associations among objects that satisfy a set of known temporal conditions. The feature vectors and the constraints graphs are then provided to the kmeans clustering algorithm to separate the corresponding data points in the latent space. We evaluate the performance of our method using challenging synthetic and real-world multiple-object video datasets. Our results show that our technique outperforms several state-of-the-art methods.

CVJun 11, 2020

Deep Convolutional Likelihood Particle Filter for Visual Tracking

Reza Jalil Mozhdehi, Henry Medeiros

We propose a novel particle filter for convolutional-correlation visual trackers. Our method uses correlation response maps to estimate likelihood distributions and employs these likelihoods as proposal densities to sample particles. Likelihood distributions are more reliable than proposal densities based on target transition distributions because correlation response maps provide additional information regarding the target's location. Additionally, our particle filter searches for multiple modes in the likelihood distribution, which improves performance in target occlusion scenarios while decreasing computational costs by more efficiently sampling particles. In other challenging scenarios such as those involving motion blur, where only one mode is present but a larger search area may be necessary, our particle filter allows for the variance of the likelihood distribution to increase. We tested our algorithm on the Visual Tracker Benchmark v1.1 (OTB100) and our experimental results demonstrate that our framework outperforms state-of-the-art methods.

CVMay 12, 2020

Probabilistic Semantic Segmentation Refinement by Monte Carlo Region Growing

Philipe A. Dias, Henry Medeiros

Semantic segmentation with fine-grained pixel-level accuracy is a fundamental component of a variety of computer vision applications. However, despite the large improvements provided by recent advances in the architectures of convolutional neural networks, segmentations provided by modern state-of-the-art methods still show limited boundary adherence. We introduce a fully unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence pixel labels into regions of low-confidence classification. Our algorithm, which we call probabilistic Region Growing Refinement (pRGR), is based on a rigorous mathematical foundation in which clusters are modelled as multivariate normally distributed sets of pixels. Exploiting concepts of Bayesian estimation and variance reduction techniques, pRGR performs multiple refinement iterations at varied receptive fields sizes, while updating cluster statistics to adapt to local image features. Experiments using multiple modern semantic segmentation networks and benchmark datasets demonstrate the effectiveness of our approach for the refinement of segmentation predictions at different levels of coarseness, as well as the suitability of the variance estimates obtained in the Monte Carlo iterations as uncertainty measures that are highly correlated with segmentation accuracy.

CVSep 19, 2019

Gaze Estimation for Assisted Living Environments

Philipe A. Dias, Damiano Malafronte, Henry Medeiros et al.

Effective assisted living environments must be able to perform inferences on how their occupants interact with one another as well as with surrounding objects. To accomplish this goal using a vision-based automated approach, multiple tasks such as pose estimation, object segmentation and gaze estimation must be addressed. Gaze direction in particular provides some of the strongest indications of how a person interacts with the environment. In this paper, we propose a simple neural network regressor that estimates the gaze direction of individuals in a multi-camera assisted living scenario, relying only on the relative positions of facial keypoints collected from a single pose estimation model. To handle cases of keypoint occlusion, our model exploits a novel confidence gated unit in its input layer. In addition to the gaze direction, our model also outputs an estimation of its own prediction uncertainty. Experimental results on a public benchmark demonstrate that our approach performs on pair with a complex, dataset-specific baseline, while its uncertainty predictions are highly correlated to the actual angular error of corresponding estimations. Finally, experiments on images from a real assisted living environment demonstrate the higher suitability of our model for its final application.

ROMar 3, 2019

Detecting Invasive Insects with Unmanned Aerial Vehicles

Brian Stumph, Miguel Hernandez Virto, Henry Medeiros et al.

A key aspect to controlling and reducing the effects invasive insect species have on agriculture is to obtain knowledge about the migration patterns of these species. Current state-of-the-art methods of studying these migration patterns involve a mark-release-recapture technique, in which insects are released after being marked and researchers attempt to recapture them later. However, this approach involves a human researcher manually searching for these insects in large fields and results in very low recapture rates. In this paper, we propose an automated system for detecting released insects using an unmanned aerial vehicle. This system utilizes ultraviolet lighting technology, digital cameras, and lightweight computer vision algorithms to more quickly and accurately detect insects compared to the current state of the art. The efficiency and accuracy that this system provides will allow for a more comprehensive understanding of invasive insect species migration patterns. Our experimental results demonstrate that our system can detect real target insects in field conditions with high precision and recall rates.

CVSep 20, 2018

Multispecies fruit flower detection using a refined semantic segmentation network

Philipe A. Dias, Amy Tabb, Henry Medeiros

In fruit production, critical crop management decisions are guided by bloom intensity, i.e., the number of flowers present in an orchard. Despite its importance, bloom intensity is still typically estimated by means of human visual inspection. Existing automated computer vision systems for flower identification are based on hand-engineered techniques that work only under specific conditions and with limited performance. This work proposes an automated technique for flower identification that is robust to uncontrolled environments and applicable to different flower species. Our method relies on an end-to-end residual convolutional neural network (CNN) that represents the state-of-the-art in semantic segmentation. To enhance its sensitivity to flowers, we fine-tune this network using a single dataset of apple flower images. Since CNNs tend to produce coarse segmentations, we employ a refinement method to better distinguish between individual flower instances. Without any pre-processing or dataset-specific training, experimental results on images of apple, peach and pear flowers, acquired under different conditions demonstrate the robustness and broad applicability of our method.

CVSep 17, 2018

Apple Flower Detection using Deep Convolutional Networks

Philipe A. Dias, Amy Tabb, Henry Medeiros

To optimize fruit production, a portion of the flowers and fruitlets of apple trees must be removed early in the growing season. The proportion to be removed is determined by the bloom intensity, i.e., the number of flowers present in the orchard. Several automated computer vision systems have been proposed to estimate bloom intensity, but their overall performance is still far from satisfactory even in relatively controlled environments. With the goal of devising a technique for flower identification which is robust to clutter and to changes in illumination, this paper presents a method in which a pre-trained convolutional neural network is fine-tuned to become specially sensitive to flowers. Experimental results on a challenging dataset demonstrate that our method significantly outperforms three approaches that represent the state of the art in flower detection, with recall and precision rates higher than $90\%$. Moreover, a performance assessment on three additional datasets previously unseen by the network, which consist of different flower species and were acquired under different conditions, reveals that the proposed method highly surpasses baseline approaches in terms of generalization capability.

CVFeb 21, 2018

Semantic Segmentation Refinement by Monte Carlo Region Growing of High Confidence Detections

Philipe A. Dias, Henry Medeiros

Despite recent improvements using fully convolutional networks, in general, the segmentation produced by most state-of-the-art semantic segmentation methods does not show satisfactory adherence to the object boundaries. We propose a method to refine the segmentation results generated by such deep learning models. Our method takes as input the confidence scores generated by a pixel-dense segmentation network and re-labels pixels with low confidence levels. The re-labeling approach employs a region growing mechanism that aggregates these pixels to neighboring areas with high confidence scores and similar appearance. In order to correct the labels of pixels that were incorrectly classified with high confidence level by the semantic segmentation algorithm, we generate multiple region growing steps through a Monte Carlo sampling of the seeds of the regions. Our method improves the accuracy of a state-of-the-art fully convolutional semantic segmentation approach on the publicly available COCO and PASCAL datasets, and it shows significantly better results on selected sequences of the finely-annotated DAVIS dataset.

ROJul 17, 2017

A robotic vision system to measure tree traits

Amy Tabb, Henry Medeiros

The autonomous measurement of tree traits, such as branching structure, branch diameters, branch lengths, and branch angles, is required for tasks such as robotic pruning of trees as well as structural phenotyping. We propose a robotic vision system called the Robotic System for Tree Shape Estimation (RoTSE) to determine tree traits in field settings. The process is composed of the following stages: image acquisition with a mobile robot unit, segmentation, reconstruction, curve skeletonization, conversion to a graph representation, and then computation of traits. Quantitative and qualitative results on apple trees are shown in terms of accuracy, computation time, and robustness. Compared to ground truth measurements, the RoTSE produced the following estimates: branch diameter (root mean-squared error $2.97$ mm), branch length (root mean-squared error $136.92$ mm), and branch angle (mean-squared error $31.07$ degrees). The average run time was $8.47$ minutes when the voxel resolution was $3$ mm$^3$.

ROApr 21, 2017

Hierarchical Bayesian Data Fusion for Robotic Platform Navigation

Andres F. Echeverri, Henry Medeiros, Ryan Walsh et al.

Data fusion has become an active research topic in recent years. Growing computational performance has allowed the use of redundant sensors to measure a single phenomenon. While Bayesian fusion approaches are common in general applications, the computer vision field has largely relegated this approach. Most object following algorithms have gone towards pure machine learning fusion techniques that tend to lack flexibility. Consequently, a more general data fusion scheme is needed. Within this work, a hierarchical Bayesian fusion approach is proposed, which outperforms individual trackers by using redundant measurements. The adaptive framework is achieved by relying on each measurement's local statistics and a global softened majority voting. The proposed approach was validated in a simulated application and two robotic platforms.

CVFeb 24, 2017

Fast and robust curve skeletonization for real-world elongated objects

Amy Tabb, Henry Medeiros

We consider the problem of extracting curve skeletons of three-dimensional, elongated objects given a noisy surface, which has applications in agricultural contexts such as extracting the branching structure of plants. We describe an efficient and robust method based on breadth-first search that can determine curve skeletons in these contexts. Our approach is capable of automatically detecting junction points as well as spurious segments and loops. All of that is accomplished with only one user-adjustable parameter. The run time of our method ranges from hundreds of milliseconds to less than four seconds on large, challenging datasets, which makes it appropriate for situations where real-time decision making is needed. Experiments on synthetic models as well as on data from real world objects, some of which were collected in challenging field conditions, show that our approach compares favorably to classical thinning algorithms as well as to recent contributions to the field.

CVFeb 24, 2017

Automatic segmentation of trees in dynamic outdoor environments

Amy Tabb, Henry Medeiros

Segmentation in dynamic outdoor environments can be difficult when the illumination levels and other aspects of the scene cannot be controlled. Specifically in orchard and vineyard automation contexts, a background material is often used to shield a camera's field of view from other rows of crops. In this paper, we describe a method that uses superpixels to determine low texture regions of the image that correspond to the background material, and then show how this information can be integrated with the color distribution of the image to compute optimal segmentation parameters to segment objects of interest. Quantitative and qualitative experiments demonstrate the suitability of this approach for dynamic outdoor environments, specifically for tree reconstruction and apple flower detection applications.