Leslie M. Collins

h-index: 19

19 papers

491 citations

Novelty: 34%

AI Score: 32

Ranked #136,777 of 201,326 authors (top 68%); #42,576 in CV (top 72%)

19 Papers

CV · Apr 25, 2023

Segment anything, from space?

Simiao Ren, Francesco Luzi, Saad Lahrichi et al.

Recently, the first foundation model developed specifically for image segmentation tasks, termed the "Segment Anything Model" (SAM), was introduced. SAM can segment objects in input imagery based on cheap input prompts, such as one (or more) points, a bounding box, or a mask. SAM's authors examined the zero-shot image segmentation accuracy of SAM on a large number of vision benchmark tasks and found that SAM usually achieved recognition accuracy similar to, or sometimes exceeding, vision models that had been trained on the target tasks. The impressive generalization of SAM for segmentation has major implications for vision researchers working on natural imagery. In this work, we examine whether SAM's performance extends to overhead imagery problems and help guide the community's response to its development. We examine SAM's performance on a set of diverse and widely studied benchmark tasks. We find that SAM does often generalize well to overhead imagery, although it fails in some cases due to the unique characteristics of overhead imagery and its common target objects. We report on these unique systematic failure cases for remote sensing imagery, which may point to useful future research directions for the community.

LG · Nov 25, 2022

Mixture Manifold Networks: A Computationally Efficient Baseline for Inverse Modeling

Gregory P. Spell, Simiao Ren, Leslie M. Collins et al.

We propose and show the efficacy of a new method to address generic inverse problems. Inverse modeling is the task whereby one seeks to determine the control parameters of a natural system that produce a given set of observed measurements. Recent work has shown impressive results using deep learning, but we note that there is a trade-off between model performance and computational time. For some applications, the computational time at inference for the best-performing inverse modeling method may be prohibitive. We present a new method that leverages multiple manifolds as a mixture of backward (i.e., inverse) models in a forward-backward model architecture. These multiple backward models all share a common forward model, and their training is augmented by generating training examples from the forward model. The proposed method thus has two innovations: 1) the Mixture Manifold Network (MMN) architecture, and 2) a training procedure that augments backward-model training data using the forward model. We demonstrate the advantages of our method by comparing to several baselines on four benchmark inverse problems, and we furthermore provide analysis to motivate its design.
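The second innovation, augmenting backward-model training data with the forward model, can be sketched in plain Python as below. Everything here is hypothetical: `forward_model` is a toy stand-in for a trained forward network, the uniform sampling range for parameters is assumed, and `select_best` (keeping the backward-model proposal whose forward prediction best matches the target) is one plausible way to use the shared forward model, not necessarily the paper's procedure.

```python
import random


def forward_model(x):
    # Hypothetical stand-in for a trained forward model: parameters -> measurements.
    return [x[0] + x[1], x[0] * x[1]]


def augment_backward_data(real_pairs, n_synthetic, low=0.0, high=1.0, seed=0):
    """Return (measurement, parameter) training pairs for the backward models.

    Real pairs are kept as-is; synthetic pairs are created by sampling
    parameters uniformly in [low, high] (an assumption) and labeling them
    with the forward model.
    """
    rng = random.Random(seed)
    augmented = list(real_pairs)
    for _ in range(n_synthetic):
        x = [rng.uniform(low, high) for _ in range(2)]
        augmented.append((forward_model(x), x))
    return augmented


def select_best(candidates, y_target):
    """Hypothetical mixture step: keep the proposal the forward model scores best."""
    def squared_error(x):
        return sum((a - b) ** 2 for a, b in zip(forward_model(x), y_target))
    return min(candidates, key=squared_error)


real = [([0.5, 0.06], [0.2, 0.3])]
data = augment_backward_data(real, n_synthetic=100)
best = select_best([[0.2, 0.3], [0.9, 0.9]], y_target=[0.5, 0.06])
```

Because synthetic pairs are labeled by the forward model itself, their quality is bounded by how accurate that forward model is.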

CV · Sep 16, 2024

Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks?

Kaleb Kassaw, Francesco Luzi, Leslie M. Collins et al.

Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but struggle under conditions of partial occlusion, i.e., conditions in which objects are partially covered from the view of a camera. Methods to improve performance under occlusion, including data augmentation, part-based clustering, and more inherently robust architectures, including Vision Transformer (ViT) models, have, to some extent, been evaluated on their ability to classify objects under partial occlusion. However, evaluations of these methods have largely relied on images containing artificial occlusion, which are typically computer-generated and therefore inexpensive to label. Additionally, methods are rarely compared against each other, and many methods are compared against early, now outdated, deep learning models. We contribute the Image Recognition Under Occlusion (IRUO) dataset, based on the recently developed Occluded Video Instance Segmentation (OVIS) dataset (arXiv:2102.01558). IRUO utilizes real-world and artificially occluded images to test and benchmark leading methods' robustness to partial occlusion in visual recognition tasks. In addition, we contribute the design and results of a human study using images from IRUO that evaluates human classification performance at multiple levels and types of occlusion. We find that modern CNN-based models show improved recognition accuracy on occluded images compared to earlier CNN-based models, and ViT-based models are more accurate than CNN-based models on occluded images, performing only modestly worse than humans. We also find that certain types of occlusion, including diffuse occlusion, where relevant objects are seen through "holes" in occluders such as fences and leaves, can greatly reduce the accuracy of deep recognition models relative to humans, especially models with CNN backbones.

CV · Dec 24, 2022

Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer

Evelyn A. Stump, Francesco Luzi, Leslie M. Collins et al.

Recent object detection models for infrared (IR) imagery are based upon deep neural networks (DNNs) and require large amounts of labeled training imagery. However, publicly available datasets that can be used for such training are limited in their size and diversity. To address this problem, we explore cross-modal style transfer (CMST) to leverage large and diverse color imagery datasets so that they can be used to train DNN-based IR image-based object detectors. We evaluate six contemporary stylization methods on four publicly available IR datasets (the first comparison of its kind) and find that CMST is highly effective for DNN-based detectors. Surprisingly, we find that existing data-driven methods are outperformed by a simple grayscale stylization (an average of the color channels). Our analysis reveals that existing data-driven methods are either too simplistic or introduce significant artifacts into the imagery. To overcome these limitations, we propose meta-learning style transfer (MLST), which learns a stylization by composing and tuning well-behaved analytic functions. We find that MLST leads to more complex stylizations without introducing significant image artifacts and achieves the best overall detector performance on our benchmark datasets.
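The grayscale baseline described above is simple enough to state exactly. A minimal sketch, assuming an image stored as rows of (R, G, B) tuples; replicating the mean into all three channels (so that a detector expecting three-channel input can consume it) is an assumption, not a detail from the abstract.

```python
def grayscale_stylize(image):
    """Map each (R, G, B) pixel to (m, m, m), where m is the mean of the three channels."""
    stylized = []
    for row in image:
        new_row = []
        for r, g, b in row:
            m = (r + g + b) / 3.0
            new_row.append((m, m, m))
        stylized.append(new_row)
    return stylized


color_image = [[(30, 60, 90), (255, 0, 0)]]
ir_like = grayscale_stylize(color_image)
```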

CV · Sep 19, 2022

Meta-simulation for the Automated Design of Synthetic Overhead Imagery

Handi Yu, Simiao Ren, Leslie M. Collins et al.

The use of synthetic (or simulated) data for training machine learning models has grown rapidly in recent years. Synthetic data can often be generated much faster and more cheaply than its real-world counterpart. One challenge of using synthetic imagery, however, is scene design: e.g., the choice of content and its features and spatial arrangement. To be effective, this design must not only be realistic, but appropriate for the target domain, which (by assumption) is unlabeled. In this work, we propose an approach to automatically choose the design of synthetic imagery based upon unlabeled real-world imagery. Our approach, termed Neural-Adjoint Meta-Simulation (NAMS), builds upon recent seminal meta-simulation approaches. In contrast to the current state-of-the-art methods, our approach can be pre-trained once offline, and then provides fast design inference for new target imagery. Using both synthetic and real-world problems, we show that NAMS infers synthetic designs that match both the in-domain and out-of-domain target imagery, and that training segmentation models with NAMS-designed imagery yields superior results compared to naïve randomized designs and state-of-the-art meta-simulation methods.

LG · Jun 23, 2025

Ground tracking for improved landmine detection in a GPR system

Li Tang, Peter A. Torrione, Cihat Eldeniz et al.

Ground penetrating radar (GPR) provides a promising technology for accurate subsurface object detection. In particular, it has shown promise for detecting landmines with low metal content. However, the ground bounce (GB) that is present in GPR data, which is caused by the dielectric discontinuity between soil and air, is a major source of interference and degrades landmine detection performance. To mitigate this interference, GB tracking algorithms formulated using both a Kalman filter (KF) and a particle filter (PF) framework are proposed. In particular, the location of the GB in the radar signal is modeled as the hidden state in a stochastic system for the PF approach. The observations are the 2D radar images, which arrive scan by scan along the down-track direction. An initial training stage sets parameters automatically to accommodate different ground and weather conditions. The features associated with the GB description are updated adaptively with the arrival of new data. The prior distribution for a given location is predicted by propagating information from two adjacent channels/scans, which ensures that the overall GB surface remains smooth. The proposed algorithms are verified in experiments utilizing real data, and their performance is compared with other GB tracking approaches. We demonstrate that improved GB tracking contributes to improved performance for the landmine detection problem.
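For intuition about the KF variant, the ground-bounce position (e.g., a depth-bin index per scan) can be treated as a scalar hidden state that follows a random walk. The sketch below is an assumed, simplified model: the noise variances `q` and `r` are arbitrary illustration values, and the paper's adaptive features, automatic training stage, and prior built from adjacent channels/scans are not reproduced.

```python
def kalman_track(observations, q=1.0, r=4.0, x0=None, p0=10.0):
    """Scalar Kalman filter over per-scan ground-bounce position measurements.

    q: process-noise variance (how much the GB may move between scans).
    r: measurement-noise variance (how noisy each per-scan estimate is).
    Returns the filtered position after each scan.
    """
    x = observations[0] if x0 is None else x0
    p = p0
    estimates = []
    for z in observations:
        # Predict: random-walk model keeps the mean and inflates the variance by q.
        p = p + q
        # Update: blend the prediction with the new measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


scans = [50, 50, 80, 50, 50]  # an outlier at scan 2, e.g. clutter near the surface
track = kalman_track(scans)
```

The filtered value at the outlier lands between the previous estimate and the spurious measurement, which is the smoothing behavior that motivates tracking rather than per-scan peak picking.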

SD · Aug 12, 2021

Parameter Tuning of Time-Frequency Masking Algorithms for Reverberant Artifact Removal within the Cochlear Implant Stimulus

Lidea K. Shahidi, Leslie M. Collins, Boyla O. Mainsah

Cochlear implant users struggle to understand speech in reverberant environments. To restore speech perception, artifacts dominated by reverberant reflections can be removed from the cochlear implant stimulus. Artifacts can be identified and removed by applying a matrix of gain values, a technique referred to as time-frequency masking. Gain values are determined by an oracle algorithm that uses knowledge of the undistorted signal to minimize retention of the signal components dominated by reverberant reflections. In practice, gain values are estimated from the distorted signal, with the oracle algorithm providing the estimation objective. Different oracle techniques exist for determining gain values, and each technique must be parameterized to set the amount of signal retention. This work assesses which oracle masking strategies and parameterizations lead to the best improvements in speech intelligibility for cochlear implant users in reverberant conditions using online speech intelligibility testing of normal-hearing individuals with vocoding.
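As an illustration of how an oracle mask is parameterized, the sketch below computes an ideal binary mask: a bin is kept when the clean-signal energy exceeds the reverberant-reflection energy by a threshold `lc_db` (often called a local criterion), so raising `lc_db` retains fewer bins. This is one common oracle chosen for illustration; the specific oracle strategies and parameter values the study compared are not listed here, and the function names are hypothetical.

```python
import math


def bin_snr_db(clean, reflections):
    # Ratio of clean to reverberant-reflection energy in dB, guarding against zeros.
    if clean <= 0:
        return -math.inf
    if reflections <= 0:
        return math.inf
    return 10.0 * math.log10(clean / reflections)


def ideal_binary_mask(clean_power, reflection_power, lc_db=0.0):
    """Oracle gain matrix (frequency x time): 1 where bin SNR exceeds lc_db, else 0."""
    return [
        [1.0 if bin_snr_db(c, r) > lc_db else 0.0 for c, r in zip(c_row, r_row)]
        for c_row, r_row in zip(clean_power, reflection_power)
    ]


def apply_mask(mask, stimulus):
    # Time-frequency masking: element-wise product of gains and stimulus values.
    return [[g * s for g, s in zip(m_row, s_row)] for m_row, s_row in zip(mask, stimulus)]


clean = [[10.0, 1.0]]
reflections = [[1.0, 10.0]]
```

In practice the clean signal is unavailable, so a model is trained to estimate this gain matrix from the distorted signal, with the oracle providing the training target.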

AS · May 28, 2021

Phoneme-Based Ratio Mask Estimation for Reverberant Speech Enhancement in Cochlear Implant Processors

Kevin M. Chu, Leslie M. Collins, Boyla O. Mainsah

Cochlear implant (CI) users have considerable difficulty in understanding speech in reverberant listening environments. Time-frequency (T-F) masking is a common technique that aims to improve speech intelligibility by multiplying reverberant speech by a matrix of gain values to suppress T-F bins dominated by reverberation. Recently proposed mask estimation algorithms leverage machine learning approaches to distinguish between target speech and reverberant reflections. However, the spectro-temporal structure of speech is highly variable and dependent on the underlying phoneme. One way to potentially overcome this variability is to leverage explicit knowledge of phonemic information during mask estimation. This study proposes a phoneme-based mask estimation algorithm, where separate mask estimation models are trained for each phoneme. Sentence recognition tests were conducted in normal hearing listeners to determine whether a phoneme-based mask estimation algorithm is beneficial in the ideal scenario where perfect knowledge of the phoneme is available. The results showed that the phoneme-based masks improved the intelligibility of vocoded speech when compared to conventional phoneme-independent masks. The results suggest that a phoneme-based speech enhancement strategy may potentially benefit CI users in reverberant listening environments.
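A ratio mask replaces the binary keep/discard decision with a continuous gain per bin. The sketch below uses the ideal ratio mask form (S / (S + R))^β, a common choice assumed here rather than taken from the abstract, and shows the phoneme-based idea as a dispatch: given oracle phoneme labels, each frame is routed to the estimator for that phoneme. The estimators in the usage are hypothetical placeholders for trained models.

```python
def ideal_ratio_mask(direct_power, reflection_power, beta=0.5):
    """Per-bin gain (S / (S + R)) ** beta; bins with no energy get a gain of 0."""
    mask = []
    for s_row, r_row in zip(direct_power, reflection_power):
        row = []
        for s, r in zip(s_row, r_row):
            total = s + r
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def phoneme_based_masks(frames, phoneme_labels, estimators):
    """Route each frame's features to the estimator trained for its (oracle) phoneme."""
    return [estimators[p](f) for f, p in zip(frames, phoneme_labels)]


# Hypothetical per-phoneme estimators: each maps a feature vector to per-band gains.
estimators = {
    "aa": lambda feats: [0.9 for _ in feats],
    "s": lambda feats: [0.2 for _ in feats],
}
masks = phoneme_based_masks([[1, 2], [3, 4], [5, 6]], ["aa", "s", "aa"], estimators)
```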

AS · May 28, 2021

Assessing the intelligibility of vocoded speech using a remote testing framework

Kevin M. Chu, Leslie M. Collins, Boyla O. Mainsah

Over the past year, remote speech intelligibility testing has become a popular and necessary alternative to traditional in-person experiments due to the need for physical distancing during the COVID-19 pandemic. A remote framework was developed for conducting speech intelligibility tests with normal hearing listeners. In this study, subjects used their personal computers to complete sentence recognition tasks in anechoic and reverberant listening environments. The results obtained using this remote framework were compared with previously collected in-lab results, and showed higher levels of speech intelligibility among remote study participants than among subjects who completed the test in the laboratory.

LG · May 6, 2021

Evaluating the Effect of Longitudinal Dose and INR Data on Maintenance Warfarin Dose Predictions

Anish Karpurapu, Adam Krekorian, Ye Tian et al.

Warfarin, a commonly prescribed drug to prevent blood clots, has a highly variable individual response. Determining a maintenance warfarin dose that achieves a therapeutic blood clotting time, as measured by the international normalized ratio (INR), is crucial in preventing complications. Machine learning algorithms are increasingly being used for warfarin dosing; usually, an initial dose is predicted with clinical and genotype factors, and this dose is revised after a few days based on previous doses and current INR. Since a sequence of prior doses and INR values better captures the variability in individual warfarin response, we hypothesized that longitudinal dose response data will improve maintenance dose predictions. To test this hypothesis, we analyzed a dataset from the COAG warfarin dosing study, which includes clinical data, warfarin doses and INR measurements over the study period, and maintenance dose when therapeutic INR was achieved. Various machine learning regression models to predict maintenance warfarin dose were trained with clinical factors and dosing history and INR data as features. Overall, dose revision algorithms with a single dose and INR achieved performance comparable to the baseline dose revision algorithm. In contrast, dose revision algorithms with longitudinal dose and INR data provided maintenance dose predictions that were significantly closer (statistically) to the true maintenance dose. Focusing on the best performing model, gradient boosting (GB), the proportion of ideal estimated dose, defined as within ±20% of the true dose, increased from the baseline (54.92%) to the GB model with the single (63.11%) and longitudinal (75.41%) INR. More accurate maintenance dose predictions with longitudinal dose response data can potentially achieve therapeutic INR faster, reduce drug-related complications and improve patient outcomes with warfarin therapy.
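The "ideal estimated dose" metric is easy to make concrete. A minimal sketch, assuming doses in consistent units and treating a prediction exactly 20% away as a hit (whether the boundary is inclusive is an assumption); the numbers in the usage line are made up for illustration, not study data.

```python
def percent_ideal_dose(predicted, true, tolerance=0.20):
    """Percentage of predictions within +/- tolerance (relative) of the true maintenance dose."""
    if len(predicted) != len(true) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, t in zip(predicted, true) if abs(p - t) <= tolerance * t)
    return 100.0 * hits / len(true)


score = percent_ideal_dose([5.0, 6.5, 4.5, 7.0], [5.0, 5.0, 5.0, 7.0])
```

Here three of the four hypothetical predictions fall within 20% of the true dose, giving 75%.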

CV · Jan 16, 2021

GridTracer: Automatic Mapping of Power Grids using Deep Learning and Overhead Imagery

Bohao Huang, Jichen Yang, Artem Streltsov et al.

Energy system information valuable for electricity access planning, such as the locations and connectivity of electricity transmission and distribution towers (termed the power grid), is often incomplete, outdated, or altogether unavailable. Furthermore, conventional means for collecting this information are costly and limited. We propose to automatically map the grid in overhead remotely sensed imagery using deep learning. Towards this goal, we develop and publicly release a large dataset (263 km²) of overhead imagery with ground truth for the power grid, which, to our knowledge, is the first dataset of its kind in the public domain. Additionally, we propose scoring metrics and baseline algorithms for two grid mapping tasks: (1) tower recognition and (2) power line interconnection (i.e., estimating a graph representation of the grid). We hope the availability of the training data, scoring metrics, and baselines will facilitate rapid progress on this important problem to help decision-makers address the energy needs of societies around the world.

CV · Jun 4, 2018

gprHOG and the popularity of Histogram of Oriented Gradients (HOG) for Buried Threat Detection in Ground-Penetrating Radar

Daniel Reichman, Leslie M. Collins, Jordan M. Malof

Substantial research has been devoted to the development of algorithms that automate buried threat detection (BTD) with ground penetrating radar (GPR) data, resulting in a large number of proposed algorithms. One popular algorithm for GPR-based BTD, originally applied by Torrione et al., 2012, is the Histogram of Oriented Gradients (HOG) feature. In a recent large-scale comparison among five veteran institutions, a modified version of HOG referred to here as "gprHOG" performed poorly compared to other modern algorithms. In this paper, we provide experimental evidence demonstrating that the modifications to HOG that comprise gprHOG result in a substantially better-performing algorithm. The results here, in conjunction with the large-scale algorithm comparison, suggest that HOG is not competitive with modern GPR-based BTD algorithms. Given HOG's popularity, these results raise some questions about many existing studies, and suggest gprHOG (and especially HOG) should be employed with caution in future studies.
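For readers unfamiliar with HOG, its core step is a magnitude-weighted histogram of gradient orientations within a cell. The sketch below is a simplified, assumed version (central differences, unsigned orientations in 9 bins, no bilinear bin interpolation or block normalization) and does not reproduce the gprHOG modifications.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees).

    Gradients use central differences, so only interior pixels contribute.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            gx = cell[i][j + 1] - cell[i][j - 1]
            gy = cell[i + 1][j] - cell[i - 1][j]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to unsigned
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


horizontal_ramp = [[0, 1, 2, 3] for _ in range(4)]  # intensity increases left to right
vertical_ramp = [[i] * 4 for i in range(4)]          # intensity increases top to bottom
```

A horizontal ramp puts all its weight in the 0-degree bin, while a vertical ramp lands in the bin containing 90 degrees.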

CV · May 30, 2018

Tiling and Stitching Segmentation Output for Remote Sensing: Basic Challenges and Recommendations

Bohao Huang, Daniel Reichman, Leslie M. Collins et al.

In this work we consider the application of convolutional neural networks (CNNs) for pixel-wise labeling (a.k.a. semantic segmentation) of remote sensing imagery (e.g., aerial color or hyperspectral imagery). Remote sensing imagery is usually stored in the form of very large images, referred to as "tiles", which are too large to be segmented directly using most CNNs and their associated hardware. As a result, during label inference, smaller sub-images, called "patches", are processed individually and then "stitched" (concatenated) back together to create a tile-sized label map. This approach suffers from computational inefficiency and can result in discontinuities at the boundaries between stitched outputs. We propose a simple alternative approach in which the input size of the CNN is dramatically increased only during label inference. This does not avoid stitching altogether, but substantially mitigates its limitations. We evaluate the performance of the proposed approach against a conventional stitching approach using two popular segmentation CNN models and two large-scale remote sensing imagery datasets. The results suggest that the proposed approach substantially reduces label inference time, while also yielding modest overall label accuracy increases. This approach contributed to our winning entry (overall performance) in the INRIA building labeling competition.
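The patch-and-stitch inference loop can be sketched as follows, with a hypothetical `model` callable that maps a 2-D window to a same-shaped label window. Enlarging `patch` (analogous to increasing the CNN input size at inference) reduces the number of model calls and the number of seams. The toy per-pixel model below uses no spatial context, so its stitched output has no boundary artifacts; a real CNN sees less context near patch edges, which is where discontinuities arise.

```python
def segment_by_patches(tile, patch, model):
    """Label a 2-D tile by running `model` on patch x patch windows and stitching.

    Returns the stitched label map and the number of model calls.
    Edge windows may be smaller than patch x patch when sizes do not divide evenly.
    """
    height, width = len(tile), len(tile[0])
    label_map = [[None] * width for _ in range(height)]
    calls = 0
    for r0 in range(0, height, patch):
        for c0 in range(0, width, patch):
            window = [row[c0:c0 + patch] for row in tile[r0:r0 + patch]]
            labels = model(window)
            calls += 1
            # Stitch: copy the window's labels back into the full-size map.
            for i, label_row in enumerate(labels):
                label_map[r0 + i][c0:c0 + len(label_row)] = label_row
    return label_map, calls


def threshold_model(window):
    # Hypothetical per-pixel "model" with no spatial context.
    return [[1 if v > 0 else 0 for v in row] for row in window]


tile = [[(r * 6 + c) % 5 - 2 for c in range(6)] for r in range(4)]
small_out, small_calls = segment_by_patches(tile, 2, threshold_model)
large_out, large_calls = segment_by_patches(tile, 8, threshold_model)
```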

CV · Mar 10, 2018

A Large-Scale Multi-Institutional Evaluation of Advanced Discrimination Algorithms for Buried Threat Detection in Ground Penetrating Radar

Jordan M. Malof, Daniel Reichman, Andrew Karem et al.

In this paper we consider the development of algorithms for the automatic detection of buried threats using ground penetrating radar (GPR) measurements. GPR is one of the most studied and successful modalities for automatic buried threat detection (BTD), and a large variety of BTD algorithms have been proposed for it. Despite this, large-scale comparisons of GPR-based BTD algorithms are rare in the literature. In this work we report the results of a multi-institutional effort to develop advanced buried threat detection algorithms for a real-world GPR BTD system. The effort involved five institutions with substantial experience in the development of GPR-based BTD algorithms. In this paper we report the technical details of the advanced algorithms submitted by each institution, representing their latest technical advances, and many state-of-the-art GPR-based BTD algorithms. We also report the results of evaluating the algorithms from each institution on the large experimental dataset used for development. The experimental dataset comprised 120,000 m² of GPR data (measured by surface area), collected from 13 different lanes across two US test sites. The data was collected using a vehicle-mounted GPR system, the variants of which have supplied data for numerous publications. Using these results, we identify the most successful and common processing strategies among the submitted algorithms, and make recommendations for GPR-based BTD algorithm design.

CV · Jan 11, 2018

Application of a semantic segmentation convolutional neural network for accurate automatic detection and mapping of solar photovoltaic arrays in aerial imagery

Joseph Camilo, Rui Wang, Leslie M. Collins et al.

We consider the problem of automatically detecting small-scale solar photovoltaic arrays for behind-the-meter energy resource assessment in high resolution aerial imagery. Such algorithms offer a faster and more cost-effective solution to collecting information on distributed solar photovoltaic (PV) arrays, such as their location, capacity, and generated energy. The surface area of PV arrays, a characteristic which can be estimated from aerial imagery, provides an important proxy for array capacity and energy generation. In this work, we employ a state-of-the-art convolutional neural network architecture, called SegNet (Badrinarayanan et al., 2015), to semantically segment (or map) PV arrays in aerial imagery. This builds on previous work focused on identifying the locations of PV arrays, as opposed to their specific shapes and sizes. We measure the ability of our SegNet implementation to estimate the surface area of PV arrays on a large, publicly available dataset that has been employed in several previous studies. The results indicate that the SegNet model yields substantial performance improvements with respect to estimating shape and size as compared to a recently proposed convolutional neural network PV detection algorithm.
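Estimating array surface area from a segmentation map reduces to counting labeled pixels and scaling by the ground area each pixel covers. A minimal sketch; the ground sample distance `gsd_m` is a hypothetical parameter that must match the imagery, so treat this as an illustration rather than the paper's exact procedure.

```python
def estimated_area_m2(mask, gsd_m):
    """Area covered by positive pixels in a binary mask, given square pixels of side gsd_m metres."""
    n_pixels = sum(1 for row in mask for v in row if v)
    return n_pixels * gsd_m ** 2


pv_mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
area = estimated_area_m2(pv_mask, gsd_m=0.5)  # 4 pixels x 0.25 m^2 each
```

Because area scales with pixel count, segmentation errors along array boundaries translate directly into area error, which is why shape accuracy matters more here than detection alone.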

HC · Jul 3, 2017

Adaptive Stimulus Selection in ERP-Based Brain-Computer Interfaces by Maximizing Expected Discrimination Gain

Dmitry Kalika, Leslie M. Collins, Chandra S. Throckmorton et al.

Brain-computer interfaces (BCIs) can provide an alternative means of communication for individuals with severe neuromuscular limitations. The P300-based BCI speller relies on eliciting and detecting transient event-related potentials (ERPs) in electroencephalography (EEG) data, in response to a user attending to rarely occurring target stimuli amongst a series of non-target stimuli. However, in most P300 speller implementations, the stimuli to be presented are randomly selected from a limited set of options, and stimulus selection and presentation are not optimized based on previous user data. In this work, we propose a data-driven method for stimulus selection based on the expected discrimination gain metric. The data-driven approach selects stimuli based on previously observed stimulus responses, with the aim of choosing a set of stimuli that will provide the most information about the user's intended target character. Our approach incorporates knowledge of physiological and system constraints imposed due to real-time BCI implementation. Simulations were performed to compare our stimulus selection approach to the row-column paradigm, the conventional stimulus selection method for P300 spellers. Results from the simulations demonstrated that our adaptive stimulus selection approach has the potential to significantly improve performance over the conventional method: up to a 34% improvement in accuracy and a 43% reduction in the mean number of stimulus presentations required to spell a character in a 72-character grid. In addition, our greedy approach to stimulus selection provides the flexibility to accommodate design constraints.

CV · Feb 9, 2017

A large comparison of feature-based approaches for buried target classification in forward-looking ground-penetrating radar

Joseph A. Camilo, Leslie M. Collins, Jordan M. Malof

Forward-looking ground-penetrating radar (FLGPR) has recently been investigated as a remote sensing modality for buried target detection (e.g., landmines). In this context, raw FLGPR data is beamformed into images and then computerized algorithms are applied to automatically detect subsurface buried targets. Most existing algorithms are supervised, meaning they are trained to discriminate between labeled target and non-target imagery, usually based on features extracted from the imagery. A large number of features have been proposed for this purpose; however, thus far it is unclear which are the most effective. The first goal of this work is to provide a comprehensive comparison of detection performance using existing features on a large collection of FLGPR data. Fusion of the decisions resulting from processing each feature is also considered. The second goal of this work is to investigate two modern feature learning approaches from the object recognition literature, the bag-of-visual-words and the Fisher vector, for FLGPR processing. The results indicate that the new feature learning approaches outperform existing methods. Results also show that fusion between existing features and new features yields little additional performance improvement.
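The bag-of-visual-words representation can be summarized as: assign each local descriptor to its nearest codeword and use the normalized histogram of assignments as the image feature. A minimal sketch; in practice the codebook is typically learned (e.g., with k-means) from training descriptors, the tiny two-word codebook here is hypothetical, and the Fisher vector is not shown.

```python
def bovw_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments (squared Euclidean distance)."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1
    total = len(descriptors)
    # Normalize so images with different numbers of descriptors are comparable.
    return [c / total for c in counts] if total else counts


codebook = [[0, 0], [10, 10]]
descriptors = [[1, 0], [9, 9], [0, 1], [11, 10]]
feature = bovw_histogram(descriptors, codebook)
```

The resulting fixed-length vector can be passed to any standard classifier, regardless of how many descriptors each image produced.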

CV · Dec 11, 2016

On Choosing Training and Testing Data for Supervised Algorithms in Ground Penetrating Radar Data for Buried Threat Detection

Daniël Reichman, Leslie M. Collins, Jordan M. Malof

Ground penetrating radar (GPR) is one of the most popular and successful sensing modalities that has been investigated for landmine and subsurface threat detection. Many of the detection algorithms applied to this task are supervised and therefore require labeled examples of target and non-target data for training. Training data most often consists of 2-dimensional images (or patches) of GPR data, from which features are extracted and provided to the classifier during training and testing. Identifying desirable training and testing locations to extract patches, which we term "keypoints", is well established in the literature. In contrast, however, a large variety of strategies have been proposed regarding keypoint utilization (e.g., how many of the identified keypoints should be used at target or non-target locations). Given the variety of keypoint utilization strategies that are available, it is unclear (i) which strategies are best, or (ii) whether the choice of strategy has a large impact on classifier performance. We address these questions by presenting a taxonomy of existing utilization strategies, and then evaluating their effectiveness on a large dataset using many different classifiers and features. We analyze the results and propose a new strategy, called PatchSelect, which outperforms other strategies across all experiments.

CV · Jul 20, 2016

Automatic Detection of Solar Photovoltaic Arrays in High Resolution Aerial Imagery

Jordan M. Malof, Kyle Bradbury, Leslie M. Collins et al.

The quantity of small-scale solar photovoltaic (PV) arrays in the United States has grown rapidly in recent years. As a result, there is substantial interest in high quality information about the quantity, power capacity, and energy generated by such arrays, including at a high spatial resolution (e.g., counties, cities, or even smaller regions). Unfortunately, existing methods for obtaining this information, such as surveys and utility interconnection filings, are limited in their completeness and spatial resolution. This work presents a computer algorithm that automatically detects PV panels using very high resolution color satellite imagery. The approach potentially offers a fast, scalable method for obtaining accurate information on PV array location and size, and at much higher spatial resolutions than are currently available. The method is validated using a very large (135 km²) collection of publicly available [1] aerial imagery, with over 2,700 human-annotated PV array locations. The results demonstrate the algorithm is highly effective on a per-pixel basis. It is likewise effective at object-level PV array detection, but with significant potential for improvement in estimating the precise shape/size of the PV arrays. These results are the first of their kind for the detection of solar PV in aerial imagery, demonstrating the feasibility of the approach and establishing a baseline performance for future investigations.