Ian Stavness

h-index30

21papers

2,601citations

Novelty40%

AI Score47

Ranked #34,857 of 194,257 authors (top 18%)#12,398 in CV (top 21%)

21 Papers

5.0CVJul 17, 2023Code

DARTS: Double Attention Reference-based Transformer for Super-resolution

Masoomeh Aslahishahri, Jordan Ubbens, Ian Stavness

We present DARTS, a transformer model for reference-based image super-resolution. DARTS learns joint representations of two image distributions to enhance the content of low-resolution input images through matching correspondences learned from high-resolution reference images. Current state-of-the-art techniques in reference-based image super-resolution are based on a multi-network, multi-stage architecture. In this work, we adapt the double attention block from the GAN literature, processing the two visual streams separately and combining self-attention and cross-attention blocks through a gating attention strategy. Our work demonstrates how the attention mechanism can be adapted for the particular requirements of reference-based image super-resolution, significantly simplifying the architecture and training pipeline. We show that our transformer-based model performs competitively with state-of-the-art models, while maintaining a simpler overall architecture and training process. In particular, we obtain state-of-the-art on the SUN80 dataset, with a PSNR/SSIM of 29.83 / .809. These results show that attention alone is sufficient for the RSR task, without multiple purpose-built subnetworks, knowledge distillation, or multi-stage training.

5.2CVAug 30, 2024Code

HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution

Masoomeh Aslahishahri, Jordan Ubbens, Ian Stavness

In this paper, we propose HiTSR, a hierarchical transformer model for reference-based image super-resolution, which enhances low-resolution input images by learning matching correspondences from high-resolution reference images. Diverging from existing multi-network, multi-stage approaches, we streamline the architecture and training pipeline by incorporating the double attention block from GAN literature. Processing two visual streams independently, we fuse self-attention and cross-attention blocks through a gating attention strategy. The model integrates a squeeze-and-excitation module to capture global context from the input images, facilitating long-range spatial interactions within window-based attention blocks. Long skip connections between shallow and deep layers further enhance information flow. Our model demonstrates superior performance across three datasets including SUN80, Urban100, and Manga109. Specifically, on the SUN80 dataset, our model achieves PSNR/SSIM values of 30.24/0.821. These results underscore the effectiveness of attention mechanisms in reference-based image super-resolution. The transformer-based model attains state-of-the-art results without the need for purpose-built subnetworks, knowledge distillation, or multi-stage training, emphasizing the potency of attention in meeting reference-based image super-resolution requirements.

5.0CVApr 10

Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation

Tzu Ling Liu, Ian Stavness, Mrigank Rochan

Video Unsupervised Domain Adaptation (VUDA) poses a significant challenge in action recognition, requiring the adaptation of a model from a labeled source domain to an unlabeled target domain. Despite recent advances, existing VUDA methods often fall short of fully supervised performance, a key reason being the prevalence of static and uninformative backgrounds that exacerbate domain shifts. Additionally, prior approaches largely overlook computational efficiency, limiting real-world adoption. To address these issues, we propose Learnable Motion-Focused Tokenization (LMFT) for VUDA. LMFT tokenizes video frames into patch tokens and learns to discard low-motion, redundant tokens, primarily corresponding to background regions, while retaining motion-rich, action-relevant tokens for adaptation. Extensive experiments on three standard VUDA benchmarks across 21 domain adaptation settings show that our VUDA framework with LMFT achieves state-of-the-art performance while significantly reducing computational overhead. LMFT thus enables VUDA that is both effective and computationally efficient.

28.8LGDec 9, 2021Code

Extending the WILDS Benchmark for Unsupervised Adaptation

Shiori Sagawa, Pang Wei Koh, Tony Lee et al.

Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). The update maintains consistency with the original WILDS benchmark by using identical labeled training, validation, and test sets, as well as the evaluation metrics. On these datasets, we systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.

53.0LGDec 14, 2020Code

WILDS: A Benchmark of in-the-Wild Distribution Shifts

Pang Wei Koh, Shiori Sagawa, Henrik Marklund et al.

Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in the real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new methods for training models that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.

4.1LGApr 17, 2019Code

Sparseout: Controlling Sparsity in Deep Networks

Najeeb Khan, Ian Stavness

Dropout is commonly used to help reduce overfitting in deep neural networks. Sparsity is a potentially important property of neural networks, but is not explicitly controlled by Dropout-based regularization. In this work, we propose Sparseout a simple and efficient variant of Dropout that can be used to control the sparsity of the activations in a neural network. We theoretically prove that Sparseout is equivalent to an $L_q$ penalty on the features of a generalized linear model and that Dropout is a special case of Sparseout for neural networks. We empirically demonstrate that Sparseout is computationally inexpensive and is able to control the desired level of sparsity in the activations. We evaluated Sparseout on image classification and language modelling tasks to see the effect of sparsity on these tasks. We found that sparsity of the activations is favorable for language modelling performance while image classification benefits from denser activations. Sparseout provides a way to investigate sparsity in state-of-the-art deep learning models. Source code for Sparseout could be found at \url{https://github.com/najeebkhan/sparseout}.

2.0CVJun 7, 2024

A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation

Keyhan Najafian, Farhad Maleki, Lingling Jin et al.

Video object segmentation (VOS) -- predicting pixel-level regions for objects within each frame of a video -- is particularly challenging in agricultural scenarios, where videos of crops include hundreds of small, dense, and occluded objects (stems, leaves, flowers, pods) that sway and move unpredictably in the wind. Supervised training is the state-of-the-art for VOS, but it requires large, pixel-accurate, human-annotated videos, which are costly to produce for videos with many densely packed objects in each frame. To address these challenges, we proposed a semi-self-supervised spatiotemporal approach for dense-VOS (DVOS) using a diffusion-based method through multi-task (reconstruction and segmentation) learning. We train the model first with synthetic data that mimics the camera and object motion of real videos and then with pseudo-labeled videos. We evaluate our DVOS method for wheat head segmentation from a diverse set of videos (handheld, drone-captured, different field locations, and different growth stages -- spanning from Boot-stage to Wheat-mature and Harvest-ready). Despite using only a few manually annotated video frames, the proposed approach yielded a high-performing model, achieving a Dice score of 0.79 when tested on a drone-captured external test set. While our method was evaluated on wheat head segmentation, it can be extended to other crops and domains, such as crowd analysis or microscopic image analysis.

5.6CVMay 17, 2021

Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods

Etienne David, Mario Serouart, Daniel Smith et al.

The Global Wheat Head Detection (GWHD) dataset was created in 2020 and has assembled 193,634 labelled wheat heads from 4,700 RGB images acquired from various acquisition platforms and 7 countries/institutions. With an associated competition hosted in Kaggle, GWHD has successfully attracted attention from both the computer vision and agricultural science communities. From this first experience in 2020, a few avenues for improvements have been identified, especially from the perspective of data size, head diversity and label reliability. To address these issues, the 2020 dataset has been reexamined, relabeled, and augmented by adding 1,722 images from 5 additional countries, allowing for 81,553 additional wheat heads to be added. We now release a new version of the Global Wheat Head Detection (GWHD) dataset in 2021, which is bigger, more diverse, and less noisy than the 2020 version. The GWHD 2021 is now publicly available at http://www.global-wheat.com/ and a new data challenge has been organized on AIcrowd to make use of this updated dataset.

2.6CVMay 13, 2021

Global Wheat Challenge 2020: Analysis of the competition design and winning models

Etienne David, Franklin Ogidi, Wei Guo et al.

Data competitions have become a popular approach to crowdsource new data analysis methods for general and specialized data science problems. In plant phenotyping, data competitions have a rich history, and new outdoor field datasets have potential for new data competitions. We developed the Global Wheat Challenge as a generalization competition to see if solutions for wheat head detection from field images would work in different regions around the world. In this paper, we analyze the winning challenge solutions in terms of their robustness and the relative importance of model and data augmentation design decisions. We found that the design of the competition influence the selection of winning solutions and provide recommendations for future competitions in an attempt to garner more robust winning solutions.

1.2LGSep 23, 2020

Pruning Convolutional Filters using Batch Bridgeout

Najeeb Khan, Ian Stavness

State-of-the-art computer vision models are rapidly increasing in capacity, where the number of parameters far exceeds the number required to fit the training set. This results in better optimization and generalization performance. However, the huge size of contemporary models results in large inference costs and limits their use on resource-limited devices. In order to reduce inference costs, convolutional filters in trained neural networks could be pruned to reduce the run-time memory and computational requirements during inference. However, severe post-training pruning results in degraded performance if the training algorithm results in dense weight vectors. We propose the use of Batch Bridgeout, a sparsity inducing stochastic regularization scheme, to train neural networks so that they could be pruned efficiently with minimal degradation in performance. We evaluate the proposed method on common computer vision models VGGNet, ResNet, and Wide-ResNet on the CIFAR image classification task. For all the networks, experimental results show that Batch Bridgeout trained networks achieve higher accuracy across a wide range of pruning intensities compared to Dropout and weight decay regularization.

8.5CVSep 2, 2020Code

Unsupervised Domain Adaptation For Plant Organ Counting

Tewodros Ayalew, Jordan Ubbens, Ian Stavness

Supervised learning is often used to count objects in images, but for counting small, densely located objects, the required image annotations are burdensome to collect. Counting plant organs for image-based plant phenotyping falls within this category. Object counting in plant images is further challenged by having plant image datasets with significant domain shift due to different experimental conditions, e.g. applying an annotated dataset of indoor plant images for use on outdoor images, or on a different plant species. In this paper, we propose a domain-adversarial learning approach for domain adaptation of density map estimation for the purposes of object counting. The approach does not assume perfectly aligned distributions between the source and target datasets, which makes it more broadly applicable within general object counting and plant organ counting tasks. Evaluation on two diverse object counting tasks (wheat spikelets, leaves) demonstrates consistent performance on the target datasets across different classes of domain shift: from indoor-to-outdoor images and from species-to-species adaptation.

6.5CVJul 17, 2020Code

AutoCount: Unsupervised Segmentation and Counting of Organs in Field Images

Jordan Ubbens, Tewodros Ayalew, Steve Shirtliffe et al.

Counting plant organs such as heads or tassels from outdoor imagery is a popular benchmark computer vision task in plant phenotyping, which has been previously investigated in the literature using state-of-the-art supervised deep learning techniques. However, the annotation of organs in field images is time-consuming and prone to errors. In this paper, we propose a fully unsupervised technique for counting dense objects such as plant organs. We use a convolutional network-based unsupervised segmentation method followed by two post-hoc optimization steps. The proposed technique is shown to provide competitive counting performance on a range of organ counting tasks in sorghum (S. bicolor) and wheat (T. aestivum) with no dataset-dependent tuning or modifications.

13.2CVApr 25, 2020

Global Wheat Head Detection (GWHD) dataset: a large and diverse dataset of high resolution RGB labelled images to develop and benchmark wheat head detection methods

E. David, S. Madec, P. Sadeghi-Tehran et al.

Detection of wheat heads is an important task allowing to estimate pertinent traits including head population density and head characteristics such as sanitary state, size, maturity stage and the presence of awns. Several studies developed methods for wheat head detection from high-resolution RGB imagery. They are based on computer vision and machine learning and are generally calibrated and validated on limited datasets. However, variability in observational conditions, genotypic differences, development stages, head orientation represents a challenge in computer vision. Further, possible blurring due to motion or wind and overlap between heads for dense populations make this task even more complex. Through a joint international collaborative effort, we have built a large, diverse and well-labelled dataset, the Global Wheat Head detection (GWHD) dataset. It contains 4,700 high-resolution RGB images and 190,000 labelled wheat heads collected from several countries around the world at different growth stages with a wide range of genotypes. Guidelines for image acquisition, associating minimum metadata to respect FAIR principles and consistent head labelling methods are proposed when developing new head detection datasets. The GWHD is publicly available at http://www.global-wheat.com/ and aimed at developing and benchmarking methods for wheat head detection.

5.8CVJan 9, 2020

Multi-Scale Weight Sharing Network for Image Recognition

Shubhra Aich, Ian Stavness, Yasuhiro Taniguchi et al.

In this paper, we explore the idea of weight sharing over multiple scales in convolutional networks. Inspired by traditional computer vision approaches, we share the weights of convolution kernels over different scales in the same layers of the network. Although multi-scale feature aggregation and sharing inside convolutional networks are common in practice, none of the previous works address the issue of convolutional weight sharing. We evaluate our weight sharing scheme on two heterogeneous image recognition datasets - ImageNet (object recognition) and Places365-Standard (scene classification). With approximately 25% fewer parameters, our shared-weight ResNet model provides similar performance compared to baseline ResNets. Shared-weight models are further validated via transfer learning experiments on four additional image recognition datasets - Caltech256 and Stanford 40 Actions (object-centric) and SUN397 and MIT Inddor67 (scene-centric). Experimental results demonstrate significant redundancy in the vanilla implementations of the deeper networks, and also indicate that a shift towards increasing the receptive field per parameter may improve future convolutional network architectures.

9.6CVMay 28, 2018

Global Sum Pooling: A Generalization Trick for Object Counting with Small Datasets of Large Images

Shubhra Aich, Ian Stavness

In this paper, we explore the problem of training one-look regression models for counting objects in datasets comprising a small number of high-resolution, variable-shaped images. We illustrate that conventional global average pooling (GAP) based models are unreliable due to the patchwise cancellation of true overestimates and underestimates for patchwise inference. To overcome this limitation and reduce overfitting caused by the training on full-resolution images, we propose to employ global sum pooling (GSP) instead of GAP or fully connected (FC) layers at the backend of a convolutional network. Although computationally equivalent to GAP, we show through comprehensive experimentation that GSP allows convolutional networks to learn the counting task as a simple linear mapping problem generalized over the input shape and the number of objects present. This generalization capability allows GSP to avoid both patchwise cancellation and overfitting by training on small patches and inference on full-resolution images as a whole. We evaluate our approach on four different aerial image datasets - two car counting datasets (CARPK and COWC), one crowd counting dataset (ShanghaiTech; parts A and B) and one new challenging dataset for wheat spike counting. Our GSP models improve upon the state-of-the-art approaches on all four datasets with a simple architecture. Also, GSP architectures trained with smaller-sized image patches exhibit better localization property due to their focus on learning from smaller regions while training.

7.3CVMay 1, 2018Code

Semantic Binary Segmentation using Convolutional Networks without Decoders

Shubhra Aich, William van der Kamp, Ian Stavness

In this paper, we propose an efficient architecture for semantic image segmentation using the depth-to-space (D2S) operation. Our D2S model is comprised of a standard CNN encoder followed by a depth-to-space reordering of the final convolutional feature maps. Our approach eliminates the decoder portion of traditional encoder-decoder segmentation models and reduces the amount of computation almost by half. As a participant of the DeepGlobe Road Extraction competition, we evaluate our models on the corresponding road segmentation dataset. Our highly efficient D2S models exhibit comparable performance to standard segmentation models with much lower computational cost.

12.8NEApr 21, 2018

Bridgeout: stochastic bridge regularization for deep neural networks

Najeeb Khan, Jawad Shah, Ian Stavness

A major challenge in training deep neural networks is overfitting, i.e. inferior performance on unseen test examples compared to performance on training examples. To reduce overfitting, stochastic regularization methods have shown superior performance compared to deterministic weight penalties on a number of image recognition tasks. Stochastic methods such as Dropout and Shakeout, in expectation, are equivalent to imposing a ridge and elastic-net penalty on the model parameters, respectively. However, the choice of the norm of weight penalty is problem dependent and is not restricted to $\{L_1,L_2\}$. Therefore, in this paper we propose the Bridgeout stochastic regularization technique and prove that it is equivalent to an $L_q$ penalty on the weights, where the norm $q$ can be learned as a hyperparameter from data. Experimental results show that Bridgeout results in sparse model weights, improved gradients and superior classification performance compared to Dropout and Shakeout on synthetic and real datasets.

3.9CVMar 14, 2018Code

Improving Object Counting with Heatmap Regulation

Shubhra Aich, Ian Stavness

In this paper, we propose a simple and effective way to improve one-look regression models for object counting from images. We use class activation map visualizations to illustrate the drawbacks of learning a pure one-look regression model for a counting task. Based on these insights, we enhance one-look regression counting models by regulating activation maps from the final convolution layer of the network with coarse ground-truth activation maps generated from simple dot annotations. We call this strategy heatmap regulation (HR). We show that this simple enhancement effectively suppresses false detections generated by the corresponding one-look baseline model and also improves the performance in terms of false negatives. Evaluations are performed on four different counting datasets --- two for car counting (CARPK, PUCPR+), one for crowd counting (WorldExpo) and another for biological cell counting (VGG-Cells). Adding HR to a simple VGG front-end improves performance on all these benchmarks compared to a simple one-look baseline model and results in state-of-the-art performance for car counting.

5.0CVSep 30, 2017Code

DeepWheat: Estimating Phenotypic Traits from Crop Images with Deep Learning

Shubhra Aich, Anique Josuttes, Ilya Ovsyannikov et al.

In this paper, we investigate estimating emergence and biomass traits from color images and elevation maps of wheat field plots. We employ a state-of-the-art deconvolutional network for segmentation and convolutional architectures, with residual and Inception-like layers, to estimate traits via high dimensional nonlinear regression. Evaluation was performed on two different species of wheat, grown in field plots for an experimental plant breeding study. Our framework achieves satisfactory performance with mean and standard deviation of absolute difference of 1.05 and 1.40 counts for emergence and 1.45 and 2.05 for biomass estimation. Our results for counting wheat plants from field images are better than the accuracy reported for the similar, but arguably less difficult, task of counting leaves from indoor images of rosette plants. Our results for biomass estimation, even with a very small dataset, improve upon all previously proposed approaches in the literature.

12.3CVAug 24, 2017Code

Leaf Counting with Deep Convolutional and Deconvolutional Networks

Shubhra Aich, Ian Stavness

In this paper, we investigate the problem of counting rosette leaves from an RGB image, an important task in plant phenotyping. We propose a data-driven approach for this task generalized over different plant species and imaging setups. To accomplish this task, we use state-of-the-art deep learning architectures: a deconvolutional network for initial segmentation and a convolutional network for leaf counting. Evaluation is performed on the leaf counting challenge dataset at CVPPP-2017. Despite the small number of training samples in this dataset, as compared to typical deep learning image sets, we obtain satisfactory performance on segmenting leaves from the background as a whole and counting the number of leaves using simple data augmentation strategies. Comparative analysis is provided against methods evaluated on the previous competition datasets. Our framework achieves mean and standard deviation of absolute count difference of 1.62 and 2.30 averaged over all five test datasets.

1.5NEJun 13, 2017

Prediction of Muscle Activations for Reaching Movements using Deep Neural Networks

Najeeb Khan, Ian Stavness

The motor control problem involves determining the time-varying muscle activation trajectories required to accomplish a given movement. Muscle redundancy makes motor control a challenging task: there are many possible activation trajectories that accomplish the same movement. Despite this redundancy, most movements are accomplished in highly stereotypical ways. For example, point-to-point reaching movements are almost universally performed with very similar smooth trajectories. Optimization methods are commonly used to predict muscle forces for measured movements. However, these approaches require computationally expensive simulations and are sensitive to the chosen optimality criteria and regularization. In this work, we investigate deep autoencoders for the prediction of muscle activation trajectories for point-to-point reaching movements. We evaluate our DNN predictions with simulated reaches and two methods to generate the muscle activations: inverse dynamics (ID) and optimal control (OC) criteria. We also investigate optimal network parameters and training criteria to improve the accuracy of the predictions.