Steven Waslander

h-index11

17papers

2,413citations

Novelty46%

AI Score53

Ranked #30,267 of 201,018 authors (top 15%)#12,322 in CV (top 21%)

17 Papers

ROJun 1

Fixed-Time Dynamic Landing of Quadrotors using Adaptive Unscented Kalman Filtering and Nonlinear Model Predictive Control

Mohammadreza Izadi, Zeinab Shayan, Steven Waslander et al.

This paper introduces an estimation and control framework for dynamic landing of multi-rotor uncrewed aerial vehicles on moving platforms. The proposed method integrates nonlinear model predictive control with a real-time minimum-jerk trajectory planner that enforces a prescribed touchdown time, enabling consistent timing during the terminal descent. To enhance robustness in the presence of time-varying sensing quality, we utilize an adaptive unscented kalman filter that updates the process and measurement noise statistics online. In addition, we provide a reference feasibility analysis showing that minimum-jerk references induce bounded thrust and torque commands under standard tracking hypotheses. The proposed framework is evaluated in simulation and hardware experiments, and it is shown to achieve repeatable landings and improved platform velocity prediction accuracy relative to EKF/UKF-based methods.

CVJun 1, 2022

LiDAR-MIMO: Efficient Uncertainty Estimation for LiDAR-based 3D Object Detection

Matthew Pitropov, Chengjie Huang, Vahdat Abdelzad et al. · utoronto

The estimation of uncertainty in robotic vision, such as 3D object detection, is an essential component in developing safe autonomous systems aware of their own performance. However, the deployment of current uncertainty estimation methods in 3D object detection remains challenging due to timing and computational constraints. To tackle this issue, we propose LiDAR-MIMO, an adaptation of the multi-input multi-output (MIMO) uncertainty estimation method to the LiDAR-based 3D object detection task. Our method modifies the original MIMO by performing multi-input at the feature level to ensure the detection, uncertainty estimation, and runtime performance benefits are retained despite the limited capacity of the underlying detector and the large computational costs of point cloud processing. We compare LiDAR-MIMO with MC dropout and ensembles as baselines and show comparable uncertainty estimation results with only a small number of output heads. Further, LiDAR-MIMO can be configured to be twice as fast as MC dropout and ensembles, while achieving higher mAP than MC dropout and approaching that of ensembles.

CVSep 1, 2024

Image-to-Lidar Relational Distillation for Autonomous Driving Data

Anas Mahmoud, Ali Harakeh, Steven Waslander · utoronto

Pre-trained on extensive and diverse multi-modal datasets, 2D foundation models excel at addressing 2D tasks with little or no downstream supervision, owing to their robust representations. The emergence of 2D-to-3D distillation frameworks has extended these capabilities to 3D models. However, distilling 3D representations for autonomous driving datasets presents challenges like self-similarity, class imbalance, and point cloud sparsity, hindering the effectiveness of contrastive distillation, especially in zero-shot learning contexts. Whereas other methodologies, such as similarity-based distillation, enhance zero-shot performance, they tend to yield less discriminative representations, diminishing few-shot performance. We investigate the gap in structure between the 2D and the 3D representations that result from state-of-the-art distillation frameworks and reveal a significant mismatch between the two. Additionally, we demonstrate that the observed structural gap is negatively correlated with the efficacy of the distilled representations on zero-shot and few-shot 3D semantic segmentation. To bridge this gap, we propose a relational distillation framework enforcing intra-modal and cross-modal constraints, resulting in distilled 3D representations that closely capture the structure of the 2D representation. This alignment significantly enhances 3D representation performance over those learned through contrastive distillation in zero-shot segmentation tasks. Furthermore, our relational loss consistently improves the quality of 3D representations in both in-distribution and out-of-distribution few-shot segmentation tasks, outperforming approaches that rely on the similarity loss.

CVApr 17, 2023

ProPanDL: A Modular Architecture for Uncertainty-Aware Panoptic Segmentation

Jacob Deery, Chang Won Lee, Steven Waslander · utoronto

We introduce ProPanDL, a family of networks capable of uncertainty-aware panoptic segmentation. Unlike existing segmentation methods, ProPanDL is capable of estimating full probability distributions for both the semantic and spatial aspects of panoptic segmentation. We implement and evaluate ProPanDL variants capable of estimating both parametric (Variance Network) and parameter-free (SampleNet) distributions quantifying pixel-wise spatial uncertainty. We couple these approaches with two methods (Temperature Scaling and Evidential Deep Learning) for semantic uncertainty estimation. To evaluate the uncertainty-aware panoptic segmentation task, we address limitations with existing approaches by proposing new metrics that enable separate evaluation of spatial and semantic uncertainty. We additionally propose the use of the energy score, a proper scoring rule, for more robust evaluation of spatial output distributions. Using these metrics, we conduct an extensive evaluation of ProPanDL variants. Our results demonstrate that ProPanDL is capable of estimating well-calibrated and meaningful output distributions while still retaining strong performance on the base panoptic segmentation task.

CVJul 11, 2023

Class Instance Balanced Learning for Long-Tailed Classification

Marc-Antoine Lavoie, Steven Waslander · utoronto

The long-tailed image classification task remains important in the development of deep neural networks as it explicitly deals with large imbalances in the class frequencies of the training data. While uncommon in engineered datasets, this imbalance is almost always present in real-world data. Previous approaches have shown that combining cross-entropy and contrastive learning can improve performance on the long-tailed task, but they do not explore the tradeoff between head and tail classes. We propose a novel class instance balanced loss (CIBL), which reweights the relative contributions of a cross-entropy and a contrastive loss as a function of the frequency of class instances in the training batch. This balancing favours the contrastive loss for more common classes, leading to a learned classifier with a more balanced performance across all class frequencies. Furthermore, increasing the relative weight on the contrastive head shifts performance from common (head) to rare (tail) classes, allowing the user to skew the performance towards these classes if desired. We also show that changing the linear classifier head with a cosine classifier yields a network that can be trained to similar performance in substantially fewer epochs. We obtain competitive results on both CIFAR-100-LT and ImageNet-LT.

CVJul 6, 2024Code

JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention

Brian Cheong, Jiachen Zhou, Steven Waslander

Tracking-by-detection (TBD) methods achieve state-of-the-art performance on 3D tracking benchmarks for autonomous driving. On the other hand, tracking-by-attention (TBA) methods have the potential to outperform TBD methods, particularly for long occlusions and challenging detection settings. This work investigates why TBA methods continue to lag in performance behind TBD methods using a LiDAR-based joint detector and tracker called JDT3D. Based on this analysis, we propose two generalizable methods to bridge the gap between TBD and TBA methods: track sampling augmentation and confidence-based query propagation. JDT3D is trained and evaluated on the nuScenes dataset, achieving 0.574 on the AMOTA metric on the nuScenes test set, outperforming all existing LiDAR-based TBA approaches by over 6%. Based on our results, we further discuss some potential challenges with the existing TBA model formulation to explain the continued gap in performance with TBD methods. The implementation of JDT3D can be found at the following link: https://github.com/TRAILab/JDT3D.

CVFeb 5

Contour Refinement using Discrete Diffusion in Low Data Regime

Fei Yu Guan, Ian Keefe, Sophie Wilkinson et al.

Boundary detection of irregular and translucent objects is an important problem with applications in medical imaging, environmental monitoring and manufacturing, where many of these applications are plagued with scarce labeled data and low in situ computational resources. While recent image segmentation studies focus on segmentation mask alignment with ground-truth, the task of boundary detection remains understudied, especially in the low data regime. In this work, we present a lightweight discrete diffusion contour refinement pipeline for robust boundary detection in the low data regime. We use a Convolutional Neural Network(CNN) architecture with self-attention layers as the core of our pipeline, and condition on a segmentation mask, iteratively denoising a sparse contour representation. We introduce multiple novel adaptations for improved low-data efficacy and inference efficiency, including using a simplified diffusion process, a customized model architecture, and minimal post processing to produce a dense, isolated contour given a dataset of size <500 training images. Our method outperforms several SOTA baselines on the medical imaging dataset KVASIR, is competitive on HAM10K and our custom wildfire dataset, Smoke, while improving inference framerate by 3.5X.

CVOct 11, 2021Code

UrbanNet: Leveraging Urban Maps for Long Range 3D Object Detection

Juan Carrillo, Steven Waslander

Relying on monocular image data for precise 3D object detection remains an open problem, whose solution has broad implications for cost-sensitive applications such as traffic monitoring. We present UrbanNet, a modular architecture for long range monocular 3D object detection with static cameras. Our proposed system combines commonly available urban maps along with a mature 2D object detector and an efficient 3D object descriptor to accomplish accurate detection at long range even when objects are rotated along any of their three axes. We evaluate UrbanNet on a novel challenging synthetic dataset and highlight the advantages of its design for traffic detection in roads with changing slope, where the flat ground approximation does not hold. Data and code are available at https://github.com/TRAILab/UrbanNet

CVNov 20, 2020Code

A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving

Di Feng, Ali Harakeh, Steven Waslander et al.

Capturing uncertainty in object detection is indispensable for safe autonomous driving. In recent years, deep learning has become the de-facto approach for object detection, and many probabilistic object detectors have been proposed. However, there is no summary on uncertainty estimation in deep object detection, and existing methods are not only built with different network architectures and uncertainty estimation methods, but also evaluated on different datasets with a wide range of evaluation metrics. As a result, a comparison among methods remains challenging, as does the selection of a model that best suits a particular application. This paper aims to alleviate this problem by providing a review and comparative study on existing probabilistic object detection methods for autonomous driving applications. First, we provide an overview of generic uncertainty estimation in deep learning, and then systematically survey existing methods and evaluation metrics for probabilistic object detection. Next, we present a strict comparative study for probabilistic object detection based on an image detector and three public autonomous driving datasets. Finally, we present a discussion of the remaining challenges and future works. Code has been made available at https://github.com/asharakeh/pod_compare.git

CVJul 16, 2018Code

Unlimited Road-scene Synthetic Annotation (URSA) Dataset

Matt Angus, Mohamed ElBalkini, Samin Khan et al.

In training deep neural networks for semantic segmentation, the main limiting factor is the low amount of ground truth annotation data that is available in currently existing datasets. The limited availability of such data is due to the time cost and human effort required to accurately and consistently label real images on a pixel level. Modern sandbox video game engines provide open world environments where traffic and pedestrians behave in a pseudo-realistic manner. This caters well to the collection of a believable road-scene dataset. Utilizing open-source tools and resources found in single-player modding communities, we provide a method for persistent, ground truth, asset annotation of a game world. By collecting a synthetic dataset containing upwards of $1,000,000$ images, we demonstrate real-time, on-demand, ground truth data annotation capability of our method. Supplementing this synthetic data to Cityscapes dataset, we show that our data generation method provides qualitative as well as quantitative improvements---for training networks---over previous methods that use video games as surrogate.

CVDec 6, 2017Code

Joint 3D Proposal Generation and Object Detection from View Aggregation

Jason Ku, Melissa Mozifian, Jungwook Lee et al.

We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space. Our proposed architecture is shown to produce state of the art results on the KITTI 3D object detection benchmark while running in real time with a low memory footprint, making it a suitable candidate for deployment on autonomous vehicles. Code is at: https://github.com/kujason/avod

ROMar 13

A Photorealistic Dataset and Vision-Based Algorithm for Anomaly Detection During Proximity Operations in Lunar Orbit

Selina Leveugle, Chang Won Lee, Svetlana Stolpner et al.

NASA's forthcoming Lunar Gateway space station, which will be uncrewed most of the time, will need to operate with an unprecedented level of autonomy. One key challenge is enabling the Canadarm3, the Gateway's external robotic system, to detect hazards in its environment using its onboard inspection cameras. This task is complicated by the extreme and variable lighting conditions in space. In this paper, we introduce the visual anomaly detection and localization task for the space domain and establish a benchmark based on a synthetic dataset called ALLO (Anomaly Localization in Lunar Orbit). We show that state-of-the-art visual anomaly detection methods often fail in the space domain, motivating the need for new approaches. To address this, we propose MRAD (Model Reference Anomaly Detection), a statistical algorithm that leverages the known pose of the Canadarm3 and a CAD model of the Gateway to generate reference images of the expected scene appearance. Anomalies are then identified as deviations from this model-generated reference. On the ALLO dataset, MRAD surpasses state-of-the-art anomaly detection algorithms, achieving an AP score of 62.9% at the pixel level and an AUROC score of 75.0% at the image level. Given the low tolerance for risk in space operations and the lack of domain-specific data, we emphasize the need for novel, robust, and accurate anomaly detection methods to handle the challenging visual conditions found in lunar orbit and beyond.

LGFeb 19, 2022

Accurate Prediction and Uncertainty Estimation using Decoupled Prediction Interval Networks

Kinjal Patel, Steven Waslander

We propose a network architecture capable of reliably estimating uncertainty of regression based predictions without sacrificing accuracy. The current state-of-the-art uncertainty algorithms either fall short of achieving prediction accuracy comparable to the mean square error optimization or underestimate the variance of network predictions. We propose a decoupled network architecture that is capable of accomplishing both at the same time. We achieve this by breaking down the learning of prediction and prediction interval (PI) estimations into a two-stage training process. We use a custom loss function for learning a PI range around optimized mean estimation with a desired coverage of a proportion of the target labels within the PI range. We compare the proposed method with current state-of-the-art uncertainty quantification algorithms on synthetic datasets and UCI benchmarks, reducing the error in the predictions by 23 to 34% while maintaining 95% Prediction Interval Coverage Probability (PICP) for 7 out of 9 UCI benchmark datasets. We also examine the quality of our predictive uncertainty by evaluating on Active Learning and demonstrating 17 to 36% error reduction on UCI benchmarks.

CVJul 29, 2021

Bayesian Embeddings for Few-Shot Open World Recognition

John Willes, James Harrison, Ali Harakeh et al.

As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world and few-shot setting in which agents continuously learn new classes from small amounts of information. This stands in stark contrast to modern machine learning systems that are typically designed with a known set of classes and a large number of examples for each class. In this work we extend embedding-based few-shot learning algorithms to the open-world recognition setting. We combine Bayesian non-parametric class priors with an embedding-based pre-training scheme to yield a highly flexible framework which we refer to as few-shot learning for open world recognition (FLOWR). We benchmark our framework on open-world extensions of the common MiniImageNet and TieredImageNet few-shot learning datasets. Our results show, compared to prior methods, strong classification accuracy performance and up to a 12% improvement in H-measure (a measure of novel class detection) from our non-parametric open-world few-shot learning scheme.

CVJan 27, 2020

Canadian Adverse Driving Conditions Dataset

Matthew Pitropov, Danson Garcia, Jason Rebello et al.

The Canadian Adverse Driving Conditions (CADC) dataset was collected with the Autonomoose autonomous vehicle platform, based on a modified Lincoln MKZ. The dataset, collected during winter within the Region of Waterloo, Canada, is the first autonomous vehicle dataset that focuses on adverse driving conditions specifically. It contains 7,000 frames collected through a variety of winter weather conditions of annotated data from 8 cameras (Ximea MQ013CG-E2), Lidar (VLP-32C) and a GNSS+INS system (Novatel OEM638). The sensors are time synchronized and calibrated with the intrinsic and extrinsic calibrations included in the dataset. Lidar frame annotations that represent ground truth for 3D object detection and tracking have been provided by Scale AI.

MASep 17, 2019

TruPercept: Trust Modelling for Autonomous Vehicle Cooperative Perception from Synthetic Data

Braden Hurl, Robin Cohen, Krzysztof Czarnecki et al.

Inter-vehicle communication for autonomous vehicles (AVs) stands to provide significant benefits in terms of perception robustness. We propose a novel approach for AVs to communicate perceptual observations, tempered by trust modelling of peers providing reports. Based on the accuracy of reported object detections as verified locally, communicated messages can be fused to augment perception performance beyond line of sight and at great distance from the ego vehicle. Also presented is a new synthetic dataset which can be used to test cooperative perception. The TruPercept dataset includes unreliable and malicious behaviour scenarios to experiment with some challenges cooperative perception introduces. The TruPercept runtime and evaluation framework allows modular component replacement to facilitate ablation studies as well as the creation of new trust scenarios we are able to show.

CVMay 1, 2019

Precise Synthetic Image and LiDAR (PreSIL) Dataset for Autonomous Vehicle Perception

Braden Hurl, Krzysztof Czarnecki, Steven Waslander

We introduce the Precise Synthetic Image and LiDAR (PreSIL) dataset for autonomous vehicle perception. Grand Theft Auto V (GTA V), a commercial video game, has a large detailed world with realistic graphics, which provides a diverse data collection environment. Existing works creating synthetic LiDAR data for autonomous driving with GTA V have not released their datasets, rely on an in-game raycasting function which represents people as cylinders, and can fail to capture vehicles past 30 metres. Our work creates a precise LiDAR simulator within GTA V which collides with detailed models for all entities no matter the type or position. The PreSIL dataset consists of over 50,000 frames and includes high-definition images with full resolution depth information, semantic segmentation (images), point-wise segmentation (point clouds), and detailed annotations for all vehicles and people. Collecting additional data with our framework is entirely automatic and requires no human annotation of any kind. We demonstrate the effectiveness of our dataset by showing an improvement of up to 5% average precision on the KITTI 3D Object Detection benchmark challenge when state-of-the-art 3D object detection networks are pre-trained with our data. The data and code are available at https://tinyurl.com/y3tb9sxy