CVAug 11, 2023Code
MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaptation in 3D Object DetectionDarren Tsai, Julie Stephany Berrio, Mao Shan et al.
Deploying 3D detectors in unfamiliar domains has been demonstrated to result in a significant 70-90% drop in detection rate due to variations in lidar, geography, or weather from their training dataset. This domain gap leads to missing detections for densely observed objects, misaligned confidence scores, and increased high-confidence false positives, rendering the detector highly unreliable. To address this, we introduce MS3D++, a self-training framework for multi-source unsupervised domain adaptation in 3D object detection. MS3D++ generates high-quality pseudo-labels, allowing 3D detectors to achieve high performance on a range of lidar types, regardless of their density. Our approach effectively fuses predictions of an ensemble of multi-frame pre-trained detectors from different source domains to improve domain generalization. We subsequently refine predictions temporally to ensure temporal consistency in box localization and object classification. Furthermore, we present an in-depth study into the performance and idiosyncrasies of various 3D detector components in a cross-domain context, providing valuable insights for improved cross-domain detector ensembling. Experimental results on Waymo, nuScenes and Lyft demonstrate that detectors trained with MS3D++ pseudo-labels achieve state-of-the-art performance, comparable to training with human-annotated labels in Bird's Eye View (BEV) evaluation for both low and high density lidar. Code is available at https://github.com/darrenjkt/MS3D
CVSep 14, 2022Code
Viewer-Centred Surface Completion for Unsupervised Domain Adaptation in 3D Object DetectionDarren Tsai, Julie Stephany Berrio, Mao Shan et al.
Every autonomous driving dataset has a different configuration of sensors, originating from distinct geographic regions and covering various scenarios. As a result, 3D detectors tend to overfit the datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another. We observe that lidar scan pattern differences form a large component of this reduction in performance. We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN) to complete the surfaces of objects of interest within an unsupervised domain adaptation framework, SEE. With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns. By adopting a domain-invariant representation, SEE-VCN can be classed as a multi-target domain adaptation approach where no annotations or re-training is required to obtain 3D detections for new scan patterns. Through extensive experiments, we show that our approach outperforms previous domain adaptation methods in multiple domain adaptation settings. Our code and data are available at https://github.com/darrenjkt/SEE-VCN.
CVApr 5, 2023
MS3D: Leveraging Multiple Detectors for Unsupervised Domain Adaptation in 3D Object DetectionDarren Tsai, Julie Stephany Berrio, Mao Shan et al.
We introduce Multi-Source 3D (MS3D), a new self-training pipeline for unsupervised domain adaptation in 3D object detection. Despite the remarkable accuracy of 3D detectors, they often overfit to specific domain biases, leading to suboptimal performance in various sensor setups and environments. Existing methods typically focus on adapting a single detector to the target domain, overlooking the fact that different detectors possess distinct expertise on different unseen domains. MS3D leverages this by combining different pre-trained detectors from multiple source domains and incorporating temporal information to produce high-quality pseudo-labels for fine-tuning. Our proposed Kernel-Density Estimation (KDE) Box Fusion method fuses box proposals from multiple domains to obtain pseudo-labels that surpass the performance of the best source domain detectors. MS3D exhibits greater robustness to domain shift and produces accurate pseudo-labels over greater distances, making it well-suited for high-to-low beam domain adaptation and vice versa. Our method achieved state-of-the-art performance on all evaluated datasets, and we demonstrate that the pre-trained detector's source dataset has minimal impact on the fine-tuned result, making MS3D suitable for real-world applications.
51.5ROMar 17
The Era of End-to-End Autonomy: Transitioning from Rule-Based Driving to Large Driving ModelsEduardo Nebot, Julie Stephany Berrio Perez
Autonomous driving is undergoing a shift from modular rule based pipelines toward end to end (E2E) learning systems. This paper examines this transition by tracing the evolution from classical sense perceive plan control architectures to large driving models (LDMs) capable of mapping raw sensor input directly to driving actions. We analyze recent developments including Tesla's Full Self Driving (FSD) V12 V14, Rivian's Unified Intelligence platform, NVIDIA Cosmos, and emerging commercial robotaxi deployments, focusing on architectural design, deployment strategies, safety considerations and industry implications. A key emerging product category is supervised E2E driving, often referred to as FSD (Supervised) or L2 plus plus, which several manufacturers plan to deploy from 2026 onwards. These systems can perform most of the Dynamic Driving Task (DDT) in complex environments while requiring human supervision, shifting the driver's role to safety oversight. Early operational evidence suggests E2E learning handles the long tail distribution of real world driving scenarios and is becoming a dominant commercial strategy. We also discuss how similar architectural advances may extend beyond autonomous vehicles (AV) to other embodied AI systems, including humanoid robotics.
CVJul 14, 2023
LightFormer: An End-to-End Model for Intersection Right-of-Way Recognition Using Traffic Light Signals and an Attention MechanismZhenxing Ming, Julie Stephany Berrio, Mao Shan et al.
For smart vehicles driving through signalised intersections, it is crucial to determine whether the vehicle has right of way given the state of the traffic lights. To address this issue, camera based sensors can be used to determine whether the vehicle has permission to proceed straight, turn left or turn right. This paper proposes a novel end to end intersection right of way recognition model called LightFormer to generate right of way status for available driving directions in complex urban intersections. The model includes a spatial temporal inner structure with an attention mechanism, which incorporates features from past image to contribute to the classification of the current frame right of way status. In addition, a modified, multi weight arcface loss is introduced to enhance the model classification performance. Finally, the proposed LightFormer is trained and tested on two public traffic light datasets with manually augmented labels to demonstrate its effectiveness.
CVMar 23, 2021Code
Optimising the selection of samples for robust lidar camera calibrationDarren Tsai, Stewart Worrall, Mao Shan et al.
We propose a robust calibration pipeline that optimises the selection of calibration samples for the estimation of calibration parameters that fit the entire scene. We minimise user error by automating the data selection process according to a metric, called Variability of Quality (VOQ) that gives a score to each calibration set of samples. We show that this VOQ score is correlated with the estimated calibration parameter's ability to generalise well to the entire scene, thereby overcoming the overfitting problems of existing calibration algorithms. Our approach has the benefits of simplifying the calibration process for practitioners of any calibration expertise level and providing an objective measure of the quality for our calibration pipeline's input and output data. We additionally use a novel method of assessing the accuracy of the calibration parameters. It involves computing reprojection errors for the entire scene to ensure that the parameters are well fitted to all features in the scene. Our proposed calibration pipeline takes 90s, and obtains an average reprojection error of 1-1.2cm, with standard deviation of 0.4-0.5cm over 46 poses evenly distributed in a scene. This process has been validated by experimentation on a high resolution, software definable lidar, Baraja Spectrum-Scan; and a low, fixed resolution lidar, Velodyne VLP-16. We have shown that despite the vast differences in lidar technologies, our proposed approach manages to estimate robust calibration parameters for both. Our code and data set used for this paper are made available as open-source.
CVNov 17, 2021
See Eye to Eye: A Lidar-Agnostic 3D Detection Framework for Unsupervised Multi-Target Domain AdaptationDarren Tsai, Julie Stephany Berrio, Mao Shan et al.
Sampling discrepancies between different manufacturers and models of lidar sensors result in inconsistent representations of objects. This leads to performance degradation when 3D detectors trained for one lidar are tested on other types of lidars. Remarkable progress in lidar manufacturing has brought about advances in mechanical, solid-state, and recently, adjustable scan pattern lidars. For the latter, existing works often require fine-tuning the model each time scan patterns are adjusted, which is infeasible. We explicitly deal with the sampling discrepancy by proposing a novel unsupervised multi-target domain adaptation framework, SEE, for transferring the performance of state-of-the-art 3D detectors across both fixed and flexible scan pattern lidars without requiring fine-tuning of models by end-users. Our approach interpolates the underlying geometry and normalizes the scan pattern of objects from different lidars before passing them to the detection network. We demonstrate the effectiveness of SEE on public datasets, achieving state-of-the-art results, and additionally provide quantitative results on a novel high-resolution lidar to prove the industry applications of our framework.
ROMar 26, 2021
A Persistent and Context-aware Behavior Tree Framework for Multi Sensor Localization in Autonomous DrivingSiqi Yi, Stewart Worrall, Eduardo Nebot
Robust and persistent localisation is essential for ensuring the safe operation of autonomous vehicles. When operating in large and diverse urban driving environments, autonomous vehicles are frequently exposed to situations that violate the assumptions of algorithms, suffer from the failure of one or more sensors, or other events that lead to a loss of localisation. This paper proposes the use of a behavior tree framework that can monitor the performance of localisation health metrics and triggers intelligent responses such as sensor switching and loss recovery. The algorithm presented selects the best available sensor data at given time and location, and can perform a series of actions to react to adverse situations. The behavior tree encapsulates the system-level logic to give commands that make up the intelligent behaviors, so that the localisation "actuators" (data association, optimisation, filters, etc) can perform decoupled actions without needing context. Experimental results to validate the algorithms are presented using the University of Sydney Campus dataset which was taken weekly over an 18 month period. A video showing the online localisation process can be found here: https://youtu.be/353uKqXLV5g
RONov 23, 2020
Socially Aware Crowd Navigation with Multimodal Pedestrian Trajectory Prediction for Autonomous VehiclesKunming Li, Mao Shan, Karan Narula et al.
Seamlessly operating an autonomous vehicle in a crowded pedestrian environment is a very challenging task. This is because human movement and interactions are very hard to predict in such environments. Recent work has demonstrated that reinforcement learning-based methods have the ability to learn to drive in crowds. However, these methods can have very poor performance due to inaccurate predictions of the pedestrians' future state as human motion prediction has a large variance. To overcome this problem, we propose a new method, SARL-SGAN-KCE, that combines a deep socially aware attentive value network with a human multimodal trajectory prediction model to help identify the optimal driving policy. We also introduce a novel technique to extend the discrete action space with minimal additional computational requirements. The kinematic constraints of the vehicle are also considered to ensure smooth and safe trajectories. We evaluate our method against the state of art methods for crowd navigation and provide an ablation study to show that our method is safer and closer to human behaviour.
CVNov 23, 2020
Attentional-GCNN: Adaptive Pedestrian Trajectory Prediction towards Generic Autonomous Vehicle Use CasesKunming Li, Stuart Eiffert, Mao Shan et al.
Autonomous vehicle navigation in shared pedestrian environments requires the ability to predict future crowd motion both accurately and with minimal delay. Understanding the uncertainty of the prediction is also crucial. Most existing approaches however can only estimate uncertainty through repeated sampling of generative models. Additionally, most current predictive models are trained on datasets that assume complete observability of the crowd using an aerial view. These are generally not representative of real-world usage from a vehicle perspective, and can lead to the underestimation of uncertainty bounds when the on-board sensors are occluded. Inspired by prior work in motion prediction using spatio-temporal graphs, we propose a novel Graph Convolutional Neural Network (GCNN)-based approach, Attentional-GCNN, which aggregates information of implicit interaction between pedestrians in a crowd by assigning attention weight in edges of the graph. Our model can be trained to either output a probabilistic distribution or faster deterministic prediction, demonstrating applicability to autonomous vehicle use cases where either speed or accuracy with uncertainty bounds are required. To further improve the training of predictive models, we propose an automatically labelled pedestrian dataset collected from an intelligent vehicle platform representative of real-world use. Through experiments on a number of datasets, we show our proposed method achieves an improvement over the state of art by 10% Average Displacement Error (ADE) and 12% Final Displacement Error (FDE) with fast inference speeds.
RONov 17, 2020
Demonstrations of Cooperative Perception: Safety and Robustness in Connected and Automated Vehicle OperationsMao Shan, Karan Narula, Yung Fei Wong et al.
Cooperative perception, or collective perception (CP) is an emerging and promising technology for intelligent transportation systems (ITS). It enables an ITS station (ITS-S) to share its local perception information with others by means of vehicle-to-X (V2X) communication, thereby achieving improved efficiency and safety in road transportation. In this paper, we present our recent progress on the development of a connected and automated vehicle (CAV) and intelligent roadside unit (IRSU). We present three different experiments to demonstrate the use of CP service within intelligent infrastructure to improve awareness of vulnerable road users (VRU) and thus safety for CAVs in various traffic scenarios. We demonstrate in the experiments that a connected vehicle (CV) can "see" a pedestrian around the corners. More importantly, we demonstrate how CAVs can autonomously and safely interact with walking and running pedestrians, relying only on the CP information from the IRSU through vehicle-to-infrastructure (V2I) communication. This is one of the first demonstrations of urban vehicle automation using only CP information. We also address in the paper the handling of collective perception messages (CPMs) received from the IRSU, and passing them through a pipeline of CP information coordinate transformation with uncertainty, multiple road user tracking, and eventually path planning/decision making within the CAV. The experimental results were obtained with manually driven CV, fully autonomous CAV, and an IRSU retrofitted with vision and laser sensors and a road user tracking system.
RONov 16, 2020
Efficient falsification approach for autonomous vehicle validation using a parameter optimisation technique based on reinforcement learningDhanoop Karunakaran, Stewart Worrall, Eduardo Nebot
The widescale deployment of Autonomous Vehicles (AV) appears to be imminent despite many safety challenges that are yet to be resolved. It is well-known that there are no universally agreed Verification and Validation (VV) methodologies guarantee absolute safety, which is crucial for the acceptance of this technology. The uncertainties in the behaviour of the traffic participants and the dynamic world cause stochastic reactions in advanced autonomous systems. The addition of ML algorithms and probabilistic techniques adds significant complexity to the process for real-world testing when compared to traditional methods. Most research in this area focuses on generating challenging concrete scenarios or test cases to evaluate the system performance by looking at the frequency distribution of extracted parameters as collected from the real-world data. These approaches generally employ Monte-Carlo simulation and importance sampling to generate critical cases. This paper presents an efficient falsification method to evaluate the System Under Test. The approach is based on a parameter optimisation problem to search for challenging scenarios. The optimisation process aims at finding the challenging case that has maximum return. The method applies policy-gradient reinforcement learning algorithm to enable the learning. The riskiness of the scenario is measured by the well established RSS safety metric, euclidean distance, and instance of a collision. We demonstrate that by using the proposed method, we can more efficiently search for challenging scenarios which could cause the system to fail in order to satisfy the safety requirements.
ROAug 28, 2020
Long-term map maintenance pipeline for autonomous vehiclesJulie Stephany Berrio, Stewart Worrall, Mao Shan et al.
For autonomous vehicles to operate persistently in a typical urban environment, it is essential to have high accuracy position information. This requires a mapping and localisation system that can adapt to changes over time. A localisation approach based on a single-survey map will not be suitable for long-term operation as it does not incorporate variations in the environment. In this paper, we present new algorithms to maintain a featured-based map. A map maintenance pipeline is proposed that can continuously update a map with the most relevant features taking advantage of the changes in the surroundings. Our pipeline detects and removes transient features based on their geometrical relationships with the vehicle's pose. Newly identified features became part of a new feature map and are assessed by the pipeline as candidates for the localisation map. By purging out-of-date features and adding newly detected features, we continually update the prior map to more accurately represent the most recent environment. We have validated our approach using the USyd Campus Dataset, which includes more than 18 months of data. The results presented demonstrate that our maintenance pipeline produces a resilient map which can provide sustained localisation performance over time.
CVJul 9, 2020
Camera-Lidar Integration: Probabilistic sensor fusion for semantic mappingJulie Stephany Berrio, Mao Shan, Stewart Worrall et al.
An automated vehicle operating in an urban environment must be able to perceive and recognise object/obstacles in a three-dimensional world while navigating in a constantly changing environment. In order to plan and execute accurate sophisticated driving maneuvers, a high-level contextual understanding of the surroundings is essential. Due to the recent progress in image processing, it is now possible to obtain high definition semantic information in 2D from monocular cameras, though cameras cannot reliably provide the highly accurate 3D information provided by lasers. The fusion of these two sensor modalities can overcome the shortcomings of each individual sensor, though there are a number of important challenges that need to be addressed in a probabilistic manner. In this paper, we address the common, yet challenging, lidar/camera/semantic fusion problems which are seldom approached in a wholly probabilistic manner. Our approach is capable of using a multi-sensor platform to build a three-dimensional semantic voxelized map that considers the uncertainty of all of the processes involved. We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and heuristic label probabilities for the semantic images. We also present a novel and efficient viewpoint validation algorithm to check for occlusions from the camera frames. A probabilistic projection is performed from the camera images to the lidar point cloud. Each labelled lidar scan then feeds into an octree map building algorithm that updates the class probabilities of the map voxels every time a new observation is available. We validate our approach using a set of qualitative and quantitative experimental tests on the USyd Dataset.
CVJun 23, 2020
Probabilistic Crowd GAN: Multimodal Pedestrian Trajectory Prediction using a Graph Vehicle-Pedestrian Attention NetworkStuart Eiffert, Kunming Li, Mao Shan et al.
Understanding and predicting the intention of pedestrians is essential to enable autonomous vehicles and mobile robots to navigate crowds. This problem becomes increasingly complex when we consider the uncertainty and multimodality of pedestrian motion, as well as the implicit interactions between members of a crowd, including any response to a vehicle. Our approach, Probabilistic Crowd GAN, extends recent work in trajectory prediction, combining Recurrent Neural Networks (RNNs) with Mixture Density Networks (MDNs) to output probabilistic multimodal predictions, from which likely modal paths are found and used for adversarial training. We also propose the use of Graph Vehicle-Pedestrian Attention Network (GVAT), which models social interactions and allows input of a shared vehicle feature, showing that inclusion of this module leads to improved trajectory prediction both with and without the presence of a vehicle. Through evaluation on various datasets, we demonstrate improvements on the existing state of the art methods for trajectory prediction and illustrate how the true multimodal and uncertain nature of crowd interactions can be directly modelled.
ROMar 9, 2020
Probabilistic Egocentric Motion Correction of Lidar Point Cloud and Projection to Camera Images for Moving PlatformsMao Shan, Julie Stephany Berrio, Stewart Worrall et al.
The fusion of sensor data from heterogeneous sensors is crucial for robust perception in various robotics applications that involve moving platforms, for instance, autonomous vehicle navigation. In particular, combining camera and lidar sensors enables the projection of precise range information of the surrounding environment onto visual images. It also makes it possible to label each lidar point with visual segmentation/classification results for 3D mapping, which facilitates a higher level understanding of the scene. The task is however considered non-trivial due to intrinsic and extrinsic sensor calibration, and the distortion of lidar points resulting from the ego-motion of the platform. Despite the existence of many lidar ego-motion correction methods, the errors in the correction process due to uncertainty in ego-motion estimation are not possible to remove completely. It is thus essential to consider the problem a probabilistic process where the ego-motion estimation uncertainty is modelled and considered consistently. The paper investigates the probabilistic lidar ego-motion correction and lidar-to-camera projection, where both the uncertainty in the ego-motion estimation and time jitter in sensory measurements are incorporated. The proposed approach is validated both in simulation and using real-world data collected from an electric vehicle retrofitted with wide-angle cameras and a 16-beam scanning lidar.
ROMar 4, 2020
Efficient statistical validation with edge cases to evaluate Highly Automated VehiclesDhanoop Karunakaran, Stewart Worrall, Eduardo Nebot
The widescale deployment of Autonomous Vehicles (AV) seems to be imminent despite many safety challenges that are yet to be resolved. It is well known that there are no universally agreed Verification and Validation (VV) methodologies to guarantee absolute safety, which is crucial for the acceptance of this technology. Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements. Modern autonomous vehicles will undoubtedly include machine learning and probabilistic techniques that require a much more comprehensive testing regime due to the non-deterministic nature of the operating design domain. A rigourous statistical validation process is an essential component required to address this challenge. Most research in this area focuses on evaluating system performance in large scale real-world data gathering exercises (number of miles travelled), or randomised test scenarios in simulation. This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios, identifying potential unsafe edge cases.We use reinforcement learning (RL) to learn the behaviours of simulated actors that cause unsafe behaviour measured by the well established RSS safety metric. We demonstrate that by using the method we can more efficiently validate a system using a smaller number of test cases by focusing the simulation towards the worst case scenario, generating edge cases that correspond to unsafe situations.
ROMar 4, 2020
Semantic sensor fusion: from camera to sparse lidar informationJulie Stephany Berrio, Mao Shan, Stewart Worrall et al.
To navigate through urban roads, an automated vehicle must be able to perceive and recognize objects in a three-dimensional environment. A high-level contextual understanding of the surroundings is necessary to plan and execute accurate driving maneuvers. This paper presents an approach to fuse different sensory information, Light Detection and Ranging (lidar) scans and camera images. The output of a convolutional neural network (CNN) is used as classifier to obtain the labels of the environment. The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps: initially, we use heuristic methods to associate probabilities to all the semantic classes contained in the labelled images. Then, the lidar points are corrected to compensate for the vehicle's motion given the difference between the timestamps of each lidar scan and camera image. In a third step, we calculate the pixel coordinate for the corresponding camera image. In the last step we perform the transfer of semantic information from the heuristic probability images to the lidar frame, while removing the lidar information that is not visible to the camera. We tested our approach in the Usyd Dataset \cite{usyd_dataset}, obtaining qualitative and quantitative results that demonstrate the validity of our probabilistic sensory fusion approach.
CVApr 29, 2019
Automatic extrinsic calibration between a camera and a 3D Lidar using 3D point and plane correspondencesSurabhi Verma, Julie Stephany Berrio, Stewart Worrall et al.
This paper proposes an automated method to obtain the extrinsic calibration parameters between a camera and a 3D lidar with as low as 16 beams. We use a checkerboard as a reference to obtain features of interest in both sensor frames. The calibration board centre point and normal vector are automatically extracted from the lidar point cloud by exploiting the geometry of the board. The corresponding features in the camera image are obtained from the camera's extrinsic matrix. We explain the reasons behind selecting these features, and why they are more robust compared to other possibilities. To obtain the optimal extrinsic parameters, we choose a genetic algorithm to address the highly non-linear state space. The process is automated after defining the bounds of the 3D experimental region relative to the lidar, and the true board dimensions. In addition, the camera is assumed to be intrinsically calibrated. Our method requires a minimum of 3 checkerboard poses, and the calibration accuracy is demonstrated by evaluating our algorithm using real world and simulated features.
ROApr 19, 2019
Geographical Map Registration and Fusion of Lidar-Aerial Orthoimagery in GISSiqi Yi, Stewart Worrall, Eduardo Nebot
Centimeter level globally accurate and consistent maps for autonomous vehicles navigation has long been achieved by on board real-time kinematic(RTK)-GPS in open areas. However when dealing with urban environments, GPS will experience multipath and blockage in urban canyon, under bridges, inside tunnels and in underground environments. In this paper we present strategies to efficiently register local maps in geographical coordinate systems through the tactical integration of GPS and information extracted from precisely geo-referenced high resolution aerial orthogonal imagery. Dense lidar point clouds obtained from moving vehicle are projected down to horizontal plane, accurately registered and overlaid on aerial orthoimagery. Sparse, robust and long-term pole-like landmarks are used as anchor points to link lidar and aerial image sensing, and constrain the spatial uncertainties of remaining lidar points that cannot be directly measured and identified. We achieved 15-75cm absolute average global accuracy using precisely geo-referenced aerial imagery as ground truth. This is valuable in enabling the fusion of ground vehicle on-board sensor features with features extracted from aerial images such as traffic and lane markings. It is also useful for cooperative sensing to have an unbiased and accurate global reference. Experimental results are presented demonstrating the accuracy and consistency of the maps when operating in large areas.
ROApr 18, 2019
Metrics for the Evaluation of localisation RobustnessSiqi Yi, Stewart Worrall, Eduardo Nebot
Robustness and safety are crucial properties for the real-world application of autonomous vehicles. One of the most critical components of any autonomous system is localisation. During the last 20 years there has been significant progress in this area with the introduction of very efficient algorithms for mapping, localisation and SLAM. Many of these algorithms present impressive demonstrations for a particular domain, but fail to operate reliably with changes to the operating environment. The aspect of robustness has not received enough attention and localisation systems for self-driving vehicle applications are seldom evaluated for their robustness. In this paper we propose novel metrics to effectively quantify localisation robustness with or without an accurate ground truth. The experimental results present a comprehensive analysis of the application of these metrics against a number of well known localisation strategies.
CVOct 24, 2018
Automated Evaluation of Semantic Segmentation Robustness for Autonomous DrivingWei Zhou, Julie Stephany Berrio, Stewart Worrall et al.
One of the fundamental challenges in the design of perception systems for autonomous vehicles is validating the performance of each algorithm under a comprehensive variety of operating conditions. In the case of vision-based semantic segmentation, there are known issues when encountering new scenarios that are sufficiently different to the training data. In addition, even small variations in environmental conditions such as illumination and precipitation can affect the classification performance of the segmentation model. Given the reliance on visual information, these effects often translate into poor semantic pixel classification which can potentially lead to catastrophic consequences when driving autonomously. This paper presents a novel method for analysing the robustness of semantic segmentation models and provides a number of metrics to evaluate the classification performance over a variety of environmental conditions. The process incorporates an additional sensor (lidar) to automate the process, eliminating the need for labour-intensive hand labelling of validation data. The system integrity can be monitored as the performance of the vision sensors are validated against a different sensor modality. This is necessary for detecting failures that are inherent to vision technology. Experimental results are presented based on multiple datasets collected at different times of the year with different environmental conditions. These results show that the semantic segmentation performance varies depending on the weather, camera parameters, existence of shadows, etc.. The results also demonstrate how the metrics can be used to compare and validate the performance after making improvements to a model, and compare the performance of different networks.
ROSep 26, 2018
Identifying robust landmarks in feature-based mapsJulie Stephany Berrio, James Ward, Stewart Worrall et al.
To operate in an urban environment, an automated vehicle must be capable of accurately estimating its position within a global map reference frame. This is necessary for optimal path planning and safe navigation. To accomplish this over an extended period of time, the global map requires long-term maintenance. This includes the addition of newly observable features and the removal of transient features belonging to dynamic objects. The latter is especially important for the long-term use of the map as matching against a map with features that no longer exist can result in incorrect data associations, and consequently erroneous localisation. This paper addresses the problem of removing features from the map that correspond to objects that are no longer observable/present in the environment. This is achieved by assigning a single score which depends on the geometric distribution and characteristics when the features are re-detected (or not) on different occasions. Our approach not only eliminates ephemeral features, but also can be used as a reduction algorithm for highly dense maps. We tested our approach using half a year of weekly drives over the same 500-metre section of road in an urban environment. The results presented demonstrate the validity of the long-term approach to map maintenance.
CVSep 13, 2018
Adapting Semantic Segmentation Models for Changes in Illumination and Camera PerspectiveWei Zhou, Alex Zyner, Stewart Worrall et al.
Semantic segmentation using deep neural networks has been widely explored to generate high-level contextual information for autonomous vehicles. To acquire a complete $180^\circ$ semantic understanding of the forward surroundings, we propose to stitch semantic images from multiple cameras with varying orientations. However, previously trained semantic segmentation models showed unacceptable performance after significant changes to the camera orientations and the lighting conditions. To avoid time-consuming hand labeling, we explore and evaluate the use of data augmentation techniques, specifically skew and gamma correction, from a practical real-world standpoint to extend the existing model and provide more robust performance. The presented experimental results have shown significant improvements with varying illumination and camera perspective changes.
CVJul 26, 2018
Naturalistic Driver Intention and Path Prediction using Recurrent Neural NetworksAlex Zyner, Stewart Worrall, Eduardo Nebot
Understanding the intentions of drivers at intersections is a critical component for autonomous vehicles. Urban intersections that do not have traffic signals are a common epicentre of highly variable vehicle movement and interactions. We present a method for predicting driver intent at urban intersections through multi-modal trajectory prediction with uncertainty. Our method is based on recurrent neural networks combined with a mixture density network output layer. To consolidate the multi-modal nature of the output probability distribution, we introduce a clustering algorithm that extracts the set of possible paths that exist in the prediction output, and ranks them according to likelihood. To verify the method's performance and generalizability, we present a real-world dataset that consists of over 23,000 vehicles traversing five different intersections, collected using a vehicle mounted Lidar based tracking system. An array of metrics is used to demonstrate the performance of the model against several baselines.