Paul Newman

CV
h-index17
49papers
2,048citations
Novelty49%
AI Score43

49 Papers

CVJun 30, 2022
BoxGraph: Semantic Place Recognition and Pose Estimation from 3D LiDAR

Georgi Pramatarov, Daniele De Martini, Matthew Gadd et al. · oxford

This paper is about extremely robust and lightweight localisation using LiDAR point clouds based on instance segmentation and graph matching. We model 3D point clouds as fully-connected graphs of semantically identified components where each vertex corresponds to an object instance and encodes its shape. Optimal vertex association across graphs allows for full 6-Degree-of-Freedom (DoF) pose estimation and place recognition by measuring similarity. This representation is very concise, condensing the size of maps by a factor of 25 against the state-of-the-art, requiring only 3kB to represent a 1.4MB laser scan. We verify the efficacy of our system on the SemanticKITTI dataset, where we achieve a new state-of-the-art in place recognition, with an average of 88.4% recall at 100% precision where the next closest competitor follows with 64.9%. We also show accurate metric pose estimation performance - estimating 6-DoF pose with median errors of 10 cm and 0.33 deg.

CVApr 20, 2023
Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations

Benjamin Ramtoula, Matthew Gadd, Paul Newman et al.

Selecting appropriate datasets is critical in modern computer vision. However, no general-purpose tools exist to evaluate the extent to which two datasets differ. For this, we propose representing images - and by extension datasets - using Distributions of Neuron Activations (DNAs). DNAs fit distributions, such as histograms or Gaussians, to activations of neurons in a pre-trained feature extractor through which we pass the image(s) to represent. This extractor is frozen for all datasets, and we rely on its generally expressive power in feature space. By comparing two DNAs, we can evaluate the extent to which two datasets differ with granular control over the comparison attributes of interest, providing the ability to customise the way distances are measured to suit the requirements of the task at hand. Furthermore, DNAs are compact, representing datasets of any size with less than 15 megabytes. We demonstrate the value of DNAs by evaluating their applicability on several tasks, including conditional dataset comparison, synthetic image evaluation, and transfer learning, and across diverse datasets, ranging from synthetic cat images to celebrity faces and urban driving scenes.

CVMar 7, 2022
Depth-SIMS: Semi-Parametric Image and Depth Synthesis

Valentina Musat, Daniele De Martini, Matthew Gadd et al.

In this paper we present a compositing image synthesis method that generates RGB canvases with well aligned segmentation maps and sparse depth maps, coupled with an in-painting network that transforms the RGB canvases into high quality RGB images and the sparse depth maps into pixel-wise dense depth maps. We benchmark our method in terms of structural alignment and image quality, showing an increase in mIoU over SOTA by 3.7 percentage points and a highly competitive FID. Furthermore, we analyse the quality of the generated data as training data for semantic segmentation and depth completion, and show that our approach is more suited for this purpose than other methods.

CVOct 20, 2023
What you see is what you get: Experience ranking with deep neural dataset-to-dataset similarity for topological localisation

Matthew Gadd, Benjamin Ramtoula, Daniele De Martini et al.

Recalling the most relevant visual memories for localisation or understanding a priori the likely outcome of localisation effort against a particular visual memory is useful for efficient and robust visual navigation. Solutions to this problem should be divorced from performance appraisal against ground truth - as this is not available at run-time - and should ideally be based on generalisable environmental observations. For this, we propose applying the recently developed Visual DNA as a highly scalable tool for comparing datasets of images - in this work, sequences of map and live experiences. In the case of localisation, important dataset differences impacting performance are modes of appearance change, including weather, lighting, and season. Specifically, for any deep architecture which is used for place recognition by matching feature volumes at a particular layer, we use distribution measures to compare neuron-wise activation statistics between live images and multiple previously recorded past experiences, with a potentially large seasonal (winter/summer) or time of day (day/night) shift. We find that differences in these statistics correlate to performance when localising using a past experience with the same appearance gap. We validate our approach over the Nordland cross-season dataset as well as data from Oxford's University Parks with lighting and mild seasonal change, showing excellent ability of our system to rank actual localisation performance across candidate experiences.

ROJan 27, 2024Code
Open-RadVLAD: Fast and Robust Radar Place Recognition

Matthew Gadd, Paul Newman

Radar place recognition often involves encoding a live scan as a vector and matching this vector to a database in order to recognise that the vehicle is in a location that it has visited before. Radar is inherently robust to lighting or weather conditions, but place recognition with this sensor is still affected by: (1) viewpoint variation, i.e. translation and rotation, (2) sensor artefacts or "noises". For 360-degree scanning radar, rotation is readily dealt with by in some way aggregating across azimuths. Also, we argue in this work that it is more critical to deal with the richness of representation and sensor noises than it is to deal with translational invariance - particularly in urban driving where vehicles predominantly follow the same lane when repeating a route. In our method, for computational efficiency, we use only the polar representation. For partial translation invariance and robustness to signal noise, we use only a one-dimensional Fourier Transform along radial returns. We also achieve rotational invariance and a very discriminative descriptor space by building a vector of locally aggregated descriptors. Our method is more comprehensively tested than all prior radar place recognition work - over an exhaustive combination of all 870 pairs of trajectories from 30 Oxford Radar RobotCar Dataset sequences (each approximately 10 km). Code and detailed results are provided at github.com/mttgdd/open-radvlad, as an open implementation and benchmark for future work in this area. We achieve a median of 91.52% in Recall@1, outstripping the 69.55% for the only other open implementation, RaPlace, and at a fraction of its computational cost (relying on fewer integral transforms e.g. Radon, Fourier, and inverse Fourier).

LGDec 1, 2025
Fantastic Features and Where to Find Them: A Probing Method to combine Features from Multiple Foundation Models

Benjamin Ramtoula, Pierre-Yves Lajoie, Paul Newman et al.

Foundation models (FMs) trained with different objectives and data learn diverse representations, making some more effective than others for specific downstream tasks. Existing adaptation strategies, such as parameter-efficient fine-tuning, focus on individual models and do not exploit the complementary strengths across models. Probing methods offer a promising alternative by extracting information from frozen models, but current techniques do not scale well with large feature sets and often rely on dataset-specific hyperparameter tuning. We propose Combined backBones (ComBo), a simple and scalable probing-based adapter that effectively integrates features from multiple models and layers. ComBo compresses activations from layers of one or more FMs into compact token-wise representations and processes them with a lightweight transformer for task-specific prediction. Crucially, ComBo does not require dataset-specific tuning or backpropagation through the backbone models. However, not all models are equally relevant for all tasks. To address this, we introduce a mechanism that leverages ComBo's joint multi-backbone probing to efficiently evaluate each backbone's task-relevance, enabling both practical model comparison and improved performance through selective adaptation. On the 19 tasks of the VTAB-1k benchmark, ComBo outperforms previous probing methods, matches or surpasses more expensive alternatives, such as distillation-based model merging, and enables efficient probing of tuned models. Our results demonstrate that ComBo offers a practical and general-purpose framework for combining diverse representations from multiple FMs.

ROFeb 16, 2024
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

Jianhao Yuan, Shuyang Sun, Daniel Omeiza et al.

We need to trust robots that use often opaque AI methods. They need to explain themselves to us, and we need to trust their explanation. In this regard, explainability plays a critical role in trustworthy autonomous decision-making to foster transparency and acceptance among end users, especially in complex autonomous driving. Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent by producing control predictions along with natural language explanations. However, severe data scarcity due to expensive annotation costs and significant domain gaps between different datasets makes the development of a robust and generalisable system an extremely challenging task. Moreover, the prohibitively expensive training requirements of MLLM and the unsolved problem of catastrophic forgetting further limit their generalisability post-deployment. To address these challenges, we present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving. By grounding in retrieved expert demonstration, we empirically validate that RAG-Driver achieves state-of-the-art performance in producing driving action explanations, justifications, and control signal prediction. More importantly, it exhibits exceptional zero-shot generalisation capabilities to unseen environments without further training endeavours.

CVFeb 27, 2024
Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data

David S. W. Williams, Daniele De Martini, Matthew Gadd et al.

Knowing when a trained segmentation model is encountering data that is different to its training data is important. Understanding and mitigating the effects of this play an important part in their application from a performance and assurance perspective - this being a safety concern in applications such as autonomous vehicles (AVs). This work presents a segmentation network that can detect errors caused by challenging test domains without any additional annotation in a single forward pass. As annotation costs limit the diversity of labelled datasets, we use easy-to-obtain, uncurated and unlabelled data to learn to perform uncertainty estimation by selectively enforcing consistency over data augmentation. To this end, a novel segmentation benchmark based on the SAX Dataset is used, which includes labelled test data spanning three autonomous-driving domains, ranging in appearance from dense urban to off-road. The proposed method, named Gamma-SSL, consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark - by up to 10.7% in area under the receiver operating characteristic (ROC) curve and 19.2% in area under the precision-recall (PR) curve in the most challenging of the three scenarios.

CVMar 7, 2024
That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation

Georgi Pramatarov, Matthew Gadd, Paul Newman et al.

This paper is about 3D pose estimation on LiDAR scans with extremely minimal storage requirements to enable scalable mapping and localisation. We achieve this by clustering all points of segmented scans into semantic objects and representing them only with their respective centroid and semantic class. In this way, each LiDAR scan is reduced to a compact collection of four-number vectors. This abstracts away important structural information from the scenes, which is crucial for traditional registration approaches. To mitigate this, we introduce an object-matching network based on self- and cross-correlation that captures geometric and semantic relationships between entities. The respective matches allow us to recover the relative transformation between scans through weighted Singular Value Decomposition (SVD) and RANdom SAmple Consensus (RANSAC). We demonstrate that such representation is sufficient for metric localisation by registering point clouds taken under different viewpoints on the KITTI dataset, and at different periods of time localising between KITTI and KITTI-360. We achieve accurate metric estimates comparable with state-of-the-art methods with almost half the representation size, specifically 1.33 kB on average.

ROOct 14, 2024
NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms

Efimia Panagiotaki, Daniele De Martini, Lars Kunze et al.

This study explores the intersection of neural networks and classical robotics algorithms through the Neural Algorithmic Reasoning (NAR) blueprint, enabling the training of neural networks to reason like classical robotics algorithms by learning to execute them. Algorithms are integral to robotics and safety-critical applications due to their predictable and consistent performance through logical and mathematical principles. In contrast, while neural networks are highly adaptable, handling complex, high-dimensional data and generalising across tasks, they often lack interpretability and transparency in their internal computations. To bridge the two, we propose a novel Graph Neural Network (GNN)-based framework, NAR-*ICP, that learns the intermediate computations of classical ICP-based registration algorithms, extending the CLRS Benchmark. We evaluate our approach across real-world and synthetic datasets, demonstrating its flexibility in handling complex inputs, and its potential to be used within larger learning pipelines. Our method achieves superior performance compared to the baselines, even surpassing the algorithms it was trained on, further demonstrating its ability to generalise beyond the capabilities of traditional algorithms.

CVFeb 27, 2024
Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling

David S. W. Williams, Matthew Gadd, Paul Newman et al.

This work proposes a semantic segmentation network that produces high-quality uncertainty estimates in a single forward pass. We exploit general representations from foundation models and unlabelled datasets through a Masked Image Modeling (MIM) approach, which is robust to augmentation hyper-parameters and simpler than previous techniques. For neural networks used in safety-critical applications, bias in the training data can lead to errors; therefore it is crucial to understand a network's limitations at run time and act accordingly. To this end, we test our proposed method on a number of test domains including the SAX Segmentation benchmark, which includes labelled test data from dense urban, rural and off-road driving domains. The proposed method consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark.

CVOct 13, 2025
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

Jianhao Yuan, Fabio Pizzati, Francesco Pinto et al.

Intuitive physics understanding in video diffusion models plays an essential role in building general-purpose physically plausible world simulators, yet accurately evaluating such capacity remains a challenging task due to the difficulty in disentangling physics correctness from visual appearance in generation. To the end, we introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models by distinguishing physically valid and impossible videos using the denoising objective as an ELBO-based likelihood surrogate on a curated dataset of valid-invalid pairs. By testing on our constructed benchmark of twelve scenarios spanning over four physics domains, we show that our evaluation metric, Plausibility Preference Error (PPE), demonstrates strong alignment with human preference, outperforming state-of-the-art evaluator baselines. We then systematically benchmark intuitive physics understanding in current video diffusion models. Our study further analyses how model design and inference settings affect intuitive physics understanding and highlights domain-specific capacity variations across physical laws. Empirical results show that, despite current models struggling with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.

CVMar 14, 2024
VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition

Benjamin Ramtoula, Daniele De Martini, Matthew Gadd et al.

This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, crucial to enable real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fused information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed a Visual Distribution of Neuron Activations (VDNA) representation to represent datasets of images. This representation can naturally handle image sequences and provides a general and granular feature representation derived from a general-purpose model. Moreover, our representation is based on tracking neuron activation values over the list of images to represent and is not limited to a particular neural network layer, therefore having access to high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder to generate task-specific descriptors. Our experiments show that our representation can allow for better robustness than current solutions to serious domain shifts away from the training data distribution, such as to indoor environments and aerial imagery.

CVOct 6, 2021
Contrastive Learning for Unsupervised Radar Place Recognition

Matthew Gadd, Daniele De Martini, Paul Newman

We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving the place recognition problem with complex radar data. Our method is based on invariant instance feature learning but is tailored for the task of re-localisation by exploiting for data augmentation the temporal successivity of data as collected by a mobile platform moving through the scene smoothly. We experiment across two prominent urban radar datasets totalling over 400 km of driving and show that we achieve a new radar place recognition state-of-the-art. Specifically, the proposed system proves correct for 98.38% of the queries that it is presented with over a challenging re-localisation sequence, using only the single nearest neighbour in the learned metric space. We also find that our learned model shows better understanding of out-of-lane loop closures at arbitrary orientation than non-learned radar scan descriptors.

CVJun 16, 2021
The Oxford Road Boundaries Dataset

Tarlan Suleymanov, Matthew Gadd, Daniele De Martini et al.

In this paper we present the Oxford Road Boundaries Dataset, designed for training and testing machine-learning-based road-boundary detection and inference approaches. We have hand-annotated two of the 10 km-long forays from the Oxford Robotcar Dataset and generated from other forays several thousand further examples with semi-annotated road-boundary masks. To boost the number of training samples in this way, we used a vision-based localiser to project labels from the annotated datasets to other traversals at different times and weather conditions. As a result, we release 62605 labelled samples, of which 47639 samples are curated. Each of these samples contains both raw and classified masks for left and right lenses. Our data contains images from a diverse set of scenarios such as straight roads, parked cars, junctions, etc. Files for download and tools for manipulating the labelled data are available at: oxford-robotics-institute.github.io/road-boundaries-dataset

CVJun 12, 2021
Unsupervised Place Recognition with Deep Embedding Learning over Radar Videos

Matthew Gadd, Daniele De Martini, Paul Newman

We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving place recognition problem using complex radar data. We experiment on 280 km of data and show performance exceeding state-of-the-art supervised approaches, localising correctly 98.38% of the time when using just the nearest database candidate.

CVMar 1, 2021
Fool Me Once: Robust Selective Segmentation via Out-of-Distribution Detection with Contrastive Learning

David Williams, Matthew Gadd, Daniele De Martini et al.

In this work, we train a network to simultaneously perform segmentation and pixel-wise Out-of-Distribution (OoD) detection, such that the segmentation of unknown regions of scenes can be rejected. This is made possible by leveraging an OoD dataset with a novel contrastive objective and data augmentation scheme. By combining data including unknown classes in the training data, a more robust feature representation can be learned with known classes represented distinctly from those unknown. When presented with unknown classes or conditions, many current approaches for segmentation frequently exhibit high confidence in their inaccurate segmentations and cannot be trusted in many operational environments. We validate our system on a real-world dataset of unusual driving scenes, and show that by selectively segmenting scenes based on what is predicted as OoD, we can increase the segmentation accuracy by an IoU of 0.2 with respect to alternative techniques.

ROJun 3, 2020
Self-Supervised Localisation between Range Sensors and Overhead Imagery

Tim Y. Tang, Daniele De Martini, Shangzhe Wu et al.

Publicly available satellite imagery can be an ubiquitous, cheap, and powerful tool for vehicle localisation when a prior sensor map is unavailable. However, satellite images are not directly comparable to data from ground range sensors because of their starkly different modalities. We present a learned metric localisation method that not only handles the modality difference, but is cheap to train, learning in a self-supervised fashion without metrically accurate ground truth. By evaluating across multiple real-world datasets, we demonstrate the robustness and versatility of our method for various sensor configurations. We pay particular attention to the use of millimetre wave radar, which, owing to its complex interaction with the scene and its immunity to weather and lighting, makes for a compelling and valuable use case.

ROMay 11, 2020
Keep off the Grass: Permissible Driving Routes from Radar with Weak Audio Supervision

David Williams, Daniele De Martini, Matthew Gadd et al.

Reliable outdoor deployment of mobile robots requires the robust identification of permissible driving routes in a given environment. The performance of LiDAR and vision-based perception systems deteriorates significantly if certain environmental factors are present e.g. rain, fog, darkness. Perception systems based on FMCW scanning radar maintain full performance regardless of environmental conditions and with a longer range than alternative sensors. Learning to segment a radar scan based on driveability in a fully supervised manner is not feasible as labelling each radar scan on a bin-by-bin basis is both difficult and time-consuming to do by hand. We therefore weakly supervise the training of the radar-based classifier through an audio-based classifier that is able to predict the terrain type underneath the robot. By combining odometry, GPS and the terrain labels from the audio classifier, we are able to construct a terrain labelled trajectory of the robot in the environment which is then used to label the radar scans. Using a curriculum learning procedure, we then train a radar segmentation network to generalise beyond the initial labelling and to detect all permissible driving routes in the environment.

CYMay 5, 2020
Sense-Assess-eXplain (SAX): Building Trust in Autonomous Vehicles in Challenging Real-World Driving Scenarios

Matthew Gadd, Daniele De Martini, Letizia Marchegiani et al.

This paper discusses ongoing work in demonstrating research in mobile autonomy in challenging driving scenarios. In our approach, we address fundamental technical issues to overcome critical barriers to assurance and regulation for large-scale deployments of autonomous systems. To this end, we present how we build robots that (1) can robustly sense and interpret their environment using traditional as well as unconventional sensors; (2) can assess their own capabilities; and (3), vitally in the purpose of assurance and trust, can provide causal explanations of their interpretations and assessments. As it is essential that robots are safe and trusted, we design, develop, and demonstrate fundamental technologies in real-world applications to overcome critical barriers which impede the current deployment of robots in economically and socially important areas. Finally, we describe ongoing work in the collection of an unusual, rare, and highly valuable dataset.

CVApr 2, 2020
RSS-Net: Weakly-Supervised Multi-Class Semantic Segmentation with FMCW Radar

Prannay Kaul, Daniele De Martini, Matthew Gadd et al.

This paper presents an efficient annotation procedure and an application thereof to end-to-end, rich semantic segmentation of the sensed environment using FMCW scanning radar. We advocate radar over the traditional sensors used for this task as it operates at longer ranges and is substantially more robust to adverse weather and illumination conditions. We avoid laborious manual labelling by exploiting the largest radar-focused urban autonomy dataset collected to date, correlating radar scans with RGB cameras and LiDAR sensors, for which semantic segmentation is an already consolidated procedure. The training procedure leverages a state-of-the-art natural image segmentation system which is publicly available and as such, in contrast to previous approaches, allows for the production of copious labels for the radar stream by incorporating four camera and two LiDAR streams. Additionally, the losses are computed taking into account labels to the radar sensor horizon by accumulating LiDAR returns along a pose-chain ahead and behind of the current vehicle position. Finally, we present the network with multi-channel radar scan inputs in order to deal with ephemeral and dynamic scene objects.

CVMar 10, 2020
Rainy screens: Collecting rainy datasets, indoors

Horia Porav, Valentina-Nicoleta Musat, Tom Bruls et al.

Acquisition of data with adverse conditions in robotics is a cumbersome task due to the difficulty in guaranteeing proper ground truth and synchronising with desired weather conditions. In this paper, we present a simple method - recording a high resolution screen - for generating diverse rainy images from existing clear ground-truth images that is domain- and source-agnostic, simple and scales up. This setup allows us to leverage the diversity of existing datasets with auxiliary task ground-truth data, such as semantic segmentation, object positions etc. We generate rainy images with real adherent droplets and rain streaks based on Cityscapes and BDD, and train a de-raining model. We present quantitative results for image reconstruction and semantic segmentation, and qualitative results for an out-of-sample domain, showing that models trained with our data generalize well.

ROMar 10, 2020
LiDAR Lateral Localisation Despite Challenging Occlusion from Traffic

Tarlan Suleymanov, Matthew Gadd, Lars Kunze et al.

This paper presents a system for improving the robustness of LiDAR lateral localisation systems. This is made possible by including detections of road boundaries which are invisible to the sensor (due to occlusion, e.g. traffic) but can be located by our Occluded Road Boundary Inference Deep Neural Network. We show an example application in which fusion of a camera stream is used to initialise the lateral localisation. We demonstrate over four driven forays through central Oxford - totalling 40 km of driving - a gain in performance that inferring of occluded road boundaries brings.

ROMar 10, 2020
Look Around You: Sequence-based Radar Place Recognition with Learned Rotational Invariance

Matthew Gadd, Daniele De Martini, Paul Newman

This paper details an application which yields significant improvements to the adeptness of place recognition with Frequency-Modulated Continuous-Wave radar - a commercially promising sensor poised for exploitation in mobile autonomy. We show how a rotationally-invariant metric embedding for radar scans can be integrated into sequence-based trajectory matching systems typically applied to videos taken by visual sensors. Due to the complete horizontal field of view inherent to the radar scan formation process, we show how this off-the-shelf sequence-based trajectory matching system can be manipulated to detect place matches when the vehicle is travelling down a previously visited stretch of road in the opposite direction. We demonstrate the efficacy of the approach on 26 km of challenging urban driving taken from the largest radar-focused urban autonomy dataset released to date -- showing a boost of 30% in recall at high levels of precision over a nearest neighbour approach.

ROFeb 24, 2020
Real-time Kinematic Ground Truth for the Oxford RobotCar Dataset

Will Maddern, Geoffrey Pascoe, Matthew Gadd et al.

We describe the release of reference data towards a challenging long-term localisation and mapping benchmark based on the large-scale Oxford RobotCar Dataset. The release includes 72 traversals of a route through Oxford, UK, gathered in all illumination, weather and traffic conditions, and is representative of the conditions an autonomous vehicle would be expected to operate reliably in. Using post-processed raw GPS, IMU, and static GNSS base station recordings, we have produced a globally-consistent centimetre-accurate ground truth for the entire year-long duration of the dataset. Coupled with a planned online benchmarking service, we hope to enable quantitative evaluation and comparison of different localisation and mapping approaches focusing on long-term autonomy for road vehicles in urban environments challenged by changing weather.

ROJan 26, 2020
Kidnapped Radar: Topological Radar Localisation using Rotationally-Invariant Metric Learning

Ştefan Săftescu, Matthew Gadd, Daniele De Martini et al.

This paper presents a system for robust, large-scale topological localisation using Frequency-Modulated Continuous-Wave (FMCW) scanning radar. We learn a metric space for embedding polar radar scans using CNN and NetVLAD architectures traditionally applied to the visual domain. However, we tailor the feature extraction for more suitability to the polar nature of radar scan formation using cylindrical convolutions, anti-aliasing blurring, and azimuth-wise max-pooling; all in order to bolster the rotational invariance. The enforced metric space is then used to encode a reference trajectory, serving as a map, which is queried for nearest neighbours (NNs) for recognition of places at run-time. We demonstrate the performance of our topological localisation system over the course of many repeat forays using the largest radar-focused mobile autonomy dataset released to date, totalling 280 km of urban driving, a small portion of which we also use to learn the weights of the modified architecture. As this work represents a novel application for FMCW radar, we analyse the utility of the proposed method via a comprehensive set of metrics which provide insight into the efficacy when used in a realistic system, showing improved performance over the root architecture even in the face of random rotational perturbation.

CVJan 22, 2020
Learning to Correct 3D Reconstructions from Multiple Views

Ştefan Săftescu, Paul Newman

This paper is about reducing the cost of building good large-scale 3D reconstructions post-hoc. We render 2D views of an existing reconstruction and train a convolutional neural network (CNN) that refines inverse-depth to match a higher-quality reconstruction. Since the views that we correct are rendered from the same reconstruction, they share the same geometry, so overlapping views complement each other. We take advantage of that in two ways. Firstly, we impose a loss during training which guides predictions on neighbouring views to have the same geometry and has been shown to improve performance. Secondly, in contrast to previous work, which corrects each view independently, we also make predictions on sets of neighbouring views jointly. This is achieved by warping feature maps between views and thus bypassing memory-intensive 3D computation. We make the observation that features in the feature maps are viewpoint-dependent, and propose a method for transforming features with dynamic filters generated by a multi-layer perceptron from the relative poses between views. In our experiments we show that this last step is necessary for successfully fusing feature maps between views.

CVJan 9, 2020
RSL-Net: Localising in Satellite Images From a Radar on the Ground

Tim Y. Tang, Daniele De Martini, Dan Barnes et al.

This paper is about localising a vehicle in an overhead image using FMCW radar mounted on a ground vehicle. FMCW radar offers extraordinary promise and efficacy for vehicle localisation. It is impervious to all weather types and lighting conditions. However the complexity of the interactions between millimetre radar wave and the physical environment makes it a challenging domain. Infrastructure-free large-scale radar-based localisation is in its infancy. Typically here a map is built and suitable techniques, compatible with the nature of sensor, are brought to bear. In this work we eschew the need for a radar-based map; instead we simply use an overhead image -- a resource readily available everywhere. This paper introduces a method that not only naturally deals with the complexity of the signal type but does so in the context of cross modal processing.

CVSep 8, 2019
Learning Geometrically Consistent Mesh Corrections

Ştefan Săftescu, Paul Newman

Building good 3D maps is a challenging and expensive task, which requires high-quality sensors and careful, time-consuming scanning. We seek to reduce the cost of building good reconstructions by correcting views of existing low-quality ones in a post-hoc fashion using learnt priors over surfaces and appearance. We train a CNN model to predict the difference in inverse-depth from varying viewpoints of two meshes -- one of low quality that we wish to correct, and one of high-quality that we use as a reference. In contrast to previous work, we pay attention to the problem of excessive smoothing in corrected meshes. We address this with a suitable network architecture, and introduce a loss-weighting mechanism that emphasises edges in the prediction. Furthermore, smooth predictions result in geometrical inconsistencies. To deal with this issue, we present a loss function which penalises re-projection differences that are not due to occlusions. Our model reduces gross errors by 45.3%--77.5%, up to five times more than previous work.

ROSep 3, 2019
The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset

Dan Barnes, Matthew Gadd, Paul Murcutt et al.

In this paper we present The Oxford Radar RobotCar Dataset, a new dataset for researching scene understanding using Millimetre-Wave FMCW scanning radar data. The target application is autonomous vehicles where this modality is robust to environmental conditions such as fog, rain, snow, or lens flare, which typically challenge other sensor modalities such as vision and LIDAR. The data were gathered in January 2019 over thirty-two traversals of a central Oxford route spanning a total of 280km of urban driving. It encompasses a variety of weather, traffic, and lighting conditions. This 4.7TB dataset consists of over 240,000 scans from a Navtech CTS350-X radar and 2.4 million scans from two Velodyne HDL-32E 3D LIDARs; along with six cameras, two 2D LIDARs, and a GPS/INS receiver. In addition we release ground truth optimised radar odometry to provide an additional impetus to research in this domain. The full dataset is available for download at: ori.ox.ac.uk/datasets/radar-robotcar-dataset

CVJul 25, 2019
Don't Worry About the Weather: Unsupervised Condition-Dependent Domain Adaptation

Horia Porav, Tom Bruls, Paul Newman

Modern models that perform system-critical tasks such as segmentation and localization exhibit good performance and robustness under ideal conditions (i.e. daytime, overcast) but performance degrades quickly and often catastrophically when input conditions change. In this work, we present a domain adaptation system that uses light-weight input adapters to pre-processes input images, irrespective of their appearance, in a way that makes them compatible with off-the-shelf computer vision tasks that are trained only on inputs with ideal conditions. No fine-tuning is performed on the off-the-shelf models, and the system is capable of incrementally training new input adapters in a self-supervised fashion, using the computer vision tasks as supervisors, when the input domain differs significantly from previously seen domains. We report large improvements in semantic segmentation and topological localization performance on two popular datasets, RobotCar and BDD.

ROJul 11, 2019
Online Inference and Detection of Curbs in Partially Occluded Scenes with Sparse LIDAR

Tarlan Suleymanov, Lars Kunze, Paul Newman

Road boundaries, or curbs, provide autonomous vehicles with essential information when interpreting road scenes and generating behaviour plans. Although curbs convey important information, they are difficult to detect in complex urban environments (in particular in comparison to other elements of the road such as traffic signs and road markings). These difficulties arise from occlusions by other traffic participants as well as changing lighting and/or weather conditions. Moreover, road boundaries have various shapes, colours and structures while motion planning algorithms require accurate and precise metric information in real-time to generate their plans. In this paper, we present a real-time LIDAR-based approach for accurate curb detection around the vehicle (360 degree). Our approach deals with both occlusions from traffic and changing environmental conditions. To this end, we project 3D LIDAR pointcloud data into 2D bird's-eye view images (akin to Inverse Perspective Mapping). These images are then processed by trained deep networks to infer both visible and occluded road boundaries. Finally, a post-processing step filters detected curb segments and tracks them over time. Experimental results demonstrate the effectiveness of the proposed approach on real-world driving data. Hence, we believe that our LIDAR-based approach provides an efficient and effective way to detect visible and occluded curbs around the vehicles in challenging driving scenarios.

ROJul 10, 2019
Generating All the Roads to Rome: Road Layout Randomization for Improved Road Marking Segmentation

Tom Bruls, Horia Porav, Lars Kunze et al.

Road markings provide guidance to traffic participants and enforce safe driving behaviour, understanding their semantic meaning is therefore paramount in (automated) driving. However, producing the vast quantities of road marking labels required for training state-of-the-art deep networks is costly, time-consuming, and simply infeasible for every domain and condition. In addition, training data retrieved from virtual worlds often lack the richness and complexity of the real world and consequently cannot be used directly. In this paper, we provide an alternative approach in which new road marking training pairs are automatically generated. To this end, we apply principles of domain randomization to the road layout and synthesize new images from altered semantic labels. We demonstrate that training on these synthetic pairs improves mIoU of the segmentation of rare road marking classes during real-world deployment in complex urban environments by more than 12 percentage points, while performance for other classes is retained. This framework can easily be scaled to all domains and conditions to generate large-scale road marking datasets, while avoiding manual labelling effort.

ROMay 17, 2019
Training Object Detectors With Noisy Data

Simon Chadwick, Paul Newman

The availability of a large quantity of labelled training data is crucial for the training of modern object detectors. Hand labelling training data is time consuming and expensive while automatic labelling methods inevitably add unwanted noise to the labels. We examine the effect of different types of label noise on the performance of an object detector. We then show how co-teaching, a method developed for handling noisy labels and previously demonstrated on a classification problem, can be improved to mitigate the effects of label noise in an object detection setting. We illustrate our results using simulated noise on the KITTI dataset and on a vehicle detection task using automatically labelled data.

ROApr 25, 2019
Radar-only ego-motion estimation in difficult settings via graph matching

Sarah H. Cen, Paul Newman

Radar detects stable, long-range objects under variable weather and lighting conditions, making it a reliable and versatile sensor well suited for ego-motion estimation. In this work, we propose a radar-only odometry pipeline that is highly robust to radar artifacts (e.g., speckle noise and false positives) and requires only one input parameter. We demonstrate its ability to adapt across diverse settings, from urban UK to off-road Iceland, achieving a scan matching accuracy of approximately 5.20 cm and 0.0929 deg when using GPS as ground truth (compared to visual odometry's 5.77 cm and 0.1032 deg). We present algorithms for keypoint extraction and data association, framing the latter as a graph matching optimization problem, and provide an in-depth system analysis.

ROJan 30, 2019
Distant Vehicle Detection Using Radar and Vision

Simon Chadwick, Will Maddern, Paul Newman

For autonomous vehicles to be able to operate successfully they need to be aware of other vehicles with sufficient time to make safe, stable plans. Given the possible closing speeds between two vehicles, this necessitates the ability to accurately detect distant vehicles. Many current image-based object detectors using convolutional neural networks exhibit excellent performance on existing datasets such as KITTI. However, the performance of these networks falls when detecting small (distant) objects. We demonstrate that incorporating radar data can boost performance in these difficult situations. We also introduce an efficient automated method for training data generation using cameras of different focal lengths.

LGJan 3, 2019
Imminent Collision Mitigation with Reinforcement Learning and Vision

Horia Porav, Paul Newman

This work examines the role of reinforcement learning in reducing the severity of on-road collisions by controlling velocity and steering in situations in which contact is imminent. We construct a model, given camera images as input, that is capable of learning and predicting the dynamics of obstacles, cars and pedestrians, and train our policy using this model. Two policies that control both braking and steering are compared against a baseline where the only action taken is (conventional) braking in a straight line. The two policies are trained using two distinct reward structures, one where any and all collisions incur a fixed penalty, and a second one where the penalty is calculated based on already established delta-v models of injury severity. The results show that both policies exceed the performance of the baseline, with the policy trained using injury models having the highest performance.

CVJan 3, 2019
I Can See Clearly Now : Image Restoration via De-Raining

Horia Porav, Tom Bruls, Paul Newman

We present a method for improving segmentation tasks on images affected by adherent rain drops and streaks. We introduce a novel stereo dataset recorded using a system that allows one lens to be affected by real water droplets while keeping the other lens clear. We train a denoising generator using this dataset and show that it is effective at removing the effect of real water droplets, in the context of image reconstruction and road marking segmentation. To further test our de-noising approach, we describe a method of adding computer-generated adherent water droplets and streaks to any images, and use this technique as a proxy to demonstrate the effectiveness of our model in the context of general semantic segmentation. We benchmark our results using the CamVid road marking segmentation dataset, Cityscapes semantic segmentation datasets and our own real-rain dataset, and show significant improvement on all tasks.

CVDec 3, 2018
The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping

Tom Bruls, Horia Porav, Lars Kunze et al.

Many tasks performed by autonomous vehicles such as road marking detection, object tracking, and path planning are simpler in bird's-eye view. Hence, Inverse Perspective Mapping (IPM) is often applied to remove the perspective effect from a vehicle's front-facing camera and to remap its images into a 2D domain, resulting in a top-down view. Unfortunately, however, this leads to unnatural blurring and stretching of objects at further distance, due to the resolution of the camera, limiting applicability. In this paper, we present an adversarial learning approach for generating a significantly improved IPM from a single camera image in real time. The generated bird's-eye-view images contain sharper features (e.g. road markings) and a more homogeneous illumination, while (dynamic) objects are automatically removed from the scene, thus revealing the underlying road layout in an improved fashion. We demonstrate our framework using real-world data from the Oxford RobotCar Dataset and show that scene understanding tasks directly benefit from our boosted IPM approach.

ROOct 18, 2018
Probably Unknown: Deep Inverse Sensor Modelling In Radar

Rob Weston, Sarah Cen, Paul Newman et al.

Radar presents a promising alternative to lidar and vision in autonomous vehicle applications, able to detect objects at long range under a variety of weather conditions. However, distinguishing between occupied and free space from raw radar power returns is challenging due to complex interactions between sensor noise and occlusion. To counter this we propose to learn an Inverse Sensor Model (ISM) converting a raw radar scan to a grid map of occupancy probabilities using a deep neural network. Our network is self-supervised using partial occupancy labels generated by lidar, allowing a robot to learn about world occupancy from past experience without human supervision. We evaluate our approach on five hours of data recorded in a dynamic urban environment. By accounting for the scene context of each grid cell our model is able to successfully segment the world into occupied and free space, outperforming standard CFAR filtering approaches. Additionally by incorporating heteroscedastic uncertainty into our model formulation, we are able to quantify the variance in the uncertainty throughout the sensor observation. Through this mechanism we are able to successfully identify regions of space that are likely to be occluded.

SDOct 11, 2018
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes

Letizia Marchegiani, Paul Newman

This paper is about alerting acoustic event detection and sound source localisation in an urban scenario. Specifically, we are interested in spotting the presence of horns, and sirens of emergency vehicles. In order to obtain a reliable system able to operate robustly despite the presence of traffic noise, which can be copious, unstructured and unpredictable, we propose to treat the spectrograms of incoming stereo signals as images, and apply semantic segmentation, based on a Unet architecture, to extract the target sound from the background noise. In a multi-task learning scheme, together with signal denoising, we perform acoustic event classification to identify the nature of the alerting sound. Lastly, we use the denoised signals to localise the acoustic source on the horizon plane, by regressing the direction of arrival of the sound through a CNN architecture. Our experimental evaluation shows an average classification rate of 94%, and a median absolute error on the localisation of 7.5° when operating on audio frames of 0.5s, and of 2.5° when operating on frames of 2.5s. The system offers excellent performance in particularly challenging scenarios, where the noise level is remarkably high.

ROAug 1, 2018
Multimotion Visual Odometry (MVO): Simultaneous Estimation of Camera and Third-Party Motions

Kevin M. Judd, Jonathan D. Gammell, Paul Newman

Estimating motion from images is a well-studied problem in computer vision and robotics. Previous work has developed techniques to estimate the motion of a moving camera in a largely static environment (e.g., visual odometry) and to segment or track motions in a dynamic scene using known camera motions (e.g., multiple object tracking). It is more challenging to estimate the unknown motion of the camera and the dynamic scene simultaneously. Most previous work requires a priori object models (e.g., tracking-by-detection), motion constraints (e.g., planar motion), or fails to estimate the full SE(3) motions of the scene (e.g., scene flow). While these approaches work well in specific application domains, they are not generalizable to unconstrained motions. This paper extends the traditional visual odometry (VO) pipeline to estimate the full SE(3) motion of both a stereo/RGB-D camera and the dynamic scene. This multimotion visual odometry (MVO) pipeline requires no a priori knowledge of the environment or the dynamic objects. Its performance is evaluated on a real-world dynamic dataset with ground truth for all motions from a motion capture system.

CVMar 9, 2018
Adversarial Training for Adverse Conditions: Robust Metric Localisation using Appearance Transfer

Horia Porav, Will Maddern, Paul Newman

We present a method of improving visual place recognition and metric localisation under very strong appear- ance change. We learn an invertable generator that can trans- form the conditions of images, e.g. from day to night, summer to winter etc. This image transforming filter is explicitly designed to aid and abet feature-matching using a new loss based on SURF detector and dense descriptor maps. A network is trained to output synthetic images optimised for feature matching given only an input RGB image, and these generated images are used to localize the robot against a previously built map using traditional sparse matching approaches. We benchmark our results using multiple traversals of the Oxford RobotCar Dataset over a year-long period, using one traversal as a map and the other to localise. We show that this method significantly improves place recognition and localisation under changing and adverse conditions, while reducing the number of mapping runs needed to successfully achieve reliable localisation.

ROFeb 23, 2018
Surface Edge Explorer (SEE): Planning Next Best Views Directly from 3D Observations

Rowan Border, Jonathan D. Gammell, Paul Newman

Surveying 3D scenes is a common task in robotics. Systems can do so autonomously by iteratively obtaining measurements. This process of planning observations to improve the model of a scene is called Next Best View (NBV) planning. NBV planning approaches often use either volumetric (e.g., voxel grids) or surface (e.g., triangulated meshes) representations. Volumetric approaches generalise well between scenes as they do not depend on surface geometry but do not scale to high-resolution models of large scenes. Surface representations can obtain high-resolution models at any scale but often require tuning of unintuitive parameters or multiple survey stages. This paper presents a scene-model-free NBV planning approach with a density representation. The Surface Edge Explorer (SEE) uses the density of current measurements to detect and explore observed surface boundaries. This approach is shown experimentally to provide better surface coverage in lower computation time than the evaluated state-of-the-art volumetric approaches while moving equivalent distances.

CVJan 27, 2018
Meshed Up: Learnt Error Correction in 3D Reconstructions

Michael Tanner, Stefan Saftescu, Alex Bewley et al.

Dense reconstructions often contain errors that prior work has so far minimised using high quality sensors and regularising the output. Nevertheless, errors still persist. This paper proposes a machine learning technique to identify errors in three dimensional (3D) meshes. Beyond simply identifying errors, our method quantifies both the magnitude and the direction of depth estimate errors when viewing the scene. This enables us to improve the reconstruction accuracy. We train a suitably deep network architecture with two 3D meshes: a high-quality laser reconstruction, and a lower quality stereo image reconstruction. The network predicts the amount of error in the lower quality reconstruction with respect to the high-quality one, having only view the former through its input. We evaluate our approach by correcting two-dimensional (2D) inverse-depth images extracted from the 3D model, and show that our method improves the quality of these depth reconstructions by up to a relative 10% RMSE.

ROJan 17, 2018
The Data Market: Policies for Decentralised Visual Localisation

Matthew Gadd, Paul Newman

This paper presents a mercantile framework for the decentralised sharing of navigation expertise amongst a fleet of robots which perform regular missions into a common but variable environment. We build on our earlier work and allow individual agents to intermittently initiate trades based on a real-time assessment of the nature of their missions or demand for localisation capability, and to choose trading partners with discrimination based on an internally evolving set of beliefs in the expected value of trading with each other member of the team. To this end, we suggest some obligatory properties that a formalisation of the distributed versioning of experience maps should exhibit, to ensure the eventual convergence in the state of each agent's map under a sequence of pairwise exchanges, as well as the uninterrupted integrity of the representation under versioning operations. To mitigate limitations in hardware and network resources, the "data market" is catalogued by distinct sections of the world, which the agents treat as "products" for appraisal and purchase. To this end, we demonstrate and evaluate our system using the publicly available Oxford RobotCar Dataset, the hand-labelled data market catalogue (approaching 446km of fully indexed sections-of-interest) for which we plan to release alongside the existing raw stereo imagery. We show that, by refining market policies over time, agents achieve improved localisation in a directed and accelerated manner.

CVJun 5, 2017
Geometric Multi-Model Fitting with a Convex Relaxation Algorithm

Paul Amayo, Pedro Pinies, Lina M. Paz et al.

We propose a novel method to fit and segment multi-structural data via convex relaxation. Unlike greedy methods --which maximise the number of inliers-- this approach efficiently searches for a soft assignment of points to models by minimising the energy of the overall classification. Our approach is similar to state-of-the-art energy minimisation techniques which use a global energy. However, we deal with the scaling factor (as the number of models increases) of the original combinatorial problem by relaxing the solution. This relaxation brings two advantages: first, by operating in the continuous domain we can parallelize the calculations. Second, it allows for the use of different metrics which results in a more general formulation. We demonstrate the versatility of our technique on two different problems of estimating structure from images: plane extraction from RGB-D data and homography estimation from pairs of images. In both cases, we report accurate results on publicly available datasets, in most of the cases outperforming the state-of-the-art.

ROSep 16, 2016
Resource-Performance Trade-off Analysis for Mobile Robot Design

Morteza Lahijanian, Maria Svorenova, Akshay A. Morye et al.

The design of mobile autonomous robots is challenging due to the limited on-board resources such as processing power and energy. A promising approach is to generate intelligent schedules that reduce the resource consumption while maintaining best performance, or more interestingly, to trade off reduced resource consumption for a slightly lower but still acceptable level of performance. In this paper, we provide a framework to aid designers in exploring such resource-performance trade-offs and finding schedules for mobile robots, guided by questions such as "what is the minimum resource budget required to achieve a given level of performance?" The framework is based on a quantitative multi-objective verification technique which, for a collection of possibly conflicting objectives, produces the Pareto front that contains all the optimal trade-offs that are achievable. The designer then selects a specific Pareto point based on the resource constraints and desired performance level, and a correct-by-construction schedule that meets those constraints is automatically generated. We demonstrate the efficacy of this framework on several robotic scenarios in both simulations and experiments with encouraging results.

CVApr 13, 2016
DENSER Cities: A System for Dense Efficient Reconstructions of Cities

Michael Tanner, Pedro Pinies, Lina Maria Paz et al.

This paper is about the efficient generation of dense, colored models of city-scale environments from range data and in particular, stereo cameras. Better maps make for better understanding; better understanding leads to better robots, but this comes at a cost. The computational and memory requirements of large dense models can be prohibitive. We provide the theory and the system needed to create city-scale dense reconstructions. To do so, we apply a regularizer over a compressed 3D data structure while dealing with the complex boundary conditions this induces during the data-fusion stage. We show that only with these considerations can we swiftly create neat, large, "well behaved" reconstructions. We evaluate our system using the KITTI dataset and provide statistics for the metric errors in all surfaces created compared to those measured with 3D laser. Our regularizer reduces the median error by 40% in 3.4 km of dense reconstructions with a median accuracy of 6 cm. For subjective analysis, we provide a qualitative review of 6.1 km of our dense reconstructions in an attached video. These are the largest dense reconstructions from a single passive camera we are aware of in the literature.