Mehrdad Dianati

CV
h-index44
22papers
1,010citations
Novelty45%
AI Score32

22 Papers

CVNov 14, 2022Code
Robust Collaborative 3D Object Detection in Presence of Pose Errors

Yifan Lu, Quanhao Li, Baoan Liu et al.

Collaborative 3D object detection exploits information exchange among multiple agents to enhance accuracy of object detection in presence of sensor impairments such as occlusion. However, in practice, pose estimation errors due to imperfect localization would cause spatial message misalignment and significantly reduce the performance of collaboration. To alleviate adverse impacts of pose errors, we propose CoAlign, a novel hybrid collaboration framework that is robust to unknown pose errors. The proposed solution relies on a novel agent-object pose graph modeling to enhance pose consistency among collaborating agents. Furthermore, we adopt a multi-scale data fusion strategy to aggregate intermediate features at multiple spatial resolutions. Comparing with previous works, which require ground-truth pose for training supervision, our proposed CoAlign is more practical since it doesn't require any ground-truth pose supervision in the training and makes no specific assumptions on pose errors. Extensive evaluation of the proposed method is carried out on multiple datasets, certifying that CoAlign significantly reduce relative localization error and achieving the state of art detection performance when pose errors exist. Code are made available for the use of the research community at https://github.com/yifanlu0227/CoAlign.

LGSep 19, 2023Code
A Novel Deep Neural Network for Trajectory Prediction in Automated Vehicles Using Velocity Vector Field

MReza Alipour Sormoli, Amir Samadi, Sajjad Mozaffari et al.

Anticipating the motion of other road users is crucial for automated driving systems (ADS), as it enables safe and informed downstream decision-making and motion planning. Unfortunately, contemporary learning-based approaches for motion prediction exhibit significant performance degradation as the prediction horizon increases or the observation window decreases. This paper proposes a novel technique for trajectory prediction that combines a data-driven learning-based method with a velocity vector field (VVF) generated from a nature-inspired concept, i.e., fluid flow dynamics. In this work, the vector field is incorporated as an additional input to a convolutional-recurrent deep neural network to help predict the most likely future trajectories given a sequence of bird's eye view scene representations. The performance of the proposed model is compared with state-of-the-art methods on the HighD dataset demonstrating that the VVF inclusion improves the prediction accuracy for both short and long-term (5~sec) time horizons. It is also shown that the accuracy remains consistent with decreasing observation windows which alleviates the requirement of a long history of past observations for accurate trajectory prediction. Source codes are available at: https://github.com/Amir-Samadi/VVF-TP.

ROApr 24, 2023
Interruption-Aware Cooperative Perception for V2X Communication-Aided Autonomous Driving

Shunli Ren, Zixing Lei, Zi Wang et al.

Cooperative perception can significantly improve the perception performance of autonomous vehicles beyond the limited perception ability of individual vehicles by exchanging information with neighbor agents through V2X communication. However, most existing work assume ideal communication among agents, ignoring the significant and common \textit{interruption issues} caused by imperfect V2X communication, where cooperation agents can not receive cooperative messages successfully and thus fail to achieve cooperative perception, leading to safety risks. To fully reap the benefits of cooperative perception in practice, we propose V2X communication INterruption-aware COoperative Perception (V2X-INCOP), a cooperative perception system robust to communication interruption for V2X communication-aided autonomous driving, which leverages historical cooperation information to recover missing information due to the interruptions and alleviate the impact of the interruption issue. To achieve comprehensive recovery, we design a communication-adaptive multi-scale spatial-temporal prediction model to extract multi-scale spatial-temporal features based on V2X communication conditions and capture the most significant information for the prediction of the missing information. To further improve recovery performance, we adopt a knowledge distillation framework to give explicit and direct supervision to the prediction model and a curriculum learning strategy to stabilize the training of the model. Experiments on three public cooperative perception datasets demonstrate that the proposed method is effective in alleviating the impacts of communication interruption on cooperative perception.

LGMar 28, 2023
Multimodal Manoeuvre and Trajectory Prediction for Automated Driving on Highways Using Transformer Networks

Sajjad Mozaffari, Mreza Alipour Sormoli, Konstantinos Koufos et al.

Predicting the behaviour (i.e., manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), a.k.a., automated driving systems (ADSs). Due to the uncertain future behaviour of vehicles, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Therefore, multimodal prediction can provide richer information than single-mode prediction, enabling AVs to perform a better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The performance of the framework is evaluated using three public highway driving datasets, namely NGSIM, highD, and exiD. The results show that our framework outperforms the state-of-the-art multimodal methods in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes.

ROJun 8, 2023
Trajectory Prediction with Observations of Variable-Length for Motion Planning in Highway Merging scenarios

Sajjad Mozaffari, Mreza Alipour Sormoli, Konstantinos Koufos et al.

Accurate trajectory prediction of nearby vehicles is crucial for the safe motion planning of automated vehicles in dynamic driving scenarios such as highway merging. Existing methods cannot initiate prediction for a vehicle unless observed for a fixed duration of two or more seconds. This prevents a fast reaction by the ego vehicle to vehicles that enter its perception range, thus creating safety concerns. Therefore, this paper proposes a novel transformer-based trajectory prediction approach, specifically trained to handle any observation length larger than one frame. We perform a comprehensive evaluation of the proposed method using two large-scale highway trajectory datasets, namely the highD and exiD. In addition, we study the impact of the proposed prediction approach on motion planning and control tasks using extensive merging scenarios from the exiD dataset. To the best of our knowledge, this marks the first instance where such a large-scale highway merging dataset has been employed for this purpose. The results demonstrate that the prediction model achieves state-of-the-art performance on highD dataset and maintains lower prediction error w.r.t. the constant velocity across all observation lengths in exiD. Moreover, it significantly enhances safety, comfort, and efficiency in dense traffic scenarios, as compared to the constant velocity model.

ROSep 5, 2022
Prediction Based Decision Making for Autonomous Highway Driving

Mustafa Yildirim, Sajjad Mozaffari, Luc McCutcheon et al.

Autonomous driving decision-making is a challenging task due to the inherent complexity and uncertainty in traffic. For example, adjacent vehicles may change their lane or overtake at any time to pass a slow vehicle or to help traffic flow. Anticipating the intention of surrounding vehicles, estimating their future states and integrating them into the decision-making process of an automated vehicle can enhance the reliability of autonomous driving in complex driving scenarios. This paper proposes a Prediction-based Deep Reinforcement Learning (PDRL) decision-making model that considers the manoeuvre intentions of surrounding vehicles in the decision-making process for highway driving. The model is trained using real traffic data and tested in various traffic conditions through a simulation platform. The results show that the proposed PDRL model improves the decision-making performance compared to a Deep Reinforcement Learning (DRL) model by decreasing collision numbers, resulting in safer driving.

LGJul 28, 2023
SAFE: Saliency-Aware Counterfactual Explanations for DNN-based Automated Driving Systems

Amir Samadi, Amir Shirian, Konstantinos Koufos et al.

A CF explainer identifies the minimum modifications in the input that would alter the model's output to its complement. In other words, a CF explainer computes the minimum modifications required to cross the model's decision boundary. Current deep generative CF models often work with user-selected features rather than focusing on the discriminative features of the black-box model. Consequently, such CF examples may not necessarily lie near the decision boundary, thereby contradicting the definition of CFs. To address this issue, we propose in this paper a novel approach that leverages saliency maps to generate more informative CF explanations. Source codes are available at: https://github.com/Amir-Samadi//Saliency_Aware_CF.

CVSep 26, 2024
Good Data Is All Imitation Learning Needs

Amir Samadi, Konstantinos Koufos, Kurt Debattista et al.

In this paper, we address the limitations of traditional teacher-student models, imitation learning, and behaviour cloning in the context of Autonomous/Automated Driving Systems (ADS), where these methods often struggle with incomplete coverage of real-world scenarios. To enhance the robustness of such models, we introduce the use of Counterfactual Explanations (CFEs) as a novel data augmentation technique for end-to-end ADS. CFEs, by generating training samples near decision boundaries through minimal input modifications, lead to a more comprehensive representation of expert driver strategies, particularly in safety-critical scenarios. This approach can therefore help improve the model's ability to handle rare and challenging driving events, such as anticipating darting out pedestrians, ultimately leading to safer and more trustworthy decision-making for ADS. Our experiments in the CARLA simulator demonstrate that CF-Driver outperforms the current state-of-the-art method, achieving a higher driving score and lower infraction rates. Specifically, CF-Driver attains a driving score of 84.2, surpassing the previous best model by 15.02 percentage points. These results highlight the effectiveness of incorporating CFEs in training end-to-end ADS. To foster further research, the CF-Driver code is made publicly available.

LGApr 28, 2024Code
SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies

Amir Samadi, Konstantinos Koufos, Kurt Debattista et al.

While Deep Reinforcement Learning (DRL) has emerged as a promising solution for intricate control tasks, the lack of explainability of the learned policies impedes its uptake in safety-critical applications, such as automated driving systems (ADS). Counterfactual (CF) explanations have recently gained prominence for their ability to interpret black-box Deep Learning (DL) models. CF examples are associated with minimal changes in the input, resulting in a complementary output by the DL model. Finding such alternations, particularly for high-dimensional visual inputs, poses significant challenges. Besides, the temporal dependency introduced by the reliance of the DRL agent action on a history of past state observations further complicates the generation of CF examples. To address these challenges, we propose using a saliency map to identify the most influential input pixels across the sequence of past observed states by the agent. Then, we feed this map to a deep generative model, enabling the generation of plausible CFs with constrained modifications centred on the salient regions. We evaluate the effectiveness of our framework in diverse domains, including ADS, Atari Pong, Pacman and space-invaders games, using traditional performance metrics such as validity, proximity and sparsity. Experimental results demonstrate that this framework generates more informative and plausible CFs than the state-of-the-art for a wide range of environments and DRL agents. In order to foster research in this area, we have made our datasets and codes publicly available at https://github.com/Amir-Samadi/SAFE-RL.

CVApr 8, 2024Code
Taming Transformers for Realistic Lidar Point Cloud Generation

Hamed Haghighi, Amir Samadi, Mehrdad Dianati et al.

Diffusion Models (DMs) have achieved State-Of-The-Art (SOTA) results in the Lidar point cloud generation task, benefiting from their stable training and iterative refinement during sampling. However, DMs often fail to realistically model Lidar raydrop noise due to their inherent denoising process. To retain the strength of iterative sampling while enhancing the generation of raydrop noise, we introduce LidarGRIT, a generative model that uses auto-regressive transformers to iteratively sample the range images in the latent space rather than image space. Furthermore, LidarGRIT utilises VQ-VAE to separately decode range images and raydrop masks. Our results show that LidarGRIT achieves superior performance compared to SOTA models on KITTI-360 and KITTI odometry datasets. Code available at:https://github.com/hamedhaghighi/LidarGRIT.

CVDec 25, 2023Code
A Unified Generative Framework for Realistic Lidar Simulation in Autonomous Driving Systems

Hamed Haghighi, Mehrdad Dianati, Valentina Donzella et al.

Simulation models for perception sensors are integral components of automotive simulators used for the virtual Verification and Validation (V\&V) of Autonomous Driving Systems (ADS). These models also serve as powerful tools for generating synthetic datasets to train deep learning-based perception models. Lidar is a widely used sensor type among the perception sensors for ADS due to its high precision in 3D environment scanning. However, developing realistic Lidar simulation models is a significant technical challenge. In particular, unrealistic models can result in a large gap between the synthesised and real-world point clouds, limiting their effectiveness in ADS applications. Recently, deep generative models have emerged as promising solutions to synthesise realistic sensory data. However, for Lidar simulation, deep generative models have been primarily hybridised with conventional algorithms, leaving unified generative approaches largely unexplored in the literature. Motivated by this research gap, we propose a unified generative framework to enhance Lidar simulation fidelity. Our proposed framework projects Lidar point clouds into depth-reflectance images via a lossless transformation, and employs our novel Controllable Lidar point cloud Generative model, CoLiGen, to translate the images. We extensively evaluate our CoLiGen model, comparing it with the state-of-the-art image-to-image translation models using various metrics to assess the realness, faithfulness, and performance of a downstream perception model. Our results show that CoLiGen exhibits superior performance across most metrics. The dataset and source code for this research are available at https://github.com/hamedhaghighi/CoLiGen.git.

LGMay 25, 2023Code
Counterfactual Explainer Framework for Deep Reinforcement Learning Models Using Policy Distillation

Amir Samadi, Konstantinos Koufos, Kurt Debattista et al.

Deep Reinforcement Learning (DRL) has demonstrated promising capability in solving complex control problems. However, DRL applications in safety-critical systems are hindered by the inherent lack of robust verification techniques to assure their performance in such applications. One of the key requirements of the verification process is the development of effective techniques to explain the system functionality, i.e., why the system produces specific results in given circumstances. Recently, interpretation methods based on the Counterfactual (CF) explanation approach have been proposed to address the problem of explanation in DRLs. This paper proposes a novel CF explanation framework to explain the decisions made by a black-box DRL. To evaluate the efficacy of the proposed explanation framework, we carried out several experiments in the domains of automated driving systems and Atari Pong game. Our analysis demonstrates that the proposed framework generates plausible and meaningful explanations for various decisions made by deep underlying DRLs. Source codes are available at: \url{https://github.com/Amir-Samadi/Counterfactual-Explanation}

CVDec 18, 2021Code
Fast and Robust Registration of Partially Overlapping Point Clouds

Eduardo Arnold, Sajjad Mozaffari, Mehrdad Dianati

Real-time registration of partially overlapping point clouds has emerging applications in cooperative perception for autonomous vehicles and multi-agent SLAM. The relative translation between point clouds in these applications is higher than in traditional SLAM and odometry applications, which challenges the identification of correspondences and a successful registration. In this paper, we propose a novel registration method for partially overlapping point clouds where correspondences are learned using an efficient point-wise feature encoder, and refined using a graph-based attention network. This attention network exploits geometrical relationships between key points to improve the matching in point clouds with low overlap. At inference time, the relative pose transformation is obtained by robustly fitting the correspondences through sample consensus. The evaluation is performed on the KITTI dataset and a novel synthetic dataset including low-overlapping point clouds with displacements of up to 30m. The proposed method achieves on-par performance with state-of-the-art methods on the KITTI dataset, and outperforms existing methods for low overlapping point clouds. Additionally, the proposed method achieves significantly faster inference times, as low as 410ms, between 5 and 35 times faster than competing methods. Our code and dataset are available at https://github.com/eduardohenriquearnold/fastreg.

CVMar 2, 2024
Run-time Introspection of 2D Object Detection in Automated Driving Systems Using Learning Representations

Hakan Yekta Yatbaz, Mehrdad Dianati, Konstantinos Koufos et al.

Reliable detection of various objects and road users in the surrounding environment is crucial for the safe operation of automated driving systems (ADS). Despite recent progresses in developing highly accurate object detectors based on Deep Neural Networks (DNNs), they still remain prone to detection errors, which can lead to fatal consequences in safety-critical applications such as ADS. An effective remedy to this problem is to equip the system with run-time monitoring, named as introspection in the context of autonomous systems. Motivated by this, we introduce a novel introspection solution, which operates at the frame level for DNN-based 2D object detection and leverages neural network activation patterns. The proposed approach pre-processes the neural activation patterns of the object detector's backbone using several different modes. To provide extensive comparative analysis and fair comparison, we also adapt and implement several state-of-the-art (SOTA) introspection mechanisms for error detection in 2D object detection, using one-stage and two-stage object detectors evaluated on KITTI and BDD datasets. We compare the performance of the proposed solution in terms of error detection, adaptability to dataset shift, and, computational and memory resource requirements. Our performance evaluation shows that the proposed introspection solution outperforms SOTA methods, achieving an absolute reduction in the missed error ratio of 9% to 17% in the BDD dataset.

CVApr 11, 2024
Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns

Hakan Yekta Yatbaz, Mehrdad Dianati, Konstantinos Koufos et al.

Monitoring the integrity of object detection for errors within the perception module of automated driving systems (ADS) is paramount for ensuring safety. Despite recent advancements in deep neural network (DNN)-based object detectors, their susceptibility to detection errors, particularly in the less-explored realm of 3D object detection, remains a significant concern. State-of-the-art integrity monitoring (also known as introspection) mechanisms in 2D object detection mainly utilise the activation patterns in the final layer of the DNN-based detector's backbone. However, that may not sufficiently address the complexities and sparsity of data in 3D object detection. To this end, we conduct, in this article, an extensive investigation into the effects of activation patterns extracted from various layers of the backbone network for introspecting the operation of 3D object detectors. Through a comparative analysis using Kitti and NuScenes datasets with PointPillars and CenterPoint detectors, we demonstrate that using earlier layers' activation patterns enhances the error detection performance of the integrity monitoring system, yet increases computational complexity. To address the real-time operation requirements in ADS, we also introduce a novel introspection method that combines activation patterns from multiple layers of the detector's backbone and report its performance.

CVFeb 23, 2024
Benchmarking the Robustness of Panoptic Segmentation for Automated Driving

Yiting Wang, Haonan Zhao, Daniel Gummadi et al.

Precise situational awareness is required for the safe decision-making of assisted and automated driving (AAD) functions. Panoptic segmentation is a promising perception technique to identify and categorise objects, impending hazards, and driveable space at a pixel level. While segmentation quality is generally associated with the quality of the camera data, a comprehensive understanding and modelling of this relationship are paramount for AAD system designers. Motivated by such a need, this work proposes a unifying pipeline to assess the robustness of panoptic segmentation models for AAD, correlating it with traditional image quality. The first step of the proposed pipeline involves generating degraded camera data that reflects real-world noise factors. To this end, 19 noise factors have been identified and implemented with 3 severity levels. Of these factors, this work proposes novel models for unfavourable light and snow. After applying the degradation models, three state-of-the-art CNN- and vision transformers (ViT)-based panoptic segmentation networks are used to analyse their robustness. The variations of the segmentation performance are then correlated to 8 selected image quality metrics. This research reveals that: 1) certain specific noise factors produce the highest impact on panoptic segmentation, i.e. droplets on lens and Gaussian noise; 2) the ViT-based panoptic segmentation backbones show better robustness to the considered noise factors; 3) some image quality metrics (i.e. LPIPS and CW-SSIM) correlate strongly with panoptic segmentation performance and therefore they can be used as predictive metrics for network performance.

CVMay 13, 2024
Integrity Monitoring of 3D Object Detection in Automated Driving Systems using Raw Activation Patterns and Spatial Filtering

Hakan Yekta Yatbaz, Mehrdad Dianati, Konstantinos Koufos et al.

The deep neural network (DNN) models are widely used for object detection in automated driving systems (ADS). Yet, such models are prone to errors which can have serious safety implications. Introspection and self-assessment models that aim to detect such errors are therefore of paramount importance for the safe deployment of ADS. Current research on this topic has focused on techniques to monitor the integrity of the perception mechanism in ADS. Existing introspection models in the literature, however, largely concentrate on detecting perception errors by assigning equal importance to all parts of the input data frame to the perception module. This generic approach overlooks the varying safety significance of different objects within a scene, which obscures the recognition of safety-critical errors, posing challenges in assessing the reliability of perception in specific, crucial instances. Motivated by this shortcoming of state of the art, this paper proposes a novel method integrating raw activation patterns of the underlying DNNs, employed by the perception module, analysis with spatial filtering techniques. This novel approach enhances the accuracy of runtime introspection of the DNN-based 3D object detections by selectively focusing on an area of interest in the data, thereby contributing to the safety and efficacy of ADS perception self-assessment processes.

CVJan 29, 2024
Data-driven Camera and Lidar Simulation Models for Autonomous Driving: A Review from Generative Models to Volume Renderers

Hamed Haghighi, Xiaomeng Wang, Hao Jing et al.

Perception sensors, particularly camera and Lidar, are key elements of Autonomous Driving Systems (ADS) that enable them to comprehend their surroundings to informed driving and control decisions. Therefore, developing realistic simulation models for these sensors is essential for conducting effective simulation-based testing of ADS. Moreover, the rise of deep learning-based perception models has increased the utility of sensor simulation models for synthesising diverse training datasets. The traditional sensor simulation models rely on computationally expensive physics-based algorithms, specifically in complex systems such as ADS. Hence, the current potential resides in data-driven approaches, fuelled by the exceptional performance of deep generative models in capturing high-dimensional data distribution and volume renderers in accurately representing scenes. This paper reviews the current state-of-the-art data-driven camera and Lidar simulation models and their evaluation methods. It explores a spectrum of models from the novel perspective of generative models and volume renderers. Generative models are discussed in terms of their input-output types, while volume renderers are categorised based on their input encoding. Finally, the paper illustrates commonly used evaluation techniques for assessing sensor simulation models and highlights the existing research gaps in the area.

CVSep 22, 2021
Early Lane Change Prediction for Automated Driving Systems Using Multi-Task Attention-based Convolutional Neural Networks

Sajjad Mozaffari, Eduardo Arnold, Mehrdad Dianati et al.

Lane change (LC) is one of the safety-critical manoeuvres in highway driving according to various road accident records. Thus, reliably predicting such manoeuvre in advance is critical for the safe and comfortable operation of automated driving systems. The majority of previous studies rely on detecting a manoeuvre that has been already started, rather than predicting the manoeuvre in advance. Furthermore, most of the previous works do not estimate the key timings of the manoeuvre (e.g., crossing time), which can actually yield more useful information for the decision making in the ego vehicle. To address these shortcomings, this paper proposes a novel multi-task model to simultaneously estimate the likelihood of LC manoeuvres and the time-to-lane-change (TTLC). In both tasks, an attention-based convolutional neural network (CNN) is used as a shared feature extractor from a bird's eye view representation of the driving environment. The spatial attention used in the CNN model improves the feature extraction process by focusing on the most relevant areas of the surrounding environment. In addition, two novel curriculum learning schemes are employed to train the proposed approach. The extensive evaluation and comparative analysis of the proposed method in existing benchmark datasets show that the proposed method outperforms state-of-the-art LC prediction models, particularly considering long-term prediction performance.

CVJun 9, 2021
Visual Sensor Pose Optimisation Using Visibility Models for Smart Cities

Eduardo Arnold, Sajjad Mozaffari, Mehrdad Dianati et al.

Visual sensor networks are used for monitoring traffic in large cities and are promised to support automated driving in complex road segments. The pose of these sensors, i.e. position and orientation, directly determines the coverage of the driving environment, and the ability to detect and track objects navigating therein. Existing sensor pose optimisation methods either maximise the coverage of ground surfaces, or consider the visibility of target objects (e.g. cars) as binary variables, which fails to represent their degree of visibility. For example, such formulations fail in cluttered environments where multiple objects occlude each other. This paper proposes two novel sensor pose optimisation methods, one based on gradient-ascent and one using integer programming techniques, which maximise the visibility of multiple target objects. Both methods are based on a rendering engine that provides pixel-level visibility information about the target objects, and thus, can cope with occlusions in cluttered environments. The methods are evaluated in a complex driving environment and show improved visibility of target objects when compared to existing methods. Such methods can be used to guide the cost effective deployment of sensor networks in smart cities to improve the safety and efficiency of traffic monitoring systems.

CVDec 25, 2019
Deep Learning-based Vehicle Behaviour Prediction For Autonomous Driving Applications: A Review

Sajjad Mozaffari, Omar Y. Al-Jarrah, Mehrdad Dianati et al.

Behaviour prediction function of an autonomous vehicle predicts the future states of the nearby vehicles based on the current and past observations of the surrounding environment. This helps enhance their awareness of the imminent hazards. However, conventional behaviour prediction solutions are applicable in simple driving scenarios that require short prediction horizons. Most recently, deep learning-based approaches have become popular due to their superior performance in more complex environments compared to the conventional approaches. Motivated by this increased popularity, we provide a comprehensive review of the state-of-the-art of deep learning-based approaches for vehicle behaviour prediction in this paper. We firstly give an overview of the generic problem of vehicle behaviour prediction and discuss its challenges, followed by classification and review of the most recent deep learning-based solutions based on three criteria: input representation, output type, and prediction method. The paper also discusses the performance of several well-known solutions, identifies the research gaps in the literature and outlines potential new research directions.

CVDec 18, 2019
Cooperative Perception for 3D Object Detection in Driving Scenarios using Infrastructure Sensors

Eduardo Arnold, Mehrdad Dianati, Robert de Temple et al.

3D object detection is a common function within the perception system of an autonomous vehicle and outputs a list of 3D bounding boxes around objects of interest. Various 3D object detection methods have relied on fusion of different sensor modalities to overcome limitations of individual sensors. However, occlusion, limited field-of-view and low-point density of the sensor data cannot be reliably and cost-effectively addressed by multi-modal sensing from a single point of view. Alternatively, cooperative perception incorporates information from spatially diverse sensors distributed around the environment as a way to mitigate these limitations. This article proposes two schemes for cooperative 3D object detection using single modality sensors. The early fusion scheme combines point clouds from multiple spatially diverse sensing points of view before detection. In contrast, the late fusion scheme fuses the independently detected bounding boxes from multiple spatially diverse sensors. We evaluate the performance of both schemes, and their hybrid combination, using a synthetic cooperative dataset created in two complex driving scenarios, a T-junction and a roundabout. The evaluation shows that the early fusion approach outperforms late fusion by a significant margin at the cost of higher communication bandwidth. The results demonstrate that cooperative perception can recall more than 95% of the objects as opposed to 30% for single-point sensing in the most challenging scenario. To provide practical insights into the deployment of such system, we report how the number of sensors and their configuration impact the detection performance of the system.