RO Mar 30, 2023
Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys
Long Chen, Yuchen Li, Chao Huang et al.
Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing rapidly due to their convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they remain limited to specific tasks and lack a systematic summary and future research directions. Here we propose a Survey of Surveys (SoS) covering the full range of AD and IV technologies that reviews the history, summarizes the milestones, and provides perspectives, ethics, and future research directions. To our knowledge, this article is the first SoS with milestones in AD and IVs, and together with two other technical surveys it constitutes our complete research work. We anticipate that this article will bring novel and diverse insights to researchers and newcomers, and serve as a bridge between past and future.
RO Jun 3, 2023
Milestones in Autonomous Driving and Intelligent Vehicles Part II: Perception and Planning
Long Chen, Siyu Teng, Bai Li et al.
Growing interest in autonomous driving (AD) and intelligent vehicles (IVs) is fueled by their promise for enhanced safety, efficiency, and economic benefits. While previous surveys have captured progress in this field, a comprehensive and forward-looking summary is needed. Our work fills this gap through three distinct articles. The first part, a "Survey of Surveys" (SoS), outlines the history, surveys, ethics, and future directions of AD and IV technologies. The second part, "Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors" delves into the development of control, computing system, communication, HD map, testing, and human behaviors in IVs. This part, the third part, reviews perception and planning in the context of IVs. Aiming to provide a comprehensive overview of the latest advancements in AD and IVs, this work caters to both newcomers and seasoned researchers. By integrating the SoS and Part I, we offer unique insights and strive to serve as a bridge between past achievements and future possibilities in this dynamic field.
CV Apr 15
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
Zebei Tong, Hongchang Chen, Yujie Lei et al.
Image generation technology can synthesize condition-specific images to supplement real-world industrial anomaly data and enhance anomaly detection model performance. Existing generation techniques rarely account for the pose and orientation of industrial components in assembly, making the generated images difficult to use in downstream applications. To solve this, we propose a novel image synthesis approach, called PostureObjectStitch, that generates images accurate enough to meet the requirements of industrial assembly. A condition decoupling approach separates input multi-view images into high-frequency, texture, and RGB features. A feature temporal modulation mechanism adapts these features across diffusion model time-steps, enabling progressive generation from coarse to fine details while maintaining consistency. To ensure semantic accuracy, we introduce a conditional loss that enhances critical industrial elements and a geometric prior that guides component positioning for correct assembly relationships. Comprehensive experimental results on the MureCom dataset, our newly contributed DreamAssembly dataset, and the downstream application validate the outstanding performance of our method.
CV Jul 13, 2020
CenterNet3D: An Anchor Free Object Detector for Point Cloud
Guojun Wang, Jian Wu, Bin Tian et al.
Accurate and fast 3D object detection from point clouds is a key task in autonomous driving. Existing one-stage 3D object detection methods can achieve real-time performance, but they are dominated by anchor-based detectors, which are inefficient and require additional post-processing. In this paper, we eliminate anchors and model an object as a single point, the center point of its bounding box. Based on the center point, we propose an anchor-free CenterNet3D network that performs 3D object detection without anchors. Our CenterNet3D uses keypoint estimation to find center points and directly regresses 3D bounding boxes. However, because of the inherent sparsity of point clouds, 3D object center points are likely to lie in empty space, which makes it difficult to estimate accurate boundaries. To solve this issue, we propose an extra corner attention module that forces the CNN backbone to pay more attention to object boundaries. In addition, since one-stage detectors suffer from a mismatch between the predicted bounding boxes and the corresponding classification confidences, we develop an efficient keypoint-sensitive warping operation to align the confidences with the predicted bounding boxes. Our proposed CenterNet3D is free of non-maximum suppression, which makes it simpler and more efficient. We evaluate CenterNet3D on the widely used KITTI dataset and the more challenging nuScenes dataset. Our method outperforms all state-of-the-art anchor-based one-stage methods and has comparable performance to two-stage methods as well. It has an inference speed of 20 FPS and achieves the best speed and accuracy trade-off. Our source code will be released at https://github.com/wangguojun2018/CenterNet3d.
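As a rough illustration of how center points might be read off a keypoint heatmap without non-maximum suppression, the sketch below keeps every cell that is a local maximum of its 3x3 neighbourhood. The abstract does not state the peak-picking rule, so the local-maximum test, the function name `extract_centers`, and the `threshold` value are assumptions made for illustration, not the authors' implementation.

```python
def extract_centers(heatmap, threshold=0.3):
    """Return (row, col, score) for heatmap cells that are 3x3 local maxima.

    Hypothetical helper: a cell is kept when its score reaches `threshold`
    and no neighbouring cell has a strictly higher score.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the scores of the (up to 8) neighbouring cells.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Toy bird's-eye-view heatmap with two object centers.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.2],
]
print(extract_centers(heatmap))  # [(1, 1, 0.9), (2, 3, 0.6)]
```

Because each object contributes a single peak, no overlapping box candidates need to be suppressed afterwards; in a full detector, box size and orientation would be regressed at each returned cell.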
CV Apr 10
CAD 100K: A Comprehensive Multi-Task Dataset for Car Related Visual Anomaly Detection
Jiahua Pang, Ying Li, Dongpu Cao et al.
Multi-task visual anomaly detection is critical for car-related manufacturing quality assessment. However, existing methods remain task-specific, hindered by the absence of a unified benchmark for multi-task evaluation. To fill this gap, we present the CAD Dataset, a large-scale and comprehensive benchmark designed for car-related multi-task visual anomaly detection. The dataset contains over 100 images spanning 7 vehicle domains and 3 tasks, giving models a comprehensive view of car-related anomaly detection. It is the first car-related anomaly dataset specialized for multi-task learning (MTL), and it also uses synthesis-based data augmentation for few-shot anomaly images. We implement a multi-task baseline and conduct extensive empirical studies. Results show that MTL promotes task interaction and knowledge transfer, while also exposing challenging conflicts between tasks. The CAD dataset serves as a standardized platform to drive future advances in car-related multi-task visual anomaly detection.
CV Apr 9
GroundingAnomaly: Spatially-Grounded Diffusion for Few-Shot Anomaly Synthesis
Yishen Liu, Hongcang Chen, Pengcheng Zhao et al.
The performance of visual anomaly inspection in industrial quality control is often constrained by the scarcity of real anomalous samples. Consequently, anomaly synthesis techniques have been developed to enlarge training sets and enhance downstream inspection. However, existing methods either suffer from poor integration caused by inpainting or fail to provide accurate masks. To address these limitations, we propose GroundingAnomaly, a novel few-shot anomaly image generation framework. Our framework introduces a Spatial Conditioning Module that leverages per-pixel semantic maps to enable precise spatial control over the synthesized anomalies. Furthermore, a Gated Self-Attention Module is designed to inject conditioning tokens into a frozen U-Net via gated attention layers. This carefully preserves pretrained priors while ensuring stable few-shot adaptation. Extensive evaluations on the MVTec AD and VisA datasets demonstrate that GroundingAnomaly generates high-quality anomalies and achieves state-of-the-art performance across multiple downstream tasks, including anomaly detection, segmentation, and instance-level detection.
AI May 12, 2023
Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors
Long Chen, Yuchen Li, Chao Huang et al.
Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing rapidly due to their convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they remain limited to specific tasks and lack systematic summaries and future research directions. Our work is divided into three independent articles. The first part is a Survey of Surveys (SoS) covering the full range of AD and IV technologies that reviews the history, summarizes the milestones, and provides perspectives, ethics, and future research directions. This is the second part (Part I of the technical survey), which reviews the development of control, computing system design, communication, high-definition maps (HD maps), testing, and human behaviors in IVs. The third part (Part II of the technical survey) reviews perception and planning. The objective of this paper is to cover all areas of AD, summarize the latest technical milestones, and help newcomers quickly understand the development of AD and IVs. Combined with the SoS and Part II, we anticipate that this work will bring novel and diverse insights to researchers and newcomers, and serve as a bridge between past and future.
CV Dec 13, 2020
MSAF: Multimodal Split Attention Fusion
Lang Su, Chuqing Hu, Guofa Li et al.
Multimodal learning mimics the reasoning process of the human multi-sensory system, which is used to perceive the surrounding world. While making a prediction, the human brain tends to relate crucial cues from multiple sources of information. In this work, we propose a novel multimodal fusion module that learns to emphasize more contributive features across all modalities. Specifically, the proposed Multimodal Split Attention Fusion (MSAF) module splits each modality into channel-wise equal feature blocks and creates a joint representation that is used to generate soft attention for each channel across the feature blocks. Further, the MSAF module is designed to be compatible with features of various spatial dimensions and sequence lengths, suitable for both CNNs and RNNs. Thus, MSAF can be easily added to fuse features of any unimodal networks and utilize existing pretrained unimodal model weights. To demonstrate the effectiveness of our fusion module, we design three multimodal networks with MSAF for emotion recognition, sentiment analysis, and action recognition tasks. Our approach achieves competitive results in each task and outperforms other application-specific networks and multimodal fusion benchmarks.
CV Nov 2, 2020
Multi-View Adaptive Fusion Network for 3D Object Detection
Guojun Wang, Bin Tian, Yachen Zhang et al.
3D object detection based on LiDAR-camera fusion is an emerging research theme for autonomous driving. However, it has been surprisingly difficult to effectively fuse both modalities without information loss and interference. To solve this issue, we propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view, and camera view images as inputs for 3D object detection. To effectively fuse multi-view features, we propose an attentive pointwise fusion (APF) module that uses attention mechanisms to estimate the importance of the three sources and adaptively fuse multi-view features in a pointwise manner. Furthermore, an attentive pointwise weighting (APW) module is designed to help the network learn structure information and point feature importance with two extra tasks, namely foreground classification and center regression; the predicted foreground probability is used to reweight the point features. We design an end-to-end learnable network named MVAF-Net to integrate these two components. Our evaluations on the KITTI 3D object detection dataset demonstrate that the proposed APF and APW modules offer significant performance gains. Moreover, the proposed MVAF-Net achieves the best performance among all single-stage fusion methods and outperforms most two-stage fusion methods, achieving the best trade-off between speed and accuracy on the KITTI benchmark.
AI Sep 7, 2020
Driving Tasks Transfer in Deep Reinforcement Learning for Decision-making of Autonomous Vehicles
Hong Shu, Teng Liu, Xingyu Mu et al.
Knowledge transfer is a promising concept for achieving real-time decision-making in autonomous vehicles. This paper constructs a transfer deep reinforcement learning framework to transfer driving tasks in intersection environments. The driving missions at the unsignalized intersection are divided into turning left, turning right, and going straight. The goal of the autonomous ego vehicle (AEV) is to drive through the intersection efficiently and safely. This objective encourages the studied vehicle to increase its speed and avoid colliding with other vehicles. The decision-making policy learned from one driving task is transferred to and evaluated in another driving mission. Simulation results reveal that the decision-making strategies for similar tasks are transferable. This indicates that the presented control framework could reduce time consumption and enable online implementation.
AI Jul 26, 2020
Defining Digital Quadruplets in the Cyber-Physical-Social Space for Parallel Driving
Teng Liu, Yang Xing, Long Chen et al.
Parallel driving is a novel framework to synthesize vehicle intelligence and transport automation. This article aims to define digital quadruplets in parallel driving. Within cyber-physical-social systems (CPSS), and based on the ACP method, the digital quadruplets are first named: descriptive, predictive, prescriptive, and real vehicles. The three virtual digital vehicles aim to interact with, guide, simulate, and improve the real vehicles. Then, the three virtual components of the digital quadruplets are introduced in detail and their applications are illustrated. Finally, the real vehicles in the parallel driving system and the research process of the digital quadruplets are described. The presented digital quadruplets are expected to make future connected automated driving safe, efficient, and synergistic.
SY Jul 24, 2020
Adaptive Energy Management for Real Driving Conditions via Transfer Reinforcement Learning
Teng Liu, Wenhao Tan, Xiaolin Tang et al.
This article proposes a transfer reinforcement learning (RL) based adaptive energy management approach for a hybrid electric vehicle (HEV) with parallel topology. The approach is bi-level. The upper level characterizes how to transform the Q-value tables in the RL framework via driving cycle transformation (DCT). Specifically, transition probability matrices (TPMs) of power request are computed for different cycles, and the induced matrix norm (IMN) is employed as a criterion to identify the transformation differences and to determine the alteration of the control strategy. The lower level determines how to set the corresponding control strategies with the transformed Q-value tables and TPMs by using a model-free RL algorithm. Numerical tests illustrate that the transferred performance can be tuned by the IMN value and that the transfer RL controller achieves a higher fuel economy. The comparison demonstrates that the proposed strategy exceeds the conventional RL approach in both calculation speed and control performance.
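To make the TPM and IMN step more concrete, the sketch below estimates a transition probability matrix from a discretized power-request sequence and compares two driving cycles by the induced norm of their TPM difference. The abstract does not say which induced norm is used or how power requests are discretized, so the infinity norm (maximum absolute row sum), the integer state encoding, and the helper names are assumptions made for illustration.

```python
def transition_matrix(states, n_states):
    """Estimate a row-stochastic TPM from a sequence of integer state indices."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    tpm = []
    for row in counts:
        total = sum(row)
        # Rows for states that were never visited are left as zeros.
        tpm.append([c / total if total else 0.0 for c in row])
    return tpm


def induced_inf_norm_of_difference(p, q):
    """Induced infinity norm of (P - Q): the largest absolute row sum."""
    return max(
        sum(abs(a - b) for a, b in zip(row_p, row_q))
        for row_p, row_q in zip(p, q)
    )


# Two toy driving cycles, with power request discretized into 2 levels.
cycle_a = [0, 1, 0, 1, 0]
cycle_b = [0, 0, 1, 1, 0]
tpm_a = transition_matrix(cycle_a, n_states=2)  # [[0.0, 1.0], [1.0, 0.0]]
tpm_b = transition_matrix(cycle_b, n_states=2)  # [[0.5, 0.5], [0.5, 0.5]]
print(induced_inf_norm_of_difference(tpm_a, tpm_b))  # 1.0
```

A larger value suggests the two cycles differ more in their power-request dynamics; how that value maps to a change in control strategy is specific to the paper and is not reproduced here.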
SP Jul 16, 2020
Decision-making Strategy on Highway for Autonomous Vehicles using Deep Reinforcement Learning
Jiangdong Liao, Teng Liu, Xiaolin Tang et al.
Autonomous driving is a promising technology to reduce traffic accidents and improve driving efficiency. In this work, a deep reinforcement learning (DRL)-enabled decision-making policy is constructed for autonomous vehicles to address overtaking behaviors on the highway. First, a highway driving environment is built, wherein the ego vehicle aims to overtake the surrounding vehicles with an efficient and safe maneuver. A hierarchical control framework is presented to control these vehicles: the upper level manages the driving decisions, and the lower level supervises vehicle speed and acceleration. Then, a particular DRL method, the dueling deep Q-network (DDQN) algorithm, is applied to derive the highway decision-making strategy. The detailed computational procedures of the deep Q-network and DDQN algorithms are discussed and compared. Finally, a series of simulation experiments is conducted to evaluate the effectiveness of the proposed highway decision-making policy. The advantages of the proposed framework in convergence rate and control performance are demonstrated. Simulation results reveal that the DDQN-based overtaking policy can accomplish highway driving tasks efficiently and safely.
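For readers unfamiliar with the dueling architecture, the sketch below shows the commonly used aggregation step that combines a state value V(s) with per-action advantages A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A). This is the standard formulation from the dueling network literature; the abstract does not confirm which aggregation the paper uses, and the action names and numbers are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using mean-subtracted advantages."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Hypothetical highway actions and network outputs for one state.
actions = ["change_left", "keep_lane", "change_right"]
q_values = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
print(q_values)  # [1.5, 0.5, 1.0]

# Greedy action selection over the combined Q-values.
best_action = actions[q_values.index(max(q_values))]
print(best_action)  # change_left
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, since otherwise a constant could be shifted freely between them without changing Q.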
LG Jul 16, 2020
Comparison of Different Methods for Time Sequence Prediction in Autonomous Vehicles
Teng Liu, Bin Tian, Yunfeng Ai et al.
As a combination of various technologies, autonomous vehicles can complete a series of driving tasks by themselves, such as perception, decision-making, planning, and control. Since there is no human driver to handle emergency situations, future transportation information is important for automated vehicles. This paper compares three methods for forecasting time series for autonomous vehicles: nearest neighborhood (NN), fuzzy coding (FC), and long short-term memory (LSTM). First, the formulation and operational process of these three approaches are introduced. Then, vehicle velocity is taken as a case study, and a real-world dataset is used to predict future information via these techniques. Finally, the performance, merits, and drawbacks of the presented methods are analyzed and discussed.
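One simple way to read the nearest-neighborhood idea is as pattern matching over the velocity history: find the past window that most resembles the most recent one and reuse the value that followed it. The abstract does not give the paper's formulation, so the squared-distance measure, the window length, and the function name below are assumptions made for illustration.

```python
def nn_forecast(series, window=3):
    """Predict the next value by matching the latest window against history.

    Hypothetical helper: requires len(series) > window.
    """
    query = series[-window:]
    best_distance, best_next = None, None
    # Slide over every past window that has a known successor value.
    for start in range(len(series) - window):
        candidate = series[start:start + window]
        distance = sum((a - b) ** 2 for a, b in zip(candidate, query))
        if best_distance is None or distance < best_distance:
            best_distance, best_next = distance, series[start + window]
    return best_next


# Toy velocity trace in m/s: the latest pattern [10, 12, 14] was followed by 16 before.
velocity = [10, 12, 14, 16, 10, 12, 14]
print(nn_forecast(velocity))  # 16
```

The method needs no training, but its forecasts can only repeat values that have already been observed, which is one reason to compare it against learned models such as LSTM.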
AI Jul 16, 2020
Dueling Deep Q Network for Highway Decision Making in Autonomous Vehicles: A Case Study
Teng Liu, Xingyu Mu, Xiaolin Tang et al.
This work optimizes the highway decision-making strategy of autonomous vehicles by using deep reinforcement learning (DRL). First, the highway driving environment is built, including the ego vehicle, surrounding vehicles, and road lanes. Then, the overtaking decision-making problem of the automated vehicle is formulated as an optimal control problem, and the relevant control actions, state variables, and optimization objectives are elaborated. Finally, the deep Q-network is applied to derive the intelligent driving policies for the ego vehicle. Simulation results reveal that the ego vehicle can safely and efficiently accomplish the driving task after learning and training.
CV May 20, 2020
Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Ying Li, Lingfei Ma, Zilong Zhong et al.
Recently, the advancement of deep learning in discriminative feature learning from 3D LiDAR data has led to rapid development in the field of autonomous driving. However, automated processing of uneven, unstructured, noisy, and massive 3D point clouds is a challenging and tedious task. In this paper, we provide a systematic review of existing compelling deep learning architectures applied to LiDAR point clouds, detailing specific tasks in autonomous driving such as segmentation, detection, and classification. Although several published research papers focus on specific topics in computer vision for autonomous vehicles, to date no general survey exists on deep learning applied to LiDAR point clouds for autonomous vehicles. Thus, the goal of this paper is to narrow this gap. More than 140 key contributions from the past five years are summarized in this survey, including the milestone 3D deep architectures; the remarkable deep learning applications in 3D semantic segmentation, object detection, and classification; and specific datasets, evaluation metrics, and state-of-the-art performance. Finally, we summarize the remaining challenges and future research directions.
CV Apr 26, 2020
A Spontaneous Driver Emotion Facial Expression (DEFE) Dataset for Intelligent Vehicles
Wenbo Li, Yaodong Cui, Yintao Ma et al.
In this paper, we introduce a new dataset, the driver emotion facial expression (DEFE) dataset, for the analysis of spontaneous driver emotions. The dataset includes facial expression recordings from 60 participants during driving. After watching a selected video-audio clip to elicit a specific emotion, each participant completed the driving tasks in the same driving scenario and rated their emotional responses during driving in terms of both dimensional emotion and discrete emotion. We also conducted classification experiments to recognize the scales of arousal, valence, and dominance, as well as the emotion category and intensity, to establish baseline results for the proposed dataset. In addition, this paper compares and discusses the differences in facial expressions between driving and non-driving scenarios. The results show significant differences in the presence of facial Action Units (AUs) between driving and non-driving scenarios, indicating that human emotional expressions in driving scenarios differ from those in other life scenarios. Therefore, publishing a human emotion dataset specifically for drivers is necessary for improving traffic safety. The proposed dataset will be publicly available so that researchers worldwide can use it to develop and examine their driver emotion analysis methods. To the best of our knowledge, this is currently the only public driver facial expression dataset.
CV Apr 10, 2020
Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review
Yaodong Cui, Ren Chen, Wenbo Chu et al.
Autonomous vehicles have experienced rapid development in the past few years. However, achieving full autonomy is not a trivial task, due to the complex and dynamic nature of the driving environment. Therefore, autonomous vehicles are equipped with a suite of different sensors to ensure robust, accurate environmental perception. In particular, camera-LiDAR fusion is an emerging research theme. However, so far there has been no critical review that focuses on deep-learning-based camera-LiDAR fusion methods. To bridge this gap and motivate future research, this paper reviews recent deep-learning-based data fusion approaches that leverage both image and point cloud data. The review first gives a brief overview of deep learning for image and point cloud data processing. This is followed by in-depth reviews of camera-LiDAR fusion methods in depth completion, object detection, semantic segmentation, tracking, and online cross-sensor calibration, organized by their respective fusion levels. Furthermore, we compare these methods on publicly available datasets. Finally, we identify gaps and overlooked challenges between current academic research and real-world applications. Based on these observations, we provide our insights and point out promising research directions.
RO Apr 22, 2019
A Right-of-Way Based Strategy to Implement Safe and Efficient Driving at Non-Signalized Intersections for Automated Vehicles
Yadong Xing, Can Zhao, ZhiHeng Li et al.
The non-signalized intersection is a typical and common scenario for connected and automated vehicles (CAVs). Balancing safety and efficiency there remains difficult for researchers. To improve on the original Responsibility Sensitive Safety (RSS) driving strategy at non-signalized intersections, we propose a new strategy in this paper based on right-of-way assignment (RWA). The performance of the RSS strategy, a cooperative driving strategy, and the RWA-based strategy is tested and compared. Testing results indicate that our strategy yields better traffic efficiency than the RSS strategy, but it is not as good as the cooperative driving strategy due to the limited range of communication and the lack of long-term planning. However, our new strategy requires much lower communication cost among vehicles.
CV Mar 20, 2019
Affordance Learning In Direct Perception for Autonomous Driving
Chen Sun, Jean M. Uwabeza Vianney, Dongpu Cao
Recent developments in autonomous driving involve high-level computer vision and detailed road scene understanding. Today, most autonomous vehicles use a mediated perception approach for path planning and control, which relies heavily on high-definition 3D maps and real-time sensors. Recent research efforts aim to replace the massive HD maps with coarse road attributes. In this paper, we follow the direct perception approach and train a deep neural network for affordance learning in autonomous driving. Our goal in this work is to develop an affordance learning model based on freely available Google Street View panoramas and Open Street Map road vector attributes. Driving scene understanding can be achieved by learning affordances from the images captured by car-mounted cameras. Such scene understanding may be useful for corroborating base maps such as HD maps, so that the required data storage space is minimized and the data can be processed in real time. We compare the road attribute identification capability of human volunteers and our model through experimental evaluation. Our results indicate that this method could serve as a cheaper way to collect training data for autonomous driving. The cross-validation results also indicate the effectiveness of our model.