Qi Hao

CV
h-index38
31papers
634citations
Novelty47%
AI Score34

31 Papers

ROJun 3, 2022
Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification

Shuai Wang, Chengyang Li, Derrick Wing Kwan Ng et al.

Realizing human-like perception is a challenge in open driving scenarios due to corner cases and visual occlusions. To gather knowledge of rare and occluded instances, federated learning assisted connected autonomous vehicle (FLCAV) has been proposed, which leverages vehicular networks to establish federated deep neural networks (DNNs) from distributed data captured by vehicles and road sensors. Without the need of data aggregation, FLCAV preserves privacy while reducing communication costs compared with conventional centralized learning. However, it is challenging to determine the network resources and road sensor placements for multi-stage training with multi-modal datasets in multi-variant scenarios. This article presents networking and training frameworks for FLCAV perception. Multi-layer graph resource allocation and vehicle-road contrastive sensor placement are proposed to address the network management and sensor deployment problems, respectively. We also develop CarlaFLCAV, a software platform that implements the above system and methods. Experimental results confirm the superiority of the proposed techniques compared with various benchmarks.

CVAug 26, 2023
Vision-Based Human Pose Estimation via Deep Learning: A Survey

Gongjin Lan, Yu Wu, Fei Hu et al.

Human pose estimation (HPE) has attracted a significant amount of attention from the computer vision community in the past decades. Moreover, HPE has been applied to various domains, such as human-computer interaction, sports analysis, and human tracking via images and videos. Recently, deep learning-based approaches have shown state-of-the-art performance in HPE-based applications. Although deep learning-based approaches have achieved remarkable performance in HPE, a comprehensive review of deep learning-based HPE methods remains lacking in the literature. In this article, we provide an up-to-date and in-depth overview of the deep learning approaches in vision-based HPE. We summarize these methods of 2-D and 3-D HPE, and their applications, discuss the challenges and the research trends through bibliometrics, and provide insightful recommendations for future research. This article provides a meaningful overview as introductory material for beginners to deep learning-based HPE, as well as supplementary material for advanced researchers.

HCNov 2, 2023Code
A Virtual Reality Training System for Automotive Engines Assembly and Disassembly

Gongjin Lan, Qiangqiang Lai, Bing Bai et al.

Automotive engine assembly and disassembly are common and crucial programs in the automotive industry. Traditional education trains students to learn automotive engine assembly and disassembly in lecture courses and then to operate with physical engines, which are generally low effectiveness and high cost. In this work, we developed a multi-layer structured Virtual Reality (VR) system to provide students with training in automotive engine (Buick Verano) assembly and disassembly. We designed the VR training system with The VR training system is designed to have several major features, including replaceable engine parts and reusable tools, friendly user interfaces and guidance, and bottom-up designed multi-layer architecture, which can be extended to various engine models. The VR system is evaluated with controlled experiments of two groups of students. The results demonstrate that our VR training system provides remarkable usability in terms of effectiveness and efficiency. Currently, our VR system has been demonstrated and employed in the courses of Chinese colleges to train students in automotive engine assembly and disassembly. A free-to-use executable file (Microsoft Windows) and open-source code are available at https://github.com/LadissonLai/SUSTech_VREngine for facilitating the development of VR systems in the automotive industry. Finally, a video describing the operations in our VR training system is available at https://www.youtube.com/watch?v=yZe4YTwwAC4

CVJul 18, 2024Code
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving

Gongjin Lan, Yang Peng, Qi Hao et al.

Autonomous driving significantly benefits from data-driven deep neural networks. However, the data in autonomous driving typically fits the long-tailed distribution, in which the critical driving data in adverse conditions is hard to collect. Although generative adversarial networks (GANs) have been applied to augment data for autonomous driving, generating driving images in adverse conditions is still challenging. In this work, we propose a novel framework, SUSTechGAN, with customized dual attention modules, multi-scale generators, and a novel loss function to generate driving images for improving object detection of autonomous driving in adverse conditions. We test the SUSTechGAN and the well-known GANs to generate driving images in adverse conditions of rain and night and apply the generated images to retrain object detection networks. Specifically, we add generated images into the training datasets to retrain the well-known YOLOv5 and evaluate the improvement of the retrained YOLOv5 for object detection in adverse conditions. The experimental results show that the generated driving images by our SUSTechGAN significantly improved the performance of retrained YOLOv5 in rain and night conditions, which outperforms the well-known GANs. The open-source code, video description and datasets are available on the page 1 to facilitate image generation development in autonomous driving under adverse conditions.

CVJul 1, 2024
DIR-BHRNet: A Lightweight Network for Real-time Vision-based Multi-person Pose Estimation on Smartphones

Gongjin Lan, Yu Wu, Qi Hao

Human pose estimation (HPE), particularly multi-person pose estimation (MPPE), has been applied in many domains such as human-machine systems. However, the current MPPE methods generally run on powerful GPU systems and take a lot of computational costs. Real-time MPPE on mobile devices with low-performance computing is a challenging task. In this paper, we propose a lightweight neural network, DIR-BHRNet, for real-time MPPE on smartphones. In DIR-BHRNet, we design a novel lightweight convolutional module, Dense Inverted Residual (DIR), to improve accuracy by adding a depthwise convolution and a shortcut connection into the well-known Inverted Residual, and a novel efficient neural network structure, Balanced HRNet (BHRNet), to reduce computational costs by reconfiguring the proper number of convolutional blocks on each branch. We evaluate DIR-BHRNet on the well-known COCO and CrowdPose datasets. The results show that DIR-BHRNet outperforms the state-of-the-art methods in terms of accuracy with a real-time computational cost. Finally, we implement the DIR-BHRNet on the current mainstream Android smartphones, which perform more than 10 FPS. The free-used executable file (Android 10), source code, and a video description of this work are publicly available on the page 1 to facilitate the development of real-time MPPE on smartphones.

CLJul 8, 2024
ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

Zixin Shu, Rui Hua, Dengying Yan et al.

Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct an Integrated Ontology of symptom phenotypes (ISPO) to support the data mining of Chinese EMRs and real-world study in TCM field. Methods: To construct an integrated ontology of symptom phenotypes (ISPO), we manually annotated classical TCM textbooks and large-scale Chinese electronic medical records (EMRs) to collect symptom terms with support from a medical text annotation system. Furthermore, to facilitate the semantic interoperability between different terminologies, we incorporated public available biomedical vocabularies by manual mapping between Chinese terms and English terms with cross-references to source vocabularies. In addition, we evaluated the ISPO using independent clinical EMRs to provide a high-usable medical ontology for clinical data analysis. Results: By integrating 78,696 inpatient cases of EMRs, 5 biomedical vocabularies, 21 TCM books and dictionaries, ISPO provides 3,147 concepts, 23,475 terms, and 55,552 definition or contextual texts. Adhering to the taxonomical structure of the related anatomical systems of symptom phenotypes, ISPO provides 12 top-level categories and 79 middle-level sub-categories. The validation of data analysis showed the ISPO has a coverage rate of 95.35%, 98.53% and 92.66% for symptom terms with occurrence rates of 0.5% in additional three independent curated clinical datasets, which can demonstrate the significant value of ISPO in mapping clinical terms to ontologies.

CVJan 22, 2022Code
Phase-SLAM: Phase Based Simultaneous Localization and Mapping for Mobile Structured Light Illumination Systems

Xi Zheng, Rui Ma, Rui Gao et al.

Structured Light Illumination (SLI) systems have been used for reliable indoor dense 3D scanning via phase triangulation. However, mobile SLI systems for 360 degree 3D reconstruction demand 3D point cloud registration, involving high computational complexity. In this paper, we propose a phase based Simultaneous Localization and Mapping (Phase-SLAM) framework for fast and accurate SLI sensor pose estimation and 3D object reconstruction. The novelty of this work is threefold: (1) developing a reprojection model from 3D points to 2D phase data towards phase registration with low computational complexity; (2) developing a local optimizer to achieve SLI sensor pose estimation (odometry) using the derived Jacobian matrix for the 6 DoF variables; (3) developing a compressive phase comparison method to achieve high-efficiency loop closure detection. The whole Phase-SLAM pipeline is then exploited using existing global pose graph optimization techniques. We build datasets from both the unreal simulation platform and a robotic arm based SLI system in real-world to verify the proposed approach. The experiment results demonstrate that the proposed Phase-SLAM outperforms other state-of-the-art methods in terms of the efficiency and accuracy of pose estimation and 3D reconstruction. The open-source code is available at https://github.com/ZHENGXi-git/Phase-SLAM.

ROMar 11, 2024
NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning

Ruihua Han, Shuai Wang, Shuaijun Wang et al.

Navigating a nonholonomic robot in a cluttered, unknown environment requires accurate perception and precise motion control for real-time collision avoidance. This paper presents NeuPAN: a real-time, highly accurate, map-free, easy-to-deploy, and environment-invariant robot motion planner. Leveraging a tightly coupled perception-to-control framework, NeuPAN has two key innovations compared to existing approaches: 1) it directly maps raw point cloud data to a latent distance feature space for collision-free motion generation, avoiding error propagation from the perception to control pipeline; 2) it is interpretable from an end-to-end model-based learning perspective. The crux of NeuPAN is solving an end-to-end mathematical model with numerous point-level constraints using a plug-and-play (PnP) proximal alternating-minimization network (PAN), incorporating neurons in the loop. This allows NeuPAN to generate real-time, physically interpretable motions. It seamlessly integrates data and knowledge engines, and its network parameters can be fine-tuned via backpropagation. We evaluate NeuPAN on a ground mobile robot, a wheel-legged robot, and an autonomous vehicle, in extensive simulated and real-world environments. Results demonstrate that NeuPAN outperforms existing baselines in terms of accuracy, efficiency, robustness, and generalization capabilities across various environments, including the cluttered sandbox, office, corridor, and parking lot. We show that NeuPAN works well in unknown and unstructured environments with arbitrarily shaped objects, transforming impassable paths into passable ones.

RODec 26, 2023
End-To-End Planning of Autonomous Driving in Industry and Academia: 2022-2023

Gongjin Lan, Qi Hao

This paper aims to provide a quick review of the methods including the technologies in detail that are currently reported in industry and academia. Specifically, this paper reviews the end-to-end planning, including Tesla FSD V12, Momenta 2023, Horizon Robotics 2023, Motional RoboTaxi 2022, Woven Planet (Toyota): Urban Driver, and Nvidia. In addition, we review the state-of-the-art academic studies that investigate end-to-end planning of autonomous driving. This paper provides readers with a concise structure and fast learning of state-of-the-art end-to-end planning for 2022-2023. This article provides a meaningful overview as introductory material for beginners to follow the state-of-the-art end-to-end planning of autonomous driving in industry and academia, as well as supplementary material for advanced researchers.

CVMar 19, 2025
SemanticFlow: A Self-Supervised Framework for Joint Scene Flow Prediction and Instance Segmentation in Dynamic Environments

Yinqi Chen, Meiying Zhang, Qi Hao et al.

Accurate perception of dynamic traffic scenes is crucial for high-level autonomous driving systems, requiring robust object motion estimation and instance segmentation. However, traditional methods often treat them as separate tasks, leading to suboptimal performance, spatio-temporal inconsistencies, and inefficiency in complex scenarios due to the absence of information sharing. This paper proposes a multi-task SemanticFlow framework to simultaneously predict scene flow and instance segmentation of full-resolution point clouds. The novelty of this work is threefold: 1) developing a coarse-to-fine prediction based multi-task scheme, where an initial coarse segmentation of static backgrounds and dynamic objects is used to provide contextual information for refining motion and semantic information through a shared feature processing module; 2) developing a set of loss functions to enhance the performance of scene flow estimation and instance segmentation, while can help ensure spatial and temporal consistency of both static and dynamic objects within traffic scenes; 3) developing a self-supervised learning scheme, which utilizes coarse segmentation to detect rigid objects and compute their transformation matrices between sequential frames, enabling the generation of self-supervised labels. The proposed framework is validated on the Argoverse and Waymo datasets, demonstrating superior performance in instance segmentation accuracy, scene flow estimation, and computational efficiency, establishing a new benchmark for self-supervised methods in dynamic scene understanding.

ROJan 28, 2025
SSF-PAN: Semantic Scene Flow-Based Perception for Autonomous Navigation in Traffic Scenarios

Yinqi Chen, Meiying Zhang, Qi Hao et al.

Vehicle detection and localization in complex traffic scenarios pose significant challenges due to the interference of moving objects. Traditional methods often rely on outlier exclusions or semantic segmentations, which suffer from low computational efficiency and accuracy. The proposed SSF-PAN can achieve the functionalities of LiDAR point cloud based object detection/localization and SLAM (Simultaneous Localization and Mapping) with high computational efficiency and accuracy, enabling map-free navigation frameworks. The novelty of this work is threefold: 1) developing a neural network which can achieve segmentation among static and dynamic objects within the scene flows with different motion features, that is, semantic scene flow (SSF); 2) developing an iterative framework which can further optimize the quality of input scene flows and output segmentation results; 3) developing a scene flow-based navigation platform which can test the performance of the SSF perception system in the simulation environment. The proposed SSF-PAN method is validated using the SUScape-CARLA and the KITTI datasets, as well as on the CARLA simulator. Experimental results demonstrate that the proposed approach outperforms traditional methods in terms of scene flow computation accuracy, moving object detection accuracy, computational efficiency, and autonomous navigation effectiveness.

CVDec 25, 2024
TSceneJAL: Joint Active Learning of Traffic Scenes for 3D Object Detection

Chenyang Lei, Weiyuan Peng, Guang Zhou et al.

Most autonomous driving (AD) datasets incur substantial costs for collection and labeling, inevitably yielding a plethora of low-quality and redundant data instances, thereby compromising performance and efficiency. Many applications in AD systems necessitate high-quality training datasets using both existing datasets and newly collected data. In this paper, we propose a traffic scene joint active learning (TSceneJAL) framework that can efficiently sample the balanced, diverse, and complex traffic scenes from both labeled and unlabeled data. The novelty of this framework is threefold: 1) a scene sampling scheme based on a category entropy, to identify scenes containing multiple object classes, thus mitigating class imbalance for the active learner; 2) a similarity sampling scheme, estimated through the directed graph representation and a marginalize kernel algorithm, to pick sparse and diverse scenes; 3) an uncertainty sampling scheme, predicted by a mixture density network, to select instances with the most unclear or complex regression outcomes for the learner. Finally, the integration of these three schemes in a joint selection strategy yields an optimal and valuable subdataset. Experiments on the KITTI, Lyft, nuScenes and SUScape datasets demonstrate that our approach outperforms existing state-of-the-art methods on 3D object detection tasks with up to 12% improvements.

AIMay 20, 2025
Disentangled Multi-span Evolutionary Network against Temporal Knowledge Graph Reasoning

Hao Dong, Ziyue Qiao, Zhiyuan Ning et al.

Temporal Knowledge Graphs (TKGs), as an extension of static Knowledge Graphs (KGs), incorporate the temporal feature to express the transience of knowledge by describing when facts occur. TKG extrapolation aims to infer possible future facts based on known history, which has garnered significant attention in recent years. Some existing methods treat TKG as a sequence of independent subgraphs to model temporal evolution patterns, demonstrating impressive reasoning performance. However, they still have limitations: 1) In modeling subgraph semantic evolution, they usually neglect the internal structural interactions between subgraphs, which are actually crucial for encoding TKGs. 2) They overlook the potential smooth features that do not lead to semantic changes, which should be distinguished from the semantic evolution process. Therefore, we propose a novel Disentangled Multi-span Evolutionary Network (DiMNet) for TKG reasoning. Specifically, we design a multi-span evolution strategy that captures local neighbor features while perceiving historical neighbor semantic information, thus enabling internal interactions between subgraphs during the evolution process. To maximize the capture of semantic change patterns, we design a disentangle component that adaptively separates nodes' active and stable features, used to dynamically control the influence of historical semantics on future evolution. Extensive experiments conducted on four real-world TKG datasets show that DiMNet demonstrates substantial performance in TKG reasoning, and outperforms the state-of-the-art up to 22.7% in MRR.

CVMay 6, 2025
Coop-WD: Cooperative Perception with Weighting and Denoising for Robust V2V Communication

Chenguang Liu, Jianjun Chen, Yunfei Chen et al.

Cooperative perception, leveraging shared information from multiple vehicles via vehicle-to-vehicle (V2V) communication, plays a vital role in autonomous driving to alleviate the limitation of single-vehicle perception. Existing works have explored the effects of V2V communication impairments on perception precision, but they lack generalization to different levels of impairments. In this work, we propose a joint weighting and denoising framework, Coop-WD, to enhance cooperative perception subject to V2V channel impairments. In this framework, the self-supervised contrastive model and the conditional diffusion probabilistic model are adopted hierarchically for vehicle-level and pixel-level feature enhancement. An efficient variant model, Coop-WD-eco, is proposed to selectively deactivate denoising to reduce processing overhead. Rician fading, non-stationarity, and time-varying distortion are considered. Simulation results demonstrate that the proposed Coop-WD outperforms conventional benchmarks in all types of channels. Qualitative analysis with visual examples further proves the superiority of our proposed method. The proposed Coop-WD-eco achieves up to 50% reduction in computational cost under severe distortion while maintaining comparable accuracy as channel conditions improve.

GRMar 19, 2025
ClimateGS: Real-Time Climate Simulation with 3D Gaussian Style Transfer

Yuezhen Xie, Meiying Zhang, Qi Hao

Adverse climate conditions pose significant challenges for autonomous systems, demanding reliable perception and decision-making across diverse environments. To better simulate these conditions, physically-based NeRF rendering methods have been explored for their ability to generate realistic scene representations. However, these methods suffer from slow rendering speeds and long preprocessing times, making them impractical for real-time testing and user interaction. This paper presents ClimateGS, a novel framework integrating 3D Gaussian representations with physical simulation to enable real-time climate effects rendering. The novelty of this work is threefold: 1) developing a linear transformation for 3D Gaussian photorealistic style transfer, enabling direct modification of spherical harmonics across bands for efficient and consistent style adaptation; 2) developing a joint training strategy for 3D style transfer, combining supervised and self-supervised learning to accelerate convergence while preserving original scene details; 3) developing a real-time rendering method for climate simulation, integrating physics-based effects with 3D Gaussian to achieve efficient and realistic rendering. We evaluate ClimateGS on MipNeRF360 and Tanks and Temples, demonstrating real-time rendering with comparable or superior visual quality to SOTA 2D/3D methods, making it suitable for interactive applications.

LGNov 15, 2024
Is Precise Recovery Necessary? A Task-Oriented Imputation Approach for Time Series Forecasting on Variable Subset

Qi Hao, Runchang Liang, Yue Gao et al.

Variable Subset Forecasting (VSF) refers to a unique scenario in multivariate time series forecasting, where available variables in the inference phase are only a subset of the variables in the training phase. VSF presents significant challenges as the entire time series may be missing, and neither inter- nor intra-variable correlations persist. Such conditions impede the effectiveness of traditional imputation methods, primarily focusing on filling in individual missing data points. Inspired by the principle of feature engineering that not all variables contribute positively to forecasting, we propose Task-Oriented Imputation for VSF (TOI-VSF), a novel framework shifts the focus from accurate data recovery to directly support the downstream forecasting task. TOI-VSF incorporates a self-supervised imputation module, agnostic to the forecasting model, designed to fill in missing variables while preserving the vital characteristics and temporal patterns of time series data. Additionally, we implement a joint learning strategy for imputation and forecasting, ensuring that the imputation process is directly aligned with and beneficial to the forecasting objective. Extensive experiments across four datasets demonstrate the superiority of TOI-VSF, outperforming baseline methods by $15\%$ on average.

CVJun 26, 2024
BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data

Kemiao Huang, Yinqi Chen, Meiying Zhang et al.

Compared with real-time multi-object tracking (MOT), offline multi-object tracking (OMOT) has the advantages to perform 2D-3D detection fusion, erroneous link correction, and full track optimization but has to deal with the challenges from bounding box misalignment and track evaluation, editing, and refinement. This paper proposes "BiTrack", a 3D OMOT framework that includes modules of 2D-3D detection fusion, initial trajectory generation, and bidirectional trajectory re-optimization to achieve optimal tracking results from camera-LiDAR data. The novelty of this paper includes threefold: (1) development of a point-level object registration technique that employs a density-based similarity metric to achieve accurate fusion of 2D-3D detection results; (2) development of a set of data association and track management skills that utilizes a vertex-based similarity metric as well as false alarm rejection and track recovery mechanisms to generate reliable bidirectional object trajectories; (3) development of a trajectory re-optimization scheme that re-organizes track fragments of different fidelities in a greedy fashion, as well as refines each trajectory with completion and smoothing techniques. The experiment results on the KITTI dataset demonstrate that BiTrack achieves the state-of-the-art performance for 3D OMOT tasks in terms of accuracy and efficiency.

CVJun 26, 2024
CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

Meiying Zhang, Weiyuan Peng, Guangyao Ding et al.

Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real),cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between two domains, thereby improving the quality of pseudo-label; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to migrate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.

ROJan 30, 2022
Robotic Wireless Energy Transfer in Dynamic Environments: System Design and Experimental Validation

Shuai Wang, Ruihua Han, Yuncong Hong et al.

Wireless energy transfer (WET) is a ground-breaking technology for cutting the last wire between mobile sensors and power grids in smart cities. Yet, WET only offers effective transmission of energy over a short distance. Robotic WET is an emerging paradigm that mounts the energy transmitter on a mobile robot and navigates the robot through different regions in a large area to charge remote energy harvesters. However, it is challenging to determine the robotic charging strategy in an unknown and dynamic environment due to the uncertainty of obstacles. This paper proposes a hardware-in-the-loop joint optimization framework that offers three distinctive features: 1) efficient model updates and re-optimization based on the last-round experimental data; 2) iterative refinement of the anchor list for adaptation to different environments; 3) verification of algorithms in a high-fidelity Gazebo simulator and a multi-robot testbed. Experimental results show that the proposed framework significantly saves the WET mission completion time while satisfying collision avoidance and energy harvesting constraints.

IRDec 12, 2021
Re-ranking With Constraints on Diversified Exposures for Homepage Recommender System

Qi Hao, Tianze Luo, Guangda Huzhang

The homepage recommendation on most E-commerce applications places items in a hierarchical manner, where different channels display items in different styles. Existing algorithms usually optimize the performance of a single channel. So designing the model to achieve the optimal recommendation list which maximize the Click-Through Rate (CTR) of whole homepage is a challenge problem. Other than the accuracy objective, display diversity on the homepage is also important since homogeneous display usually hurts user experience. In this paper, we propose a two-stage architecture of the homepage recommendation system. In the first stage, we develop efficient algorithms for recommending items to proper channels while maintaining diversity. The two methods can be combined: user-channel-item predictive model with diversity constraint. In the second stage, we provide an ordered list of items in each channel. Existing re-ranking models are hard to describe the mutual influence between items in both intra-channel and inter-channel. Therefore, we propose a Deep \& Hierarchical Attention Network Re-ranking (DHANR) model for homepage recommender systems. The Hierarchical Attention Network consists of an item encoder, an item-level attention layer, a channel encoder and a channel-level attention layer. Our method achieves a significant improvement in terms of precision, intra-list average distance(ILAD) and channel-wise Precision@k in offline experiments and in terms of CTR and ILAD in our online systems.

CVDec 10, 2021
Self-Ensemling for 3D Point Cloud Domain Adaption

Qing Li, Xiaojiang Peng, Chuan Yan et al.

Recently 3D point cloud learning has been a hot topic in computer vision and autonomous driving. Due to the fact that it is difficult to manually annotate a qualitative large-scale 3D point cloud dataset, unsupervised domain adaptation (UDA) is popular in 3D point cloud learning which aims to transfer the learned knowledge from the labeled source domain to the unlabeled target domain. However, the generalization and reconstruction errors caused by domain shift with simply-learned model are inevitable which substantially hinder the model's capability from learning good representations. To address these issues, we propose an end-to-end self-ensembling network (SEN) for 3D point cloud domain adaption tasks. Generally, our SEN resorts to the advantages of Mean Teacher and semi-supervised learning, and introduces a soft classification loss and a consistency loss, aiming to achieve consistent generalization and accurate reconstruction. In SEN, a student network is kept in a collaborative manner with supervised learning and self-supervised learning, and a teacher network conducts temporal consistency to learn useful representations and ensure the quality of point clouds reconstruction. Extensive experiments on several 3D point cloud UDA benchmarks show that our SEN outperforms the state-of-the-art methods on both classification and segmentation tasks. Moreover, further analysis demonstrates that our SEN also achieves better reconstruction results.

ROSep 28, 2021
Runtime Safety Assurance for Learning-enabled Control of Autonomous Driving Vehicles

Shengduo Chen, Yaowei Sun, Dachuan Li et al.

Providing safety guarantees for Autonomous Vehicle (AV) systems with machine-learning-based controllers remains a challenging issue. In this work, we propose Simplex-Drive, a framework that can achieve runtime safety assurance for machine-learning enabled controllers of AVs. The proposed Simplex-Drive consists of an unverified Deep Reinforcement Learning (DRL)-based advanced controller (AC) that achieves desirable performance in complex scenarios, a Velocity-Obstacle (VO) based baseline safe controller (BC) with provably safety guarantees, and a verified mode management unit that monitors the operation status and switches the control authority between AC and BC based on safety-related conditions. We provide a formal correctness proof of Simplex-Drive and conduct a lane-changing case study in dense traffic scenarios. The simulation experiment results demonstrate that Simplex-Drive can always ensure operation safety without sacrificing control performance, even if the DRL policy may lead to deviations from the safe status.

SPAug 31, 2021
Unit-Modulus Wireless Federated Learning Via Penalty Alternating Minimization

Shuai Wang, Dachuan Li, Rui Wang et al.

Wireless federated learning (FL) is an emerging machine learning paradigm that trains a global parametric model from distributed datasets via wireless communications. This paper proposes a unit-modulus wireless FL (UMWFL) framework, which simultaneously uploads local model parameters and computes global model parameters via optimized phase shifting. The proposed framework avoids sophisticated baseband signal processing, leading to both low communication delays and implementation costs. A training loss bound is derived and a penalty alternating minimization (PAM) algorithm is proposed to minimize the nonconvex nonsmooth loss bound. Experimental results in the Car Learning to Act (CARLA) platform show that the proposed UMWFL framework with PAM algorithm achieves smaller training losses and testing errors than those of the benchmark scheme.

ROAug 11, 2021
Capture Uncertainties in Deep Neural Networks for Safe Operation of Autonomous Driving Vehicles

Liuhui Ding, Dachuan Li, Bowen Liu et al.

Uncertainties in Deep Neural Network (DNN)-based perception and vehicle's motion pose challenges to the development of safe autonomous driving vehicles. In this paper, we propose a safe motion planning framework featuring the quantification and propagation of DNN-based perception uncertainties and motion uncertainties. Contributions of this work are twofold: (1) A Bayesian Deep Neural network model which detects 3D objects and quantitatively captures the associated aleatoric and epistemic uncertainties of DNNs; (2) An uncertainty-aware motion planning algorithm (PU-RRT) that accounts for uncertainties in object detection and ego-vehicle's motion. The proposed approaches are validated via simulated complex scenarios built in CARLA. Experimental results show that the proposed motion planning scheme can cope with uncertainties of DNN-based perception and vehicle motion, and improve the operational safety of autonomous vehicles while still achieving desirable efficiency.

CVAug 10, 2021
Joint Multi-Object Detection and Tracking with Camera-LiDAR Fusion for Autonomous Driving

Kemiao Huang, Qi Hao

Multi-object tracking (MOT) with camera-LiDAR fusion demands accurate results of object detection, affinity computation and data association in real time. This paper presents an efficient multi-modal MOT framework with online joint detection and tracking schemes and robust data association for autonomous driving applications. The novelty of this work includes: (1) development of an end-to-end deep neural network for joint object detection and correlation using 2D and 3D measurements; (2) development of a robust affinity computation module to compute occlusion-aware appearance and motion affinities in 3D space; (3) development of a comprehensive data association module for joint optimization among detection confidences, affinities and start-end probabilities. The experiment results on the KITTI tracking benchmark demonstrate the superior performance of the proposed method in terms of both tracking accuracy and processing speed.

CVMar 8, 2021
Unsupervised Person Re-Identification with Multi-Label Learning Guided Self-Paced Clustering

Qing Li, Xiaojiang Peng, Yu Qiao et al.

Although unsupervised person re-identification (Re-ID) has drawn increasing research attention recently, it remains challenging to learn discriminative features without annotations across disjoint camera views. In this paper, we address the unsupervised person Re-ID with a conceptually novel yet simple framework, termed as Multi-label Learning guided self-paced Clustering (MLC). MLC mainly learns discriminative features with three crucial modules, namely a multi-scale network, a multi-label learning module, and a self-paced clustering module. Specifically, the multi-scale network generates multi-granularity person features in both global and local views. The multi-label learning module leverages a memory feature bank and assigns each image with a multi-label vector based on the similarities between the image and feature bank. After multi-label training for several epochs, the self-paced clustering joins in training and assigns a pseudo label for each image. The benefits of our MLC come from three aspects: i) the multi-scale person features for better similarity measurement, ii) the multi-label assignment based on the whole dataset ensures that every image can be trained, and iii) the self-paced clustering removes some noisy samples for better feature learning. Extensive experiments on three popular large-scale Re-ID benchmarks demonstrate that our MLC outperforms previous state-of-the-art methods and significantly improves the performance of unsupervised person Re-ID.

LGMar 5, 2021
Distributed Dynamic Map Fusion via Federated Learning for Intelligent Networked Vehicles

Zijian Zhang, Shuai Wang, Yuncong Hong et al.

The technology of dynamic map fusion among networked vehicles has been developed to enlarge sensing ranges and improve sensing accuracies for individual vehicles. This paper proposes a federated learning (FL) based dynamic map fusion framework to achieve high map quality despite unknown numbers of objects in fields of view (FoVs), various sensing and model uncertainties, and missing data labels for online learning. The novelty of this work is threefold: (1) developing a three-stage fusion scheme to predict the number of objects effectively and to fuse multiple local maps with fidelity scores; (2) developing an FL algorithm which fine-tunes feature models (i.e., representation learning networks for feature extraction) distributively by aggregating model parameters; (3) developing a knowledge distillation method to generate FL training labels when data labels are unavailable. The proposed framework is implemented in the Car Learning to Act (CARLA) simulation platform. Extensive experimental results are provided to verify the superior performance and robustness of the developed map fusion and FL schemes.

ITJan 28, 2021
Edge Federated Learning Via Unit-Modulus Over-The-Air Computation

Shuai Wang, Yuncong Hong, Rui Wang et al.

Edge federated learning (FL) is an emerging paradigm that trains a global parametric model from distributed datasets based on wireless communications. This paper proposes a unit-modulus over-the-air computation (UMAirComp) framework to facilitate efficient edge federated learning, which simultaneously uploads local model parameters and updates global model parameters via analog beamforming. The proposed framework avoids sophisticated baseband signal processing, leading to low communication delays and implementation costs. Training loss bounds of UMAirComp FL systems are derived and two low-complexity large-scale optimization algorithms, termed penalty alternating minimization (PAM) and accelerated gradient projection (AGP), are proposed to minimize the nonconvex nonsmooth loss bound. Simulation results show that the proposed UMAirComp framework with PAM algorithm achieves a smaller mean square error of model parameters' estimation, training loss, and test error compared with other benchmark schemes. Moreover, the proposed UMAirComp framework with AGP algorithm achieves satisfactory performance while reduces the computational complexity by orders of magnitude compared with existing optimization algorithms. Finally, we demonstrate the implementation of UMAirComp in a vehicle-to-everything autonomous driving simulation platform. It is found that autonomous driving tasks are more sensitive to model parameter errors than other tasks since the neural networks for autonomous driving contain sparser model parameters.

ITOct 29, 2020
Learning Centric Wireless Resource Allocation for Edge Computing: Algorithm and Experiment

Liangkai Zhou, Yuncong Hong, Shuai Wang et al.

Edge intelligence is an emerging network architecture that integrates sensing, communication, computing components, and supports various machine learning applications, where a fundamental communication question is: how to allocate the limited wireless resources (such as time, energy) to the simultaneous model training of heterogeneous learning tasks? Existing methods ignore two important facts: 1) different models have heterogeneous demands on training data; 2) there is a mismatch between the simulated environment and the real-world environment. As a result, they could lead to low learning performance in practice. This paper proposes the learning centric wireless resource allocation (LCWRA) scheme that maximizes the worst learning performance of multiple tasks. Analysis shows that the optimal transmission time has an inverse power relationship with respect to the generalization error. Finally, both simulation and experimental results are provided to verify the performance of the proposed LCWRA scheme and its robustness in real implementation.

ITJul 21, 2020
Learning Centric Power Allocation for Edge Intelligence

Shuai Wang, Rui Wang, Qi Hao et al.

While machine-type communication (MTC) devices generate massive data, they often cannot process this data due to limited energy and computation power. To this end, edge intelligence has been proposed, which collects distributed data and performs machine learning at the edge. However, this paradigm needs to maximize the learning performance instead of the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient since they allocate resources merely according to the quality of wireless channels. This paper proposes a learning centric power allocation (LCPA) method, which allocates radio resources based on an empirical classification error model. To get insights into LCPA, an asymptotic optimal solution is derived. The solution shows that the transmit powers are inversely proportional to the channel gain, and scale exponentially with the learning parameters. Experimental results show that the proposed LCPA algorithm significantly outperforms other power allocation algorithms.

RONov 19, 2018
Decentralized Cooperative Multi-Robot Localization with EKF

Ruihua Han, Shengduo Chen, Yasheng Bu et al.

Multi-robot localization has been a critical problem for robots performing complex tasks cooperatively. In this paper, we propose a decentralized approach to localize a group of robots in a large featureless environment. The proposed approach only requires that at least one robot remains stationary as a temporary landmark during a certain period of time. The novelty of our approach is threefold: (1) developing a decentralized scheme that each robot calculates their own state and only stores the latest one to reduce storage and computational cost, (2) developing an efficient localization algorithm through the extended Kalman filter (EKF) that only uses observations of relative pose to estimate the robot positions, (3) developing a scheme has less requirements on landmarks and more robustness against insufficient observations. Various simulations and experiments using five robots equipped with relative pose-measurement sensors are performed to validate the superior performance of our approach.