Junqiao Zhao

h-index20

25papers

219citations

Novelty50%

AI Score46

Ranked #40,006 of 194,257 authors (top 21%)#14,125 in CV (top 24%)

25 Papers

2.8CVAug 24, 2023

VNI-Net: Vector Neurons-based Rotation-Invariant Descriptor for LiDAR Place Recognition

Gengxuan Tian, Junqiao Zhao, Yingfeng Cai et al.

LiDAR-based place recognition plays a crucial role in Simultaneous Localization and Mapping (SLAM) and LiDAR localization. Despite the emergence of various deep learning-based and hand-crafting-based methods, rotation-induced place recognition failure remains a critical challenge. Existing studies address this limitation through specific training strategies or network structures. However, the former does not produce satisfactory results, while the latter focuses mainly on the reduced problem of SO(2) rotation invariance. Methods targeting SO(3) rotation invariance suffer from limitations in discrimination capability. In this paper, we propose a new method that employs Vector Neurons Network (VNN) to achieve SO(3) rotation invariance. We first extract rotation-equivariant features from neighboring points and map low-dimensional features to a high-dimensional space through VNN. Afterwards, we calculate the Euclidean and Cosine distance in the rotation-equivariant feature space as rotation-invariant feature descriptors. Finally, we aggregate the features using GeM pooling to obtain global descriptors. To address the significant information loss when formulating rotation-invariant descriptors, we propose computing distances between features at different layers within the Euclidean space neighborhood. This greatly improves the discriminability of the point cloud descriptors while ensuring computational efficiency. Experimental results on public datasets show that our approach significantly outperforms other baseline methods implementing rotation invariance, while achieving comparable results with current state-of-the-art place recognition methods that do not consider rotation issues.

10.7LGSep 22, 2023

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

Hai Zhang, Hang Yu, Junqiao Zhao et al.

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward algorithm USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.

6.8CVMay 19, 2023Code

Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition

Junqiao Zhao, Fenglin Zhang, Yingfeng Cai et al.

Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that are located at the same place as the query frame. To improve the robustness of VPR in perceptually aliasing scenarios, sequence-based VPR methods are proposed. These methods are either based on matching between frame sequences or extracting sequence descriptors for direct retrieval. However, the former is usually based on the assumption of constant velocity, which is difficult to hold in practice, and is computationally expensive and subject to sequence length. Although the latter overcomes these problems, existing sequence descriptors are constructed by aggregating features of multiple frames only, without interaction on temporal information, and thus cannot obtain descriptors with spatio-temporal discrimination.In this paper, we propose a sequence descriptor that effectively incorporates spatio-temporal information. Specifically, spatial attention within the same frame is utilized to learn spatial feature patterns, while attention in corresponding local regions of different frames is utilized to learn the persistence or change of features over time. We use a sliding window to control the temporal range of attention and use relative positional encoding to construct sequential relationships between different features. This allows our descriptors to capture the intrinsic dynamics in a sequence of frames.Comprehensive experiments on challenging benchmark datasets show that the proposed approach outperforms recent state-of-the-art methods.The code is available at https://github.com/tiev-tongji/Spatio-Temporal-SeqVPR.

3.7CVMay 30, 2021Code

Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from Video

Jianfeng Li, Junqiao Zhao, Shuangfu Song et al.

Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of the robot's visual perception. We use a joint self-supervised method to estimate the three geometric elements. Depth network, optical flow network and camera motion network are independent of each other but are jointly optimized during training phase. Compared with independent training, joint training can make full use of the geometric relationship between geometric elements and provide dynamic and static information of the scene. In this paper, we improve the joint self-supervision method from three aspects: network structure, dynamic object segmentation, and geometric constraints. In terms of network structure, we apply the attention mechanism to the camera motion network, which helps to take advantage of the similarity of camera movement between frames. And according to attention mechanism in Transformer, we propose a plug-and-play convolutional attention module. In terms of dynamic object, according to the different influences of dynamic objects in the optical flow self-supervised framework and the depth-pose self-supervised framework, we propose a threshold algorithm to detect dynamic regions, and mask that in the loss function respectively. In terms of geometric constraints, we use traditional methods to estimate the fundamental matrix from the corresponding points to constrain the camera motion network. We demonstrate the effectiveness of our method on the KITTI dataset. Compared with other joint self-supervised methods, our method achieves state-of-the-art performance in the estimation of pose and optical flow, and the depth estimation has also achieved competitive results. Code will be available https://github.com/jianfenglihg/Unsupervised_geometry.

5.0CVMar 4, 2020Code

Occlusion Aware Unsupervised Learning of Optical Flow From Video

Jianfeng Li, Junqiao Zhao, Tiantian Feng et al.

In this paper, we proposed an unsupervised learning method for estimating the optical flow between video frames, especially to solve the occlusion problem. Occlusion is caused by the movement of an object or the movement of the camera, defined as when certain pixels are visible in one video frame but not in adjacent frames. Due to the lack of pixel correspondence between frames in the occluded area, incorrect photometric loss calculation can mislead the optical flow training process. In the video sequence, we found that the occlusion in the forward ($t\rightarrow t+1$) and backward ($t\rightarrow t-1$) frame pairs are usually complementary. That is, pixels that are occluded in subsequent frames are often not occluded in the previous frame and vice versa. Therefore, by using this complementarity, a new weighted loss is proposed to solve the occlusion problem. In addition, we calculate gradients in multiple directions to provide richer supervision information. Our method achieves competitive optical flow accuracy compared to the baseline and some supervised methods on KITTI 2012 and 2015 benchmarks. This source code has been released at https://github.com/jianfenglihg/UnOpticalFlow.git.

18.8LGFeb 4, 2024Code

Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Lanqing Li, Hai Zhang, Xinyu Zhang et al.

As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among which, context-based OMRL (COMRL) as a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds. Such theoretical insight offers ample design freedom for novel algorithms. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures. This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning. Given its generality, we envision our framework as a promising offline pre-training paradigm of foundation models for decision making.

6.2CVMay 12, 2025

Ranking-aware Continual Learning for LiDAR Place Recognition

Xufei Wang, Gengxuan Tian, Junqiao Zhao et al.

Place recognition plays a significant role in SLAM, robot navigation, and autonomous driving applications. Benefiting from deep learning, the performance of LiDAR place recognition (LPR) has been greatly improved. However, many existing learning-based LPR methods suffer from catastrophic forgetting, which severely harms the performance of LPR on previously trained places after training on a new environment. In this paper, we introduce a continual learning framework for LPR via Knowledge Distillation and Fusion (KDF) to alleviate forgetting. Inspired by the ranking process of place recognition retrieval, we present a ranking-aware knowledge distillation loss that encourages the network to preserve the high-level place recognition knowledge. We also introduce a knowledge fusion module to integrate the knowledge of old and new models for LiDAR place recognition. Our extensive experiments demonstrate that KDF can be applied to different networks to overcome catastrophic forgetting, surpassing the state-of-the-art methods in terms of mean Recall@1 and forgetting score.

3.6CVJul 22, 2025

LDRFusion: A LiDAR-Dominant multimodal refinement framework for 3D object detection

Jijun Wang, Yan Wu, Yujian Mo et al.

Existing LiDAR-Camera fusion methods have achieved strong results in 3D object detection. To address the sparsity of point clouds, previous approaches typically construct spatial pseudo point clouds via depth completion as auxiliary input and adopts a proposal-refinement framework to generate detection results. However, introducing pseudo points inevitably brings noise, potentially resulting in inaccurate predictions. Considering the differing roles and reliability levels of each modality, we propose LDRFusion, a novel Lidar-dominant two-stage refinement framework for multi-sensor fusion. The first stage soley relies on LiDAR to produce accurately localized proposals, followed by a second stage where pseudo point clouds are incorporated to detect challenging instances. The instance-level results from both stages are subsequently merged. To further enhance the representation of local structures in pseudo point clouds, we present a hierarchical pseudo point residual encoding module, which encodes neighborhood sets using both feature and positional residuals. Experiments on the KITTI dataset demonstrate that our framework consistently achieves strong performance across multiple categories and difficulty levels.

8.4CVJul 18, 2025

Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection

Yujian Mo, Yan Wu, Junqiao Zhao et al.

Recent advances in foundation models have opened up new possibilities for enhancing 3D perception. In particular, DepthAnything offers dense and reliable geometric priors from monocular RGB images, which can complement sparse LiDAR data in autonomous driving scenarios. However, such priors remain underutilized in LiDAR-based 3D object detection. In this paper, we address the limited expressiveness of raw LiDAR point features, especially the weak discriminative capability of the reflectance attribute, by introducing depth priors predicted by DepthAnything. These priors are fused with the original LiDAR attributes to enrich each point's representation. To leverage the enhanced point features, we propose a point-wise feature extraction module. Then, a Dual-Path RoI feature extraction framework is employed, comprising a voxel-based branch for global semantic context and a point-based branch for fine-grained structural details. To effectively integrate the complementary RoI features, we introduce a bidirectional gated RoI feature fusion module that balances global and local cues. Extensive experiments on the KITTI benchmark show that our method consistently improves detection accuracy, demonstrating the value of incorporating visual foundation model priors into LiDAR-based 3D object detection.

3.6CVSep 6, 2025

Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras

Xinyu Zhang, Kai Huang, Junqiao Zhao et al.

We propose a multi-camera LiDAR-visual-inertial odometry framework, Multi-LVI-SAM, which fuses data from multiple fisheye cameras, LiDAR and inertial sensors for highly accurate and robust state estimation. To enable efficient and consistent integration of visual information from multiple fisheye cameras, we introduce a panoramic visual feature model that unifies multi-camera observations into a single representation. The panoramic model serves as a global geometric optimization framework that consolidates multi-view constraints, enabling seamless loop closure and global pose optimization, while simplifying system design by avoiding redundant handling of individual cameras. To address the triangulation inconsistency caused by the misalignment between each camera's frame and the panoramic model's frame, we propose an extrinsic compensation method. This method improves feature consistency across views and significantly reduces triangulation and optimization errors, leading to more accurate pose estimation. We integrate the panoramic visual feature model into a tightly coupled LiDAR-visual-inertial system based on a factor graph. Extensive experiments on public datasets demonstrate that the panoramic visual feature model enhances the quality and consistency of multi-camera constraints, resulting in higher accuracy and robustness than existing multi-camera LiDAR-visual-inertial systems.

4.6LGMay 20, 2024Code

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

Hai Zhang, Boyuan Zheng, Tianying Ji et al.

Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as the context encoder follows the principle of maximizing the mutual information between the task variable $M$ and its latent representation $Z$ ($I(Z;M)$) while the policy adopts the standard offline reinforcement learning (RL) algorithms conditioning on the learned task representation.Despite promising results, the theoretical justification of performance improvements for such intuition remains underexplored.Inspired by the return discrepancy scheme in the model-based RL field, we find that the previous optimization framework can be linked with the general RL objective of maximizing the expected return, thereby explaining performance improvements. Furthermore, after scrutinizing this optimization framework, we observe that the condition for monotonic performance improvements does not consider the variation of the task representation. When these variations are considered, the previously established condition may no longer be sufficient to ensure monotonicity, thereby impairing the optimization process.We name this issue task representation shift and theoretically prove that the monotonic performance improvements can be guaranteed with appropriate context encoder updates.Our work opens up a new avenue for OMRL, leading to a better understanding between the task representation and performance improvements.

2.6LGMay 13, 2024

POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

Chang Huang, Shatong Zhu, Junqiao Zhao et al.

Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.

2.2ROFeb 23, 2022

DL-SLOT: Dynamic Lidar SLAM and Object Tracking Based On Graph Optimization

Xuebo Tian, Junqiao Zhao, Chen Ye

Ego-pose estimation and dynamic object tracking are two key issues in an autonomous driving system. Two assumptions are often made for them, i.e. the static world assumption of simultaneous localization and mapping (SLAM) and the exact ego-pose assumption of object tracking, respectively. However, these assumptions are difficult to hold in highly dynamic road scenarios where SLAM and object tracking become correlated and mutually beneficial. In this paper, DL-SLOT, a dynamic Lidar SLAM and object tracking method is proposed. This method integrates the state estimations of both the ego vehicle and the static and dynamic objects in the environment into a unified optimization framework, to realize SLAM and object tracking (SLOT) simultaneously. Firstly, we implement object detection to remove all the points that belong to potential dynamic objects. Then, LiDAR odometry is conducted using the filtered point cloud. At the same time, detected objects are associated with the history object trajectories based on the time-series information in a sliding window. The states of the static and dynamic objects and ego vehicle in the sliding window are integrated into a unified local optimization framework. We perform SLAM and object tracking simultaneously in this framework, which significantly improves the robustness and accuracy of SLAM in highly dynamic road scenarios and the accuracy of objects' states estimation. Experiments on public datasets have shown that our method achieves better accuracy than A-LOAM.

2.6CVFeb 11, 2022

Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition

Yingfeng Cai, Junqiao Zhao, Jiafeng Cui et al.

Visual Place Recognition (VPR) in areas with similar scenes such as urban or indoor scenarios is a major challenge. Existing VPR methods using global descriptors have difficulty capturing local specific regions (LSR) in the scene and are therefore prone to localization confusion in such scenarios. As a result, finding the LSR that are critical for location recognition becomes key. To address this challenge, we introduced Patch-NetVLAD+, which was inspired by patch-based VPR researches. Our method proposed a fine-tuning strategy with triplet loss to make NetVLAD suitable for extracting patch-level descriptors. Moreover, unlike existing methods that treat all patches in an image equally, our method extracts patches of LSR, which present less frequently throughout the dataset, and makes them play an important role in VPR by assigning proper weights to them. Experiments on Pittsburgh30k and Tokyo247 datasets show that our approach achieved up to 6.35\% performance improvement than existing patch-based methods.

4.0ROFeb 10, 2022

Scale Estimation with Dual Quadrics for Monocular Object SLAM

Shuangfu Song, Junqiao Zhao, Tiantian Feng et al.

The scale ambiguity problem is inherently unsolvable to monocular SLAM without the metric baseline between moving cameras. In this paper, we present a novel scale estimation approach based on an object-level SLAM system. To obtain the absolute scale of the reconstructed map, we derive a nonlinear optimization method to make the scaled dimensions of objects conforming to the distribution of their sizes in the physical world, without relying on any prior information of gravity direction. We adopt the dual quadric to represent objects for its ability to fit objects compactly and accurately. In the proposed monocular object-level SLAM system, dual quadrics are fastly initialized based on constraints of 2-D detections and fitted oriented bounding box and are further optimized to provide reliable dimensions for scale estimation.

1.4CVNov 27, 2021

DSC: Deep Scan Context Descriptor for Large-Scale Place Recognition

Jiafeng Cui, Tengfei Huang, Yingfeng Cai et al.

LiDAR-based place recognition is an essential and challenging task both in loop closure detection and global relocalization. We propose Deep Scan Context (DSC), a general and discriminative global descriptor that captures the relationship among segments of a point cloud. Unlike previous methods that utilize either semantics or a sequence of adjacent point clouds for better place recognition, we only use raw point clouds to get competitive results. Concretely, we first segment the point cloud egocentrically to acquire centroids and eigenvalues of the segments. Then, we introduce a graph neural network to aggregate these features into an embedding representation. Extensive experiments conducted on the KITTI dataset show that DSC is robust to scene variants and outperforms existing methods.

1.2CVOct 21, 2020

Dense Dual-Path Network for Real-time Semantic Segmentation

Xinneng Yang, Yan Wu, Junqiao Zhao et al.

Semantic segmentation has achieved remarkable results with high computational cost and a large number of parameters. However, real-world applications require efficient inference speed on embedded devices. Most previous works address the challenge by reducing depth, width and layer capacity of network, which leads to poor performance. In this paper, we introduce a novel Dense Dual-Path Network (DDPNet) for real-time semantic segmentation under resource constraints. We design a light-weight and powerful backbone with dense connectivity to facilitate feature reuse throughout the whole network and the proposed Dual-Path module (DPM) to sufficiently aggregate multi-scale contexts. Meanwhile, a simple and effective framework is built with a skip architecture utilizing the high-resolution feature maps to refine the segmentation output and an upsampling module leveraging context information from the feature maps to refine the heatmaps. The proposed DDPNet shows an obvious advantage in balancing accuracy and speed. Specifically, on Cityscapes test dataset, DDPNet achieves 75.3% mIoU with 52.6 FPS for an input of 1024 X 2048 resolution on a single GTX 1080Ti card. Compared with other state-of-the-art methods, DDPNet achieves a significant better accuracy with a comparable speed and fewer parameters.

2.3CVMar 19, 2020

Detecting Lane and Road Markings at A Distance with Perspective Transformer Layers

Zhuoping Yu, Xiaozhou Ren, Yuyao Huang et al.

Accurate detection of lane and road markings is a task of great importance for intelligent vehicles. In existing approaches, the detection accuracy often degrades with the increasing distance. This is due to the fact that distant lane and road markings occupy a small number of pixels in the image, and scales of lane and road markings are inconsistent at various distances and perspectives. The Inverse Perspective Mapping (IPM) can be used to eliminate the perspective distortion, but the inherent interpolation can lead to artifacts especially around distant lane and road markings and thus has a negative impact on the accuracy of lane marking detection and segmentation. To solve this problem, we adopt the Encoder-Decoder architecture in Fully Convolutional Networks and leverage the idea of Spatial Transformer Networks to introduce a novel semantic segmentation neural network. This approach decomposes the IPM process into multiple consecutive differentiable homographic transform layers, which are called "Perspective Transformer Layers". Furthermore, the interpolated feature map is refined by subsequent convolutional layers thus reducing the artifacts and improving the accuracy. The effectiveness of the proposed method in lane marking detection is validated on two public datasets: TuSimple and ApolloScape

2.0AISep 17, 2019

Real-time Multi-target Path Prediction and Planning for Autonomous Driving aided by FCN

Hongtu Zhou, Xinneng Yang, Enwei Zhang et al.

Real-time multi-target path planning is a key issue in the field of autonomous driving. Although multiple paths can be generated in real-time with polynomial curves, the generated paths are not flexible enough to deal with complex road scenes such as S-shaped road and unstructured scenes such as parking lots. Search and sampling-based methods, such as A* and RRT and their derived methods, are flexible in generating paths for these complex road environments. However, the existing algorithms require significant time to plan to multiple targets, which greatly limits their application in autonomous driving. In this paper, a real-time path planning method for multi-targets is proposed. We train a fully convolutional neural network (FCN) to predict a path region for the target at first. By taking the predicted path region as soft constraints, the A* algorithm is then applied to search the exact path to the target. Experiments show that FCN can make multiple predictions in a very short time (50 times in 40ms), and the predicted path region effectively restrict the searching space for the following A* search. Therefore, the A* can search much faster so that the multi-target path planning can be achieved in real-time (3 targets in less than 100ms).

1.7CVSep 26, 2018

Vision-based Semantic Mapping and Localization for Autonomous Indoor Parking

Yewei Huang, Junqiao Zhao, Xudong He et al.

In this paper, we proposed a novel and practical solution for the real-time indoor localization of autonomous driving in parking lots. High-level landmarks, the parking slots, are extracted and enriched with labels to avoid the aliasing of low-level visual features. We then proposed a robust method for detecting incorrect data associations between parking slots and further extended the optimization framework by dynamically eliminating suboptimal data associations. Visual fiducial markers are introduced to improve the overall precision. As a result, a semantic map of the parking lot can be established fully automatically and robustly. We experimented the performance of real-time localization based on the map using our autonomous driving platform TiEV, and the average accuracy of 0.3m track tracing can be achieved at a speed of 10kph.

1.7CVSep 15, 2018

DLO: Direct LiDAR Odometry for 2.5D Outdoor Environment

Lu Sun, Junqiao Zhao, Xudong He et al.

For autonomous vehicles, high-precision real-time localization is the guarantee of stable driving. Compared with the visual odometry (VO), the LiDAR odometry (LO) has the advantages of higher accuracy and better stability. However, 2D LO is only suitable for the indoor environment, and 3D LO has less efficiency in general. Both are not suitable for the online localization of an autonomous vehicle in an outdoor driving environment. In this paper, a direct LO method based on the 2.5D grid map is proposed. The fast semi-dense direct method proposed for VO is employed to register two 2.5D maps. Experiments show that this method is superior to both the 3D-NDT and LOAM in the outdoor environment.

5.3ROApr 19, 2018

Automatic Vector-based Road Structure Mapping Using Multi-beam LiDAR

Xudong He, Junqiao Zhao, Lu Sun et al.

In this paper, we studied a SLAM method for vector-based road structure mapping using multi-beam LiDAR. We propose to use the polyline as the primary mapping element instead of grid cell or point cloud, because the vector-based representation is precise and lightweight, and it can directly generate vector-based High-Definition (HD) driving map as demanded by autonomous driving systems. We explored: 1) the extraction and vectorization of road structures based on local probabilistic fusion. 2) the efficient vector-based matching between frames of road structures. 3) the loop closure and optimization based on the pose-graph. In this study, we took a specific road structure, the road boundary, as an example. We applied the proposed matching method in three different scenes and achieved the average absolute matching error of 0.07. We further applied the mapping system to the urban road with the length of 860 meters and achieved an average global accuracy of 0.466 m without the help of high precision GPS.

5.2CVApr 19, 2018

VH-HFCN based Parking Slot and Lane Markings Segmentation on Panoramic Surround View

Yan Wu, Tao Yang, Junqiao Zhao et al.

The automatic parking is being massively developed by car manufacturers and providers. Until now, there are two problems with the automatic parking. First, there is no openly-available segmentation labels of parking slot on panoramic surround view (PSV) dataset. Second, how to detect parking slot and road structure robustly. Therefore, in this paper, we build up a public PSV dataset. At the same time, we proposed a highly fused convolutional network (HFCN) based segmentation method for parking slot and lane markings based on the PSV dataset. A surround-view image is made of four calibrated images captured from four fisheye cameras. We collect and label more than 4,200 surround view images for this task, which contain various illuminated scenes of different types of parking slots. A VH-HFCN network is proposed, which adopts an HFCN as the base, with an extra efficient VH-stage for better segmenting various markings. The VH-stage consists of two independent linear convolution paths with vertical and horizontal convolution kernels respectively. This modification enables the network to robustly and precisely extract linear features. We evaluated our model on the PSV dataset and the results showed outstanding performance in ground markings segmentation. Based on the segmented markings, parking slots and lanes are acquired by skeletonization, hough line transform and line arrangement.

2.9ROApr 17, 2018

TiEV: The Tongji Intelligent Electric Vehicle in the Intelligent Vehicle Future Challenge of China

Junqiao Zhao, Chen Ye, Yan Wu et al.

TiEV is an autonomous driving platform implemented by Tongji University of China. The vehicle is drive-by-wire and is fully powered by electricity. We devised the software system of TiEV from scratch, which is capable of driving the vehicle autonomously in urban paths as well as on fast express roads. We describe our whole system, especially novel modules of probabilistic perception fusion, incremental mapping, the 1st and the 2nd planning and the overall safety concern. TiEV finished 2016 and 2017 Intelligent Vehicle Future Challenge of China held at Changshu. We show our experiences on the development of autonomous vehicles and future trends.

5.8CVJan 4, 2018

Semantic Segmentation via Highly Fused Convolutional Network with Multiple Soft Cost Functions

Tao Yang, Yan Wu, Junqiao Zhao et al.

Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then we bring out multiple pre-outputs, each pre-output is generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to get the final output. As a result, our proposed network makes highly use of the feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on pre-outputs and final outputs. In this way, we can reduce the loss reduction when the loss is back propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC and ADE20K. We achieve a state-of-the-art performance on CamVid dataset, as well as considerable improvements on PASCAL VOC dataset and ADE20K dataset