COJul 4, 2023
Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxiesMehdi Rezaie, Ashley J. Ross, Hee-Jong Seo et al.
We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $\fnl$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against simulations with and without $\fnl$ and systematics, showing superior performance of the neural network treatment. The neural network with a set of nine imaging property maps passes our systematic null test criteria, and is chosen as the fiducial treatment. Assuming the universality relation, we find $\fnl = 34^{+24(+50)}_{-44(-73)}$ at 68\%(95\%) confidence. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. We study how the regression method biases the measured angular power-spectrum and degrades the $\fnl$ constraining power. The use of the nine maps more than doubles the uncertainty compared to using only the three primary maps in the regression. Our results thus motivate the development of more efficient methods that avoid over-correction, protect large-scale clustering information, and preserve constraining power. Additionally, our results encourage further studies of $\fnl$ with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics and lessen the degradation in the $\fnl$ uncertainty.
CVApr 30, 2022Code
Unsupervised Visible-light Images Guided Cross-Spectrum Depth Estimation from Dual-Modality CamerasYubin Guo, Haobo Jiang, Xinlei Qi et al.
Cross-spectrum depth estimation aims to provide a depth map in all illumination conditions with a pair of dual-spectrum images. It is valuable for autonomous vehicle applications when the vehicle is equipped with two cameras of different modalities. However, images captured by different-modality cameras can be photometrically quite different. Therefore, cross-spectrum depth estimation is a very challenging problem. Moreover, the shortage of large-scale open-source datasets also retards further research in this field. In this paper, we propose an unsupervised visible-light image guided cross-spectrum (i.e., thermal and visible-light, TIR-VIS in short) depth estimation framework given a pair of RGB and thermal images captured from a visible-light camera and a thermal one. We first adopt a base depth estimation network using RGB-image pairs. Then we propose a multi-scale feature transfer network to transfer features from the TIR-VIS domain to the VIS domain at the feature level to fit the trained depth estimation network. At last, we propose a cross-spectrum depth cycle consistency to improve the depth result of dual-spectrum image pairs. Meanwhile, we release a large dual-spectrum depth estimation dataset with visible-light and far-infrared stereo images captured in different scenes to the society. The experiment result shows that our method achieves better performance than the compared existing methods. Our datasets is available at https://github.com/whitecrow1027/VIS-TIR-Datasets.
OCMay 4, 2016
Invariant Clusters for Hybrid SystemsHui Kong, Sergiy Bogomolov, Christian Schilling et al.
In this paper, we propose an approach to automatically compute invariant clusters for semialgebraic hybrid systems. An invariant cluster for an ordinary differential equation (ODE) is a multivariate polynomial invariant g(u,x)=0, parametric in u, which can yield an infinite number of concrete invariants by assigning different values to u so that every trajectory of the system can be overapproximated precisely by a union of concrete invariants. For semialgebraic systems, which involve ODEs with multivariate polynomial vector flow, invariant clusters can be obtained by first computing the remainder of the Lie derivative of a template multivariate polynomial w.r.t. its Groebner basis and then solving the system of polynomial equations obtained from the coefficients of the remainder. Based on invariant clusters and sum-of-squares (SOS) programming, we present a new method for the safety verification of hybrid systems. Experiments on nonlinear benchmark systems from biology and control theory show that our approach is effective and efficient.
CVMar 2, 2022
Learning Moving-Object Tracking with FMCW LiDARYi Gu, Hongzhi Cheng, Kafeng Wang et al.
In this paper, we propose a learning-based moving-object tracking method utilizing our newly developed LiDAR sensor, Frequency Modulated Continuous Wave (FMCW) LiDAR. Compared with most existing commercial LiDAR sensors, our FMCW LiDAR can provide additional Doppler velocity information to each 3D point of the point clouds. Benefiting from this, we can generate instance labels as ground truth in a semi-automatic manner. Given the labels, we propose a contrastive learning framework, which pulls together the features from the same instance in embedding space and pushes apart the features from different instances, to improve the tracking quality. Extensive experiments are conducted on our recorded driving data, and the results show that our method outperforms the baseline methods by a large margin.
CVSep 13, 2024Code
Pathfinder for Low-altitude Aircraft with Binary Neural NetworkKaijie Yin, Tian Gao, Hui Kong
A prior global topological map (e.g., the OpenStreetMap, OSM) can boost the performance of autonomous mapping by a ground mobile robot. However, the prior map is usually incomplete due to lacking labeling in partial paths. To solve this problem, this paper proposes an OSM maker using airborne sensors carried by low-altitude aircraft, where the core of the OSM maker is a novel efficient pathfinder approach based on LiDAR and camera data, i.e., a binary dual-stream road segmentation model. Specifically, a multi-scale feature extraction based on the UNet architecture is implemented for images and point clouds. To reduce the effect caused by the sparsity of point cloud, an attention-guided gated block is designed to integrate image and point-cloud features. To optimize the model for edge deployment that significantly reduces storage footprint and computational demands, we propose a binarization streamline to each model component, including a variant of vision transformer (ViT) architecture as the encoder of the image branch, and new focal and perception losses to optimize the model training. The experimental results on two datasets demonstrate that our pathfinder method achieves SOTA accuracy with high efficiency in finding paths from the low-level airborne sensors, and we can create complete OSM prior maps based on the segmented road skeletons. Code and data are available at: \href{https://github.com/IMRL/Pathfinder}{https://github.com/IMRL/Pathfinder}.
CVMay 6, 2022
Semantics-Guided Moving Object Segmentation with 3D LiDARShuo Gu, Suling Yao, Jian Yang et al.
Moving object segmentation (MOS) is a task to distinguish moving objects, e.g., moving vehicles and pedestrians, from the surrounding static environment. The segmentation accuracy of MOS can have an influence on odometry, map construction, and planning tasks. In this paper, we propose a semantics-guided convolutional neural network for moving object segmentation. The network takes sequential LiDAR range images as inputs. Instead of segmenting the moving objects directly, the network conducts single-scan-based semantic segmentation and multiple-scan-based moving object segmentation in turn. The semantic segmentation module provides semantic priors for the MOS module, where we propose an adjacent scan association (ASA) module to convert the semantic features of adjacent scans into the same coordinate system to fully exploit the cross-scan semantic features. Finally, by analyzing the difference between the transformed features, reliable MOS result can be obtained quickly. Experimental results on the SemanticKITTI MOS dataset proves the effectiveness of our work.
CVApr 12, 2022
HiTPR: Hierarchical Transformer for Place Recognition in Point CloudZhixing Hou, Yan Yan, Chengzhong Xu et al.
Place recognition or loop closure detection is one of the core components in a full SLAM system. In this paper, aiming at strengthening the relevancy of local neighboring points and the contextual dependency among global points simultaneously, we investigate the exploitation of transformer-based network for feature extraction, and propose a Hierarchical Transformer for Place Recognition (HiTPR). The HiTPR consists of four major parts: point cell generation, short-range transformer (SRT), long-range transformer (LRT) and global descriptor aggregation. Specifically, the point cloud is initially divided into a sequence of small cells by downsampling and nearest neighbors searching. In the SRT, we extract the local feature for each point cell. While in the LRT, we build the global dependency among all of the point cells in the whole point cloud. Experiments on several standard benchmarks demonstrate the superiority of the HiTPR in terms of average recall rate, achieving 93.71% at top 1% and 86.63% at top 1 on the Oxford RobotCar dataset for example.
CVJun 24, 2022
MaskRange: A Mask-classification Model for Range-view based LiDAR SegmentationYi Gu, Yuming Huang, Chengzhong Xu et al.
Range-view based LiDAR segmentation methods are attractive for practical applications due to their direct inheritance from efficient 2D CNN architectures. In literature, most range-view based methods follow the per-pixel classification paradigm. Recently, in the image segmentation domain, another paradigm formulates segmentation as a mask-classification problem and has achieved remarkable performance. This raises an interesting question: can the mask-classification paradigm benefit the range-view based LiDAR segmentation and achieve better performance than the counterpart per-pixel paradigm? To answer this question, we propose a unified mask-classification model, MaskRange, for the range-view based LiDAR semantic and panoptic segmentation. Along with the new paradigm, we also propose a novel data augmentation method to deal with overfitting, context-reliance, and class-imbalance problems. Extensive experiments are conducted on the SemanticKITTI benchmark. Among all published range-view based methods, our MaskRange achieves state-of-the-art performance with $66.10$ mIoU on semantic segmentation and promising results with $53.10$ PQ on panoptic segmentation with high efficiency. Our code will be released.
89.7IRMar 25
OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search FrameworkBen Chen, Siyuan Wang, Yufei Ma et al.
Generative Retrieval (GR) has emerged as a promising paradigm for modern search systems. Compared to multi-stage cascaded architecture, it offers advantages such as end-to-end joint optimization and high computational efficiency. OneSearch, as a representative industrial-scale deployed generative search framework, has brought significant commercial and operational benefits. However, its inadequate understanding of complex queries, inefficient exploitation of latent user intents, and overfitting to narrow historical preferences have limited its further performance improvement. To address these challenges, we propose \textbf{OneSearch-V2}, a latent reasoning enhanced self-distillation generative search framework. It contains three key innovations: (1) a thought-augmented complex query understanding module, which enables deep query understanding and overcomes the shallow semantic matching limitations of direct inference; (2) a reasoning-internalized self-distillation training pipeline, which uncovers users' potential yet precise e-commerce intentions beyond log-fitting through implicit in-context learning; (3) a behavior preference alignment optimization system, which mitigates reward hacking arising from the single conversion metric, and addresses personal preference via direct user feedback. Extensive offline evaluations demonstrate OneSearch-V2's strong query recognition and user profiling capabilities. Online A/B tests further validate its business effectiveness, yielding +3.98\% item CTR, +3.05\% buyer conversion rate, and +2.11\% order volume. Manual evaluation further confirms gains in search experience quality, with +1.65\% in page good rate and +1.37\% in query-item relevance. More importantly, OneSearch-V2 effectively mitigates common search system issues such as information bubbles and long-tail sparsity, without incurring additional inference costs or serving latency.
54.3ROMar 19
REST: Receding Horizon Explorative Steiner Tree for Zero-Shot Object-Goal NavigationShuqi Xiao, Maani Ghaffari, Chengzhong Xu et al.
Zero-shot object-goal navigation (ZSON) requires navigating unknown environments to find a target object without task-specific training. Prior hierarchical training-free solutions invest in scene understanding (\textit{belief}) and high-level decision-making (\textit{policy}), yet overlook the design of \textit{option}, i.e., a subgoal candidate proposed from evolving belief and presented to policy for selection. In practice, options are reduced to isolated waypoints scored independently: single destinations hide the value gathered along the journey; an unstructured collection obscures the relationships among candidates. Our insight is that the option space should be a \textit{tree of paths}. Full paths expose en-route information gain that destination-only scoring systematically neglects; a tree of shared segments enables coarse-to-fine LLM reasoning that dismisses or pursues entire branches before examining individual leaves, compressing the combinatorial path space into an efficient hierarchy. We instantiate this insight in \textbf{REST} (Receding Horizon Explorative Steiner Tree), a training-free framework that (1) builds an explicit open-vocabulary 3D map from online RGB-D streams; (2) grows an agent-centric tree of safe and informative paths as the option space via sampling-based planning; and (3) textualizes each branch into a spatial narrative and selects the next-best path through chain-of-thought LLM reasoning. Across the Gibson, HM3D, and HSSD benchmarks, REST consistently ranks among the top methods in success rate while achieving the best or second-best path efficiency, demonstrating a favorable efficiency-success balance.
65.6ROMar 17
SignNav: Leveraging Signage for Semantic Visual Navigation in Large-Scale Indoor EnvironmentsJian Sun, Yuming Huang, He Li et al.
Humans routinely leverage semantic hints provided by signage to navigate to destinations within novel Large-Scale Indoor (LSI) environments, such as hospitals and airport terminals. However, this capability remains underexplored within the field of embodied navigation. This paper introduces a novel embodied navigation task, SignNav, which requires the agent to interpret semantic hint from signage and reason about the subsequent action based on current observation. To facilitate research in this domain, we construct the LSI-Dataset for the training and evaluation of various SignNav agents. Dynamically changing semantic hints and sparse placement of signage in LSI environments present significant challenges to the SignNav task. To address these challenges, we propose the Spatial-Temporal Aware Transformer (START) model for end-to-end decision-making. The spatial-aware module grounds the semantic hint of signage into physical world, while the temporal-aware module captures long-range dependencies between historical states and current observation. Leveraging a two-stage training strategy with Dataset Aggregation (DAgger), our approach achieves state-of-the-art performance, recording an 80% Success Rate (SR) and 0.74 NDTW on val-unseen split. Real-world deployment further demonstrates the practicality of our method in physical environment without pre-built map.
ROAug 22, 2024
UMAD: University of Macau Anomaly Detection Benchmark DatasetDong Li, Lineng Chen, Cheng-Zhong Xu et al.
Anomaly detection is critical in surveillance systems and patrol robots by identifying anomalous regions in images for early warning. Depending on whether reference data are utilized, anomaly detection can be categorized into anomaly detection with reference and anomaly detection without reference. Currently, anomaly detection without reference, which is closely related to out-of-distribution (OoD) object detection, struggles with learning anomalous patterns due to the difficulty of collecting sufficiently large and diverse anomaly datasets with the inherent rarity and novelty of anomalies. Alternatively, anomaly detection with reference employs the scheme of change detection to identify anomalies by comparing semantic changes between a reference image and a query one. However, there are very few ADr works due to the scarcity of public datasets in this domain. In this paper, we aim to address this gap by introducing the UMAD Benchmark Dataset. To our best knowledge, this is the first benchmark dataset designed specifically for anomaly detection with reference in robotic patrolling scenarios, e.g., where an autonomous robot is employed to detect anomalous objects by comparing a reference and a query video sequences. The reference sequences can be taken by the robot along a specified route when there are no anomalous objects in the scene. The query sequences are captured online by the robot when it is patrolling in the same scene following the same route. Our benchmark dataset is elaborated such that each query image can find a corresponding reference based on accurate robot localization along the same route in the prebuilt 3D map, with which the reference and query images can be geometrically aligned using adaptive warping. Besides the proposed benchmark dataset, we evaluate the baseline models of ADr on this dataset.
CVMay 7, 2025Code
TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image EnhancementYi Li, Zhiyuan Zhang, Jiangnan Xia et al.
This paper presents a novel Two-Stage Diffusion Model (TS-Diff) for enhancing extremely low-light RAW images. In the pre-training stage, TS-Diff synthesizes noisy images by constructing multiple virtual cameras based on a noise space. Camera Feature Integration (CFI) modules are then designed to enable the model to learn generalizable features across diverse virtual cameras. During the aligning stage, CFIs are averaged to create a target-specific CFI$^T$, which is fine-tuned using a small amount of real RAW data to adapt to the noise characteristics of specific cameras. A structural reparameterization technique further simplifies CFI$^T$ for efficient deployment. To address color shifts during the diffusion process, a color corrector is introduced to ensure color consistency by dynamically adjusting global color distributions. Additionally, a novel dataset, QID, is constructed, featuring quantifiable illumination levels and a wide dynamic range, providing a comprehensive benchmark for training and evaluation under extreme low-light conditions. Experimental results demonstrate that TS-Diff achieves state-of-the-art performance on multiple datasets, including QID, SID, and ELD, excelling in denoising, generalization, and color consistency across various cameras and illumination levels. These findings highlight the robustness and versatility of TS-Diff, making it a practical solution for low-light imaging applications. Source codes and models are available at https://github.com/CircccleK/TS-Diff
CVMay 7, 2025Code
HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose EstimationYajie Fu, Chaorui Huang, Junwei Li et al.
We propose HDiffTG, a novel 3D Human Pose Estimation (3DHPE) method that integrates Transformer, Graph Convolutional Network (GCN), and diffusion model into a unified framework. HDiffTG leverages the strengths of these techniques to significantly improve pose estimation accuracy and robustness while maintaining a lightweight design. The Transformer captures global spatiotemporal dependencies, the GCN models local skeletal structures, and the diffusion model provides step-by-step optimization for fine-tuning, achieving a complementary balance between global and local features. This integration enhances the model's ability to handle pose estimation under occlusions and in complex scenarios. Furthermore, we introduce lightweight optimizations to the integrated model and refine the objective function design to reduce computational overhead without compromising performance. Evaluation results on the Human3.6M and MPI-INF-3DHP datasets demonstrate that HDiffTG achieves state-of-the-art (SOTA) performance on the MPI-INF-3DHP dataset while excelling in both accuracy and computational efficiency. Additionally, the model exhibits exceptional robustness in noisy and occluded environments. Source codes and models are available at https://github.com/CirceJie/HDiffTG
RONov 11, 2021Code
Yaw-Guided Imitation Learning for Autonomous Driving in Urban EnvironmentsYandong Liu, Chengzhong Xu, Hui Kong
Existing imitation learning methods suffer from low efficiency and generalization ability when facing the road option problem in an urban environment. In this paper, we propose a yaw-guided imitation learning method to improve the road option performance in an end-to-end autonomous driving paradigm in terms of the efficiency of exploiting training samples and adaptability to changing environments. Specifically, the yaw information is provided by the trajectory of the navigation map. Our end-to-end architecture, Yaw-guided Imitation Learning with ResNet34 Attention (YILRatt), integrates the ResNet34 backbone and attention mechanism to obtain an accurate perception. It does not need high precision maps and realizes fully end-to-end autonomous driving given the yaw information provided by a consumer-level GPS receiver. By analyzing the attention heat maps, we can reveal some causal relationship between decision-making and scene perception, where, in particular, failure cases are caused by erroneous perception. We collect expert experience in the Carla 0.9.11 simulator and improve the benchmark CoRL2017 and NoCrash. Experimental results show that YILRatt has a 26.27% higher success rate than the SOTA CILRS. The code, dataset, benchmark and experimental results can be found at https://github.com/Yandong024/Yaw-guided-IL.git
CVMar 4, 2025
BHViT: Binarized Hybrid Vision TransformerTian Gao, Zhiyuan Zhang, Yu Zhang et al.
Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNN), offering a potential solution to the deployment challenges faced by Vision Transformers (ViTs) on edge devices. However, due to the structural differences between CNN and Transformer architectures, simply applying binary CNN strategies to the ViT models will lead to a significant performance drop. To tackle this challenge, we propose BHViT, a binarization-friendly hybrid ViT architecture and its full binarization model with the guidance of three important observations. Initially, BHViT utilizes the local information interaction and hierarchical feature aggregation technique from coarse to fine levels to address redundant computations stemming from excessive tokens. Then, a novel module based on shift operations is proposed to enhance the performance of the binary Multilayer Perceptron (MLP) module without significantly increasing computational overhead. In addition, an innovative attention matrix binarization method based on quantization decomposition is proposed to evaluate the token's importance in the binarized attention matrix. Finally, we propose a regularization loss to address the inadequate optimization caused by the incompatibility between the weight oscillation in the binary layers and the Adam Optimizer. Extensive experimental results demonstrate that our proposed algorithm achieves SOTA performance among binary ViT methods.
ROSep 3, 2025
Efficient Active Training for Deep LiDAR OdometryBeibei Zhou, Zhiyuan Zhang, Zhenbo Song et al.
Robust and efficient deep LiDAR odometry models are crucial for accurate localization and 3D reconstruction, but typically require extensive and diverse training data to adapt to diverse environments, leading to inefficiencies. To tackle this, we introduce an active training framework designed to selectively extract training data from diverse environments, thereby reducing the training load and enhancing model generalization. Our framework is based on two key strategies: Initial Training Set Selection (ITSS) and Active Incremental Selection (AIS). ITSS begins by breaking down motion sequences from general weather into nodes and edges for detailed trajectory analysis, prioritizing diverse sequences to form a rich initial training dataset for training the base model. For complex sequences that are difficult to analyze, especially under challenging snowy weather conditions, AIS uses scene reconstruction and prediction inconsistency to iteratively select training samples, refining the model to handle a wide range of real-world scenarios. Experiments across datasets and weather conditions validate our approach's effectiveness. Notably, our method matches the performance of full-dataset training with just 52\% of the sequence volume, demonstrating the training efficiency and robustness of our active training paradigm. By optimizing the training process, our approach sets the stage for more agile and reliable LiDAR odometry systems, capable of navigating diverse environmental conditions with greater precision.
CVJul 4, 2025
Information-Bottleneck Driven Binary Neural Network for Change DetectionKaijie Yin, Zhiyuan Zhang, Shu Kong et al.
In this paper, we propose Binarized Change Detection (BiCD), the first binary neural network (BNN) designed specifically for change detection. Conventional network binarization approaches, which directly quantize both weights and activations in change detection models, severely limit the network's ability to represent input data and distinguish between changed and unchanged regions. This results in significantly lower detection accuracy compared to real-valued networks. To overcome these challenges, BiCD enhances both the representational power and feature separability of BNNs, improving detection performance. Specifically, we introduce an auxiliary objective based on the Information Bottleneck (IB) principle, guiding the encoder to retain essential input information while promoting better feature discrimination. Since directly computing mutual information under the IB principle is intractable, we design a compact, learnable auxiliary module as an approximation target, leading to a simple yet effective optimization strategy that minimizes both reconstruction loss and standard change detection loss. Extensive experiments on street-view and remote sensing datasets demonstrate that BiCD establishes a new benchmark for BNN-based change detection, achieving state-of-the-art performance in this domain.
CVJul 3, 2025
High-Fidelity Differential-information Driven Binary Vision TransformerTian Gao, Zhiyuan Zhang, Kaijie Yin et al.
The binarization of vision transformers (ViTs) offers a promising approach to addressing the trade-off between high computational/storage demands and the constraints of edge-device deployment. However, existing binary ViT methods often suffer from severe performance degradation or rely heavily on full-precision modules. To address these issues, we propose DIDB-ViT, a novel binary ViT that is highly informative while maintaining the original ViT architecture and computational efficiency. Specifically, we design an informative attention module incorporating differential information to mitigate information loss caused by binarization and enhance high-frequency retention. To preserve the fidelity of the similarity calculations between binary Q and K tensors, we apply frequency decomposition using the discrete Haar wavelet and integrate similarities across different frequencies. Additionally, we introduce an improved RPReLU activation function to restructure the activation distribution, expanding the model's representational capacity. Experimental results demonstrate that our DIDB-ViT significantly outperforms state-of-the-art network quantization methods in multiple ViT architectures, achieving superior image classification and segmentation performance.
CVMar 22, 2024
VRSO: Visual-Centric Reconstruction for Static Object AnnotationChenyao Yu, Yingfeng Cai, Jiaxin Zhang et al.
As a part of the perception results of intelligent driving systems, static object detection (SOD) in 3D space provides crucial cues for driving environment understanding. With the rapid deployment of deep neural networks for SOD tasks, the demand for high-quality training samples soars. The traditional, also reliable, way is manual labelling over the dense LiDAR point clouds and reference images. Though most public driving datasets adopt this strategy to provide SOD ground truth (GT), it is still expensive and time-consuming in practice. This paper introduces VRSO, a visual-centric approach for static object annotation. Experiments on the Waymo Open Dataset show that the mean reprojection error from VRSO annotation is only 2.6 pixels, around four times lower than the Waymo Open Dataset labels (10.6 pixels). VRSO is distinguished in low cost, high efficiency, and high quality: (1) It recovers static objects in 3D space with only camera images as input, and (2) manual annotation is barely involved since GT for SOD tasks is generated based on an automatic reconstruction and annotation pipeline.
CVJan 25, 2024
AscDAMs: Advanced SLAM-based channel detection and mapping systemTengfei Wang, Fucheng Lu, Jintao Qin et al.
Obtaining high-resolution, accurate channel topography and deposit conditions is the prior challenge for the study of channelized debris flow. Currently, wide-used mapping technologies including satellite imaging and drone photogrammetry struggle to precisely observe channel interior conditions of mountainous long-deep gullies, particularly those in the Wenchuan Earthquake region. SLAM is an emerging tech for 3D mapping; however, extremely rugged environment in long-deep gullies poses two major challenges even for the state-of-art SLAM: (1) Atypical features; (2) Violent swaying and oscillation of sensors. These issues result in large deviation and lots of noise for SLAM results. To improve SLAM mapping in such environments, we propose an advanced SLAM-based channel detection and mapping system, namely AscDAMs. It features three main enhancements to post-process SLAM results: (1) The digital orthophoto map aided deviation correction algorithm greatly eliminates the systematic error; (2) The point cloud smoothing algorithm substantially diminishes noises; (3) The cross section extraction algorithm enables the quantitative assessment of channel deposits and their changes. Two field experiments were conducted in Chutou Gully, Wenchuan County in China in February and November 2023, representing observations before and after the rainy season. We demonstrate the capability of AscDAMs to greatly improve SLAM results, promoting SLAM for mapping the specially challenging environment. The proposed method compensates for the insufficiencies of existing technologies in detecting debris flow channel interiors including detailed channel morphology, erosion patterns, deposit distinction, volume estimation and change detection. It serves to enhance the study of full-scale debris flow mechanisms, long-term post-seismic evolution, and hazard assessment.
CVMay 23, 2023
Why semantics matters: A deep study on semantic particle-filtering localization in a LiDAR semantic pole-mapYuming Huang, Yi Gu, Chengzhong Xu et al.
In most urban and suburban areas, pole-like structures such as tree trunks or utility poles are ubiquitous. These structural landmarks are very useful for the localization of autonomous vehicles given their geometrical locations in maps and measurements from sensors. In this work, we aim at creating an accurate map for autonomous vehicles or robots with pole-like structures as the dominant localization landmarks, hence called pole-map. In contrast to the previous pole-based mapping or localization methods, we exploit the semantics of pole-like structures. Specifically, semantic segmentation is achieved by a new mask-range transformer network in a mask-classfication paradigm. With the semantics extracted for the pole-like structures in each frame, a multi-layer semantic pole-map is created by aggregating the detected pole-like structures from all frames. Given the semantic pole-map, we propose a semantic particle-filtering localization scheme for vehicle localization. Theoretically, we have analyzed why the semantic information can benefit the particle-filter localization, and empirically it is validated on the public SemanticKITTI dataset that the particle-filtering localization with semantics achieves much better performance than the counterpart without semantics when each particle's odometry prediction and/or the online observation is subject to uncertainties at significant levels.
CVMay 13, 2023
GSB: Group Superposition Binarization for Vision Transformer with Limited Training SamplesTian Gao, Cheng-Zhong Xu, Le Zhang et al.
Vision Transformer (ViT) has performed remarkably in various computer vision tasks. Nonetheless, affected by the massive amount of parameters, ViT usually suffers from serious overfitting problems with a relatively limited number of training samples. In addition, ViT generally demands heavy computing resources, which limit its deployment on resource-constrained devices. As a type of model-compression method, model binarization is potentially a good choice to solve the above problems. Compared with the full-precision one, the model with the binarization method replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision model parameters and activations with only 1-bit ones, which potentially solves the problem of model size and computational complexity, respectively. In this paper, we investigate a binarized ViT model. Empirically, we observe that the existing binarization technology designed for Convolutional Neural Networks (CNN) cannot migrate well to a ViT's binarization task. We also find that the decline of the accuracy of the binary ViT model is mainly due to the information loss of the Attention module and the Value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, in order to further improve the performance of the binarization model, we have investigated the gradient calculation procedure in the binarization process and derived more proper gradient calculation equations for GSB to reduce the influence of gradient mismatch. Then, the knowledge distillation technique is introduced to alleviate the performance degradation caused by model binarization. Analytically, model binarization can limit the parameters search space during parameter updates while training a model....
CVDec 1, 2020
Globally Optimal Relative Pose Estimation with Gravity PriorYaqing Ding, Daniel Barath, Jian Yang et al.
Smartphones, tablets and camera systems used, e.g., in cars and UAVs, are typically equipped with IMUs (inertial measurement units) that can measure the gravity vector accurately. Using this additional information, the $y$-axes of the cameras can be aligned, reducing their relative orientation to a single degree-of-freedom. With this assumption, we propose a novel globally optimal solver, minimizing the algebraic error in the least-squares sense, to estimate the relative pose in the over-determined case. Based on the epipolar constraint, we convert the optimization problem into solving two polynomials with only two unknowns. Also, a fast solver is proposed using the first-order approximation of the rotation. The proposed solvers are compared with the state-of-the-art ones on four real-world datasets with approx. 50000 image pairs in total. Moreover, we collected a dataset, by a smartphone, consisting of 10933 image pairs, gravity directions, and ground truth 3D reconstructions.
CVSep 7, 2020
Frontier Detection and Reachability Analysis for Efficient 2D Graph-SLAM Based Active ExplorationZezhou Sun, Banghe Wu, Cheng-Zhong Xu et al.
We propose an integrated approach to active exploration by exploiting the Cartographer method as the base SLAM module for submap creation and performing efficient frontier detection in the geometrically co-aligned submaps induced by graph optimization. We also carry out analysis on the reachability of frontiers and their clusters to ensure that the detected frontier can be reached by robot. Our method is tested on a mobile robot in real indoor scene to demonstrate the effectiveness and efficiency of our approach.
CVJul 30, 2020
Cascaded Non-local Neural Network for Point Cloud Semantic SegmentationMingmei Cheng, Le Hui, Jin Xie et al.
In this paper, we propose a cascaded non-local neural network for point cloud segmentation. The proposed network aims to build the long-range dependencies of point clouds for the accurate segmentation. Specifically, we develop a novel cascaded non-local module, which consists of the neighborhood-level, superpoint-level and global-level non-local blocks. First, in the neighborhood-level block, we extract the local features of the centroid points of point clouds by assigning different weights to the neighboring points. The extracted local features of the centroid points are then used to encode the superpoint-level block with the non-local operation. Finally, the global-level block aggregates the non-local features of the superpoints for semantic segmentation in an encoder-decoder framework. Benefiting from the cascaded structure, geometric structure information of different neighborhoods with the same label can be propagated. In addition, the cascaded structure can largely reduce the computational cost of the original non-local operation on point clouds. Experiments on different indoor and outdoor datasets show that our method achieves state-of-the-art performance and effectively reduces the time consumption and memory occupation.
RODec 9, 2019
LiDAR Iris for Loop-Closure DetectionYing Wang, Zezhou Sun, Cheng-Zhong Xu et al.
In this paper, a global descriptor for a LiDAR point cloud, called LiDAR Iris, is proposed for fast and accurate loop-closure detection. A binary signature image can be obtained for each point cloud after several LoG-Gabor filtering and thresholding operations on the LiDAR-Iris image representation. Given two point clouds, their similarities can be calculated as the Hamming distance of two corresponding binary signature images extracted from the two point clouds, respectively. Our LiDAR-Iris method can achieve a pose-invariant loop-closure detection at a descriptor level with the Fourier transform of the LiDAR-Iris representation if assuming a 3D (x,y,yaw) pose space, although our method can generally be applied to a 6D pose space by re-aligning point clouds with an additional IMU sensor. Experimental results on five road-scene sequences demonstrate its excellent performance in loop-closure detection.
SEMar 27, 2013
Exponential-Condition-Based Barrier Certificate Generation for Safety Verification of Hybrid SystemsHui Kong, Fei He, Xiaoyu Song et al.
A barrier certificate is an inductive invariant function which can be used for the safety verification of a hybrid system. Safety verification based on barrier certificate has the benefit of avoiding explicit computation of the exact reachable set which is usually intractable for nonlinear hybrid systems. In this paper, we propose a new barrier certificate condition, called Exponential Condition, for the safety verification of semi-algebraic hybrid systems. The most important benefit of Exponential Condition is that it has a lower conservativeness than the existing convex condition and meanwhile it possesses the property of convexity. On the one hand, a less conservative barrier certificate forms a tighter over-approximation for the reachable set and hence is able to verify critical safety properties. On the other hand, the property of convexity guarantees its solvability by semidefinite programming method. Some examples are presented to illustrate the effectiveness and practicality of our method.