h-index54
107papers
4,595citations
Novelty48%
AI Score59

107 Papers

LGJul 16, 2022Code
SenseFi: A Library and Benchmark on Deep-Learning-Empowered WiFi Human Sensing

Jianfei Yang, Xinyan Chen, Dazhuo Wang et al. · berkeley

WiFi sensing has been evolving rapidly in recent years. Empowered by propagation models and deep learning methods, many challenging applications are realized such as WiFi-based human activity recognition and gesture recognition. However, in contrast to deep learning for visual recognition and natural language processing, no sufficiently comprehensive public benchmark exists. In this paper, we review the recent progress on deep learning enabled WiFi sensing, and then propose a benchmark, SenseFi, to study the effectiveness of various deep learning models for WiFi sensing. These advanced models are compared in terms of distinct sensing tasks, WiFi platforms, recognition accuracy, model size, computational complexity, feature transferability, and adaptability of unsupervised learning. It is also regarded as a tutorial for deep learning based WiFi sensing, starting from CSI hardware platform to sensing algorithms. The extensive experiments provide us with experiences in deep model design, learning strategy skills and training techniques for real-world applications. To the best of our knowledge, this is the first benchmark with an open-source library for deep learning in WiFi sensing research. The benchmark codes are available at https://github.com/xyanchen/WiFi-CSI-Sensing-Benchmark.

NIApr 8, 2022
EfficientFi: Towards Large-Scale Lightweight WiFi Sensing via CSI Compression

Jianfei Yang, Xinyan Chen, Han Zou et al. · berkeley

WiFi technology has been applied to various places due to the increasing requirement of high-speed Internet access. Recently, besides network services, WiFi sensing is appealing in smart homes since it is device-free, cost-effective and privacy-preserving. Though numerous WiFi sensing methods have been developed, most of them only consider single smart home scenario. Without the connection of powerful cloud server and massive users, large-scale WiFi sensing is still difficult. In this paper, we firstly analyze and summarize these obstacles, and propose an efficient large-scale WiFi sensing framework, namely EfficientFi. The EfficientFi works with edge computing at WiFi APs and cloud computing at center servers. It consists of a novel deep neural network that can compress fine-grained WiFi Channel State Information (CSI) at edge, restore CSI at cloud, and perform sensing tasks simultaneously. A quantized auto-encoder and a joint classifier are designed to achieve these goals in an end-to-end fashion. To the best of our knowledge, the EfficientFi is the first IoT-cloud-enabled WiFi sensing framework that significantly reduces communication overhead while realizing sensing tasks accurately. We utilized human activity recognition and identification via WiFi sensing as two case studies, and conduct extensive experiments to evaluate the EfficientFi. The results show that it compresses CSI data from 1.368Mb/s to 0.768Kb/s with extremely low error of data reconstruction and achieves over 98% accuracy for human activity recognition.

LGSep 11, 2023Code
Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data

Yucheng Wang, Yuecong Xu, Jianfei Yang et al.

Multivariate Time-Series (MTS) data is crucial in various application fields. With its sequential and multi-source (multiple sensors) properties, MTS data inherently exhibits Spatial-Temporal (ST) dependencies, involving temporal correlations between timestamps and spatial correlations between sensors in each timestamp. To effectively leverage this information, Graph Neural Network-based methods (GNNs) have been widely adopted. However, existing approaches separately capture spatial dependency and temporal dependency and fail to capture the correlations between Different sEnsors at Different Timestamps (DEDT). Overlooking such correlations hinders the comprehensive modelling of ST dependencies within MTS data, thus restricting existing GNNs from learning effective representations. To address this limitation, we propose a novel method called Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN), including two key components namely FC graph construction and FC graph convolution. For graph construction, we design a decay graph to connect sensors across all timestamps based on their temporal distances, enabling us to fully model the ST dependencies by considering the correlations between DEDT. Further, we devise FC graph convolution with a moving-pooling GNN layer to effectively capture the ST dependencies for learning effective representations. Extensive experiments show the effectiveness of FC-STGNN on multiple MTS datasets compared to SOTA methods. The code is available at https://github.com/Frank-Wang-oss/FCSTGNN.

CVAug 30, 2022
GaitFi: Robust Device-Free Human Identification via WiFi and Vision Multimodal Learning

Lang Deng, Jianfei Yang, Shenghai Yuan et al. · berkeley

As an important biomarker for human identification, human gait can be collected at a distance by passive sensors without subject cooperation, which plays an essential role in crime prevention, security detection and other human identification applications. At present, most research works are based on cameras and computer vision techniques to perform gait recognition. However, vision-based methods are not reliable when confronting poor illuminations, leading to degrading performances. In this paper, we propose a novel multimodal gait recognition method, namely GaitFi, which leverages WiFi signals and videos for human identification. In GaitFi, Channel State Information (CSI) that reflects the multi-path propagation of WiFi is collected to capture human gaits, while videos are captured by cameras. To learn robust gait information, we propose a Lightweight Residual Convolution Network (LRCN) as the backbone network, and further propose the two-stream GaitFi by integrating WiFi and vision features for the gait retrieval task. The GaitFi is trained by the triplet loss and classification loss on different levels of features. Extensive experiments are conducted in the real world, which demonstrates that the GaitFi outperforms state-of-the-art gait recognition methods based on single WiFi or camera, achieving 94.2% for human identification tasks of 12 subjects.

SYSep 22, 2011
Distributed Consensus of Linear Multi-Agent Systems with Adaptive Dynamic Protocols

Zhongkui Li, Xiangdong Liu, Wei Ren et al.

This paper considers the distributed consensus problem of multi-agent systems with general continuous-time linear dynamics. Two distributed adaptive dynamic consensus protocols are proposed, based on the relative output information of neighboring agents. One protocol assigns an adaptive coupling weight to each edge in the communication graph while the other uses an adaptive coupling weight for each node. These two adaptive protocols are designed to ensure that consensus is reached in a fully distributed fashion for any undirected connected communication graphs without using any global information. A sufficient condition for the existence of these adaptive protocols is that each agent is stabilizable and detectable. The cases with leader-follower and switching communication graphs are also studied.

NIApr 12, 2022
AutoFi: Towards Automatic WiFi Human Sensing via Geometric Self-Supervised Learning

Jianfei Yang, Xinyan Chen, Han Zou et al. · berkeley

WiFi sensing technology has shown superiority in smart homes among various sensors for its cost-effective and privacy-preserving merits. It is empowered by Channel State Information (CSI) extracted from WiFi signals and advanced machine learning models to analyze motion patterns in CSI. Many learning-based models have been proposed for kinds of applications, but they severely suffer from environmental dependency. Though domain adaptation methods have been proposed to tackle this issue, it is not practical to collect high-quality, well-segmented and balanced CSI samples in a new environment for adaptation algorithms, but randomly-captured CSI samples can be easily collected. {\color{black}In this paper, we firstly explore how to learn a robust model from these low-quality CSI samples, and propose AutoFi, an annotation-efficient WiFi sensing model based on a novel geometric self-supervised learning algorithm.} The AutoFi fully utilizes unlabeled low-quality CSI samples that are captured randomly, and then transfers the knowledge to specific tasks defined by users, which is the first work to achieve cross-task transfer in WiFi sensing. The AutoFi is implemented on a pair of Atheros WiFi APs for evaluation. The AutoFi transfers knowledge from randomly collected CSI samples into human gait recognition and achieves state-of-the-art performance. Furthermore, we simulate cross-task transfer using public datasets to further demonstrate its capacity for cross-task learning. For the UT-HAR and Widar datasets, the AutoFi achieves satisfactory results on activity recognition and gesture recognition without any prior training. We believe that the AutoFi takes a huge step toward automatic WiFi sensing without any developer engagement.

LGSep 11, 2023Code
Graph-Aware Contrasting for Multivariate Time-Series Classification

Yucheng Wang, Yuecong Xu, Jianfei Yang et al.

Contrastive learning, as a self-supervised learning paradigm, becomes popular for Multivariate Time-Series (MTS) classification. It ensures the consistency across different views of unlabeled samples and then learns effective representations for these samples. Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques, aiming to preserve temporal patterns against perturbations for MTS data. However, they overlook spatial consistency that requires the stability of individual sensors and their correlations. As MTS data typically originate from multiple sensors, ensuring spatial consistency becomes essential for the overall performance of contrastive learning on MTS data. Thus, we propose Graph-Aware Contrasting for spatial consistency across MTS data. Specifically, we propose graph augmentations including node and edge augmentations to preserve the stability of sensors and their correlations, followed by graph contrasting with both node- and graph-level contrasting to extract robust sensor- and global-level features. We further introduce multi-window temporal contrasting to ensure temporal consistency in the data for each sensor. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on various MTS classification tasks. The code is available at https://github.com/Frank-Wang-oss/TS-GAC.

CVAug 22, 2022
MetaFi: Device-Free Pose Estimation via Commodity WiFi for Metaverse Avatar Simulation

Jianfei Yang, Yunjiao Zhou, He Huang et al. · berkeley

Avatar refers to a representative of a physical user in the virtual world that can engage in different activities and interact with other objects in metaverse. Simulating the avatar requires accurate human pose estimation. Though camera-based solutions yield remarkable performance, they encounter the privacy issue and degraded performance caused by varying illumination, especially in smart home. In this paper, we propose a WiFi-based IoT-enabled human pose estimation scheme for metaverse avatar simulation, namely MetaFi. Specifically, a deep neural network is designed with customized convolutional layers and residual blocks to map the channel state information to human pose landmarks. It is enforced to learn the annotations from the accurate computer vision model, thus achieving cross-modal supervision. WiFi is ubiquitous and robust to illumination, making it a feasible solution for avatar applications in smart home. The experiments are conducted in the real world, and the results show that the MetaFi achieves very high performance with a PCK@50 of 95.23%.

CRApr 4, 2022
SecureSense: Defending Adversarial Attack for Secure Device-Free Human Activity Recognition

Jianfei Yang, Han Zou, Lihua Xie · berkeley

Deep neural networks have empowered accurate device-free human activity recognition, which has wide applications. Deep models can extract robust features from various sensors and generalize well even in challenging situations such as data-insufficient cases. However, these systems could be vulnerable to input perturbations, i.e. adversarial attacks. We empirically demonstrate that both black-box Gaussian attacks and modern adversarial white-box attacks can render their accuracies to plummet. In this paper, we firstly point out that such phenomenon can bring severe safety hazards to device-free sensing systems, and then propose a novel learning framework, SecureSense, to defend common attacks. SecureSense aims to achieve consistent predictions regardless of whether there exists an attack on its input or not, alleviating the negative effect of distribution perturbation caused by adversarial attacks. Extensive experiments demonstrate that our proposed method can significantly enhance the model robustness of existing deep models, overcoming possible attacks. The results validate that our method works well on wireless human activity recognition and person identification systems. To the best of our knowledge, this is the first work to investigate adversarial attacks and further develop a novel defense framework for wireless human activity recognition in mobile computing research.

CVNov 17, 2022Code
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey

Yuecong Xu, Haozhi Cao, Zhenghua Chen et al.

Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare, thanks to the introduction of large-scale datasets and deep learning-based representations. However, video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications due to domain shifts between the training public video datasets (source video domains) and real-world videos (target video domains). Further, with the high cost of video annotation, it is more practical to use unlabeled videos for training. To tackle performance degradation and address concerns in high video annotation cost uniformly, the video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain by alleviating video domain shift, improving the generalizability and portability of video models. This paper surveys recent progress in VUDA with deep learning. We begin with the motivation of VUDA, followed by its definition, and recent progress of methods for both closed-set VUDA and VUDA under different scenarios, and current benchmark datasets for VUDA research. Eventually, future directions are provided to promote further VUDA research. The repository of this survey is provided at https://github.com/xuyu0010/awesome-video-domain-adaptation.

CVNov 14, 2023
TENT: Connect Language Models with IoT Sensors for Zero-Shot Activity Recognition

Yunjiao Zhou, Jianfei Yang, Han Zou et al. · berkeley

Recent achievements in language models have showcased their extraordinary capabilities in bridging visual information with semantic language understanding. This leads us to a novel question: can language models connect textual semantics with IoT sensory signals to perform recognition tasks, e.g., Human Activity Recognition (HAR)? If so, an intelligent HAR system with human-like cognition can be built, capable of adapting to new environments and unseen categories. This paper explores its feasibility with an innovative approach, IoT-sEnsors-language alignmEnt pre-Training (TENT), which jointly aligns textual embeddings with IoT sensor signals, including camera video, LiDAR, and mmWave. Through the IoT-language contrastive learning, we derive a unified semantic feature space that aligns multi-modal features with language embeddings, so that the IoT data corresponds to specific words that describe the IoT data. To enhance the connection between textual categories and their IoT data, we propose supplementary descriptions and learnable prompts that bring more semantic information into the joint feature space. TENT can not only recognize actions that have been seen but also ``guess'' the unseen action by the closest textual words from the feature space. We demonstrate TENT achieves state-of-the-art performance on zero-shot HAR tasks using different modalities, improving the best vision-language models by over 12%.

CVSep 21, 2023Code
MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation

Haozhi Cao, Yuecong Xu, Jianfei Yang et al.

Multi-modal unsupervised domain adaptation (MM-UDA) for 3D semantic segmentation is a practical solution to embed semantic understanding in autonomous systems without expensive point-wise annotations. While previous MM-UDA methods can achieve overall improvement, they suffer from significant class-imbalanced performance, restricting their adoption in real applications. This imbalanced performance is mainly caused by: 1) self-training with imbalanced data and 2) the lack of pixel-wise 2D supervision signals. In this work, we propose Multi-modal Prior Aided (MoPA) domain adaptation to improve the performance of rare objects. Specifically, we develop Valid Ground-based Insertion (VGI) to rectify the imbalance supervision signals by inserting prior rare objects collected from the wild while avoiding introducing artificial artifacts that lead to trivial solutions. Meanwhile, our SAM consistency loss leverages the 2D prior semantic masks from SAM as pixel-wise supervision signals to encourage consistent predictions for each object in the semantic mask. The knowledge learned from modal-specific prior is then shared across modalities to achieve better rare object segmentation. Extensive experiments show that our method achieves state-of-the-art performance on the challenging MM-UDA benchmark. Code will be available at https://github.com/AronCao49/MoPA.

SYSep 17, 2011
Distributed Robust Control of Linear Multi-Agent Systems with Parameter Uncertainties

Zhongkui Li, Zhisheng Duan, Lihua Xie et al.

This paper considers the distributed robust control problems of uncertain linear multi-agent systems with undirected communication topologies. It is assumed that the agents have identical nominal dynamics while subject to different norm-bounded parameter uncertainties, leading to weakly heterogeneous multi-agent systems. Distributed controllers are designed for both continuous- and discrete-time multi-agent systems, based on the relative states of neighboring agents and a subset of absolute states of the agents. It is shown for both the continuous- and discrete-time cases that the distributed robust control problems under such controllers in the sense of quadratic stability are equivalent to the $H_\infty$ control problems of a set of decoupled linear systems having the same dimensions as a single agent. A two-step algorithm is presented to construct the distributed controller for the continuous-time case, which does not involve any conservatism and meanwhile decouples the feedback gain design from the communication topology. Furthermore, a sufficient existence condition in terms of linear matrix inequalities is derived for the distributed discrete-time controller. Finally, the distributed robust $H_\infty$ control problems of uncertain linear multi-agent systems subject to external disturbances are discussed.

LGSep 29, 2024Code
A Survey on Graph Neural Networks for Remaining Useful Life Prediction: Methodologies, Evaluation and Future Trends

Yucheng Wang, Min Wu, Xiaoli Li et al.

Remaining Useful Life (RUL) prediction is a critical aspect of Prognostics and Health Management (PHM), aimed at predicting the future state of a system to enable timely maintenance and prevent unexpected failures. While existing deep learning methods have shown promise, they often struggle to fully leverage the spatial information inherent in complex systems, limiting their effectiveness in RUL prediction. To address this challenge, recent research has explored the use of Graph Neural Networks (GNNs) to model spatial information for more accurate RUL prediction. This paper presents a comprehensive review of GNN techniques applied to RUL prediction, summarizing existing methods and offering guidance for future research. We first propose a novel taxonomy based on the stages of adapting GNNs to RUL prediction, systematically categorizing approaches into four key stages: graph construction, graph modeling, graph information processing, and graph readout. By organizing the field in this way, we highlight the unique challenges and considerations at each stage of the GNN pipeline. Additionally, we conduct a thorough evaluation of various state-of-the-art (SOTA) GNN methods, ensuring consistent experimental settings for fair comparisons. This rigorous analysis yields valuable insights into the strengths and weaknesses of different approaches, serving as an experimental guide for researchers and practitioners working in this area. Finally, we identify and discuss several promising research directions that could further advance the field, emphasizing the potential for GNNs to revolutionize RUL prediction and enhance the effectiveness of PHM strategies. The benchmarking codes are available in GitHub: https://github.com/Frank-Wang-oss/GNN\_RUL\_Benchmarking.

SYFeb 24, 2012
Global $H_\infty$ Consensus of Multi-Agent Systems with Lipschitz Nonlinear Dynamics

Zhongkui Li, Xiangdong Liu, Mengyin Fu et al.

This paper addresses the global consensus problems of a class of nonlinear multi-agent systems with Lipschitz nonlinearity and directed communication graphs, by using a distributed consensus protocol based on the relative states of neighboring agents. A two-step algorithm is presented to construct a protocol, under which a Lipschitz multi-agent system without disturbances can reach global consensus for a strongly connected directed communication graph. Another algorithm is then given to design a protocol which can achieve global consensus with a guaranteed $H_\infty$ performance for a Lipschitz multiagent system subject to external disturbances. The case with a leader-follower communication graph is also discussed. Finally, the effectiveness of the theoretical results is demonstrated through a network of single-link manipulators.

ROSep 17, 2024Code
ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges

Thien-Minh Nguyen, Yizhuo Yang, Tien-Dat Nguyen et al.

While UWB-based methods can achieve high localization accuracy in small-scale areas, their accuracy and reliability are significantly challenged in large-scale environments. In this paper, we propose a learning-based framework named ULOC for Ultra-Wideband (UWB) based localization in such complex large-scale environments. First, anchors are deployed in the environment without knowledge of their actual position. Then, UWB observations are collected when the vehicle travels in the environment. At the same time, map-consistent pose estimates are developed from registering (onboard self-localization) data with the prior map to provide the training labels. We then propose a network based on MAMBA that learns the ranging patterns of UWBs over a complex large-scale environment. The experiment demonstrates that our solution can ensure high localization accuracy on a large scale compared to the state-of-the-art. We release our source code to benefit the community at https://github.com/brytsknguyen/uloc.

SYMay 8, 2020
Bayesian Filtering with Unknown Sensor Measurement Losses

Jiaqi Zhang, Keyou You, Lihua Xie

This work studies the state estimation problem of a stochastic nonlinear system with unknown sensor measurement losses. If the estimator knows the sensor measurement losses of a linear Gaussian system, the minimum variance estimate is easily computed by the celebrated intermittent Kalman filter (IKF). However, this will no longer be the case when the measurement losses are unknown and/or the system is nonlinear or non-Gaussian. By exploiting the binary property of the measurement loss process and the IKF, we design three suboptimal filters for the state estimation, i.e., BKF-I, BKF-II and RBPF. The BKF-I is based on the MAP estimator of the measurement loss process and the BKF-II is derived by estimating the conditional loss probability. The RBPF is a particle filter based algorithm which marginalizes out the loss process to increase the efficiency of particles. All the proposed filters can be easily implemented in recursive forms. Finally, a linear system, a target tracking system and a quadrotor's path control problem are included to illustrate their effectiveness, and show the tradeoff between computational complexity and estimation accuracy of the proposed filters.

SYJul 5, 2021
Computing Probabilistic Controlled Invariant Sets

Yulong Gao, Karl H. Johansson, Lihua Xie

This paper investigates stochastic invariance for control systems through probabilistic controlled invariant sets (PCISs). As a natural complement to robust controlled invariant sets~(RCISs), we propose finite- and infinite-horizon PCISs, and explore their relation to RICSs. We design iterative algorithms to compute the PCIS within a given set. For systems with discrete spaces, the computations of the finite- and infinite-horizon PCISs at each iteration are based on linear programming and mixed integer linear programming, respectively. The algorithms are computationally tractable and terminate in a finite number of steps. For systems with continuous spaces, we show how to discretize the spaces and prove the convergence of the approximation when computing the finite-horizon PCISs. In addition, it is shown that an infinite-horizon PCIS can be computed by the stochastic backward reachable set from the RCIS contained in it. These PCIS algorithms are applicable to practical control systems. Simulations are given to illustrate the effectiveness of the theoretical results for motion planning.

SYJul 5, 2018
An integrated localization-navigation scheme for distance-based docking of UAVs

Thien-Minh Nguyen, Zhirong Qiu, Muqing Cao et al.

In this paper we study the distance-based docking problem of unmanned aerial vehicles (UAVs) by using a single landmark placed at an arbitrarily unknown position. To solve the problem, we propose an integrated estimation-control scheme to simultaneously achieve the relative localization and navigation tasks for discrete-time integrators under bounded velocity: a nonlinear adaptive estimation scheme to estimate the relative position to the landmark, and a delicate control scheme to ensure both the convergence of the estimation and the asymptotic docking at the given landmark. A rigorous proof of convergence is provided by invoking the discrete-time LaSalle's invariance principle, and we also validate our theoretical findings on quadcopters equipped with ultra-wideband ranging sensors and optical flow sensors in a GPS-less environment.

SYDec 28, 2016
On Stochastic Sensor Network Scheduling for Multiple Processes

Duo Han, Junfeng Wu, Yilin Mo et al.

We consider the problem of multiple sensor scheduling for remote state estimation of multiple process over a shared link. In this problem, a set of sensors monitor mutually independent dynamical systems in parallel but only one sensor can access the shared channel at each time to transmit the data packet to the estimator. We propose a stochastic event-based sensor scheduling in which each sensor makes transmission decisions based on both channel accessibility and distributed event-triggering conditions. The corresponding minimum mean squared error (MMSE) estimator is explicitly given. Considering information patterns accessed by sensor schedulers, time-based ones can be treated as a special case of the proposed one. By ultilizing realtime information, the proposed schedule outperforms the time-based ones in terms of the estimation quality. Resorting to solving an Markov decision process (MDP) problem with average cost criterion, we can find optimal parameters for the proposed schedule. As for practical use, a greedy algorithm is devised for parameter design, which has rather low computational complexity. We also provide a method to quantify the performance gap between the schedule optimized via MDP and any other schedules.

SYSep 29, 2016
Data Rate for Distributed Consensus of Multi-agent Systems with High Order Oscillator Dynamics

Zhirong Qiu, Lihua Xie, Yiguang Hong

Distributed consensus with data rate constraint is an important research topic of multi-agent systems. Some results have been obtained for consensus of multi-agent systems with integrator dynamics, but it remains challenging for general high-order systems, especially in the presence of unmeasurable states. In this paper, we study the quantized consensus problem for a special kind of high-order systems and investigate the corresponding data rate required for achieving consensus. The state matrix of each agent is a 2m-th order real Jordan block admitting m identical pairs of conjugate poles on the unit circle; each agent has a single input, and only the first state variable can be measured. The case of harmonic oscillators corresponding to m=1 is first investigated under a directed communication topology which contains a spanning tree, while the general case of m >= 2 is considered for a connected and undirected network. In both cases it is concluded that the sufficient number of communication bits to guarantee the consensus at an exponential convergence rate is an integer between $m$ and $2m$, depending on the location of the poles.

CVApr 28, 2022
Computer Vision for Road Imaging and Pothole Detection: A State-of-the-Art Review of Systems and Algorithms

Nachuan Ma, Jiahe Fan, Wenshuo Wang et al.

Computer vision algorithms have been prevalently utilized for 3-D road imaging and pothole detection for over two decades. Nonetheless, there is a lack of systematic survey articles on state-of-the-art (SoTA) computer vision techniques, especially deep learning models, developed to tackle these problems. This article first introduces the sensing systems employed for 2-D and 3-D road data acquisition, including camera(s), laser scanners, and Microsoft Kinect. Afterward, it thoroughly and comprehensively reviews the SoTA computer vision algorithms, including (1) classical 2-D image processing, (2) 3-D point cloud modeling and segmentation, and (3) machine/deep learning, developed for road pothole detection. This article also discusses the existing challenges and future development trends of computer vision-based road pothole detection approaches: classical 2-D image processing-based and 3-D point cloud modeling and segmentation-based approaches have already become history; and Convolutional neural networks (CNNs) have demonstrated compelling road pothole detection results and are promising to break the bottleneck with the future advances in self/un-supervised learning for multi-modal semantic segmentation. We believe that this survey can serve as practical guidance for developing the next-generation road condition assessment systems.

CVSep 21, 2022
AirFi: Empowering WiFi-based Passive Human Gesture Recognition to Unseen Environment via Domain Generalization

Dazhuo Wang, Jianfei Yang, Wei Cui et al.

WiFi-based smart human sensing technology enabled by Channel State Information (CSI) has received great attention in recent years. However, CSI-based sensing systems suffer from performance degradation when deployed in different environments. Existing works solve this problem by domain adaptation using massive unlabeled high-quality data from the new environment, which is usually unavailable in practice. In this paper, we propose a novel augmented environment-invariant robust WiFi gesture recognition system named AirFi that deals with the issue of environment dependency from a new perspective. The AirFi is a novel domain generalization framework that learns the critical part of CSI regardless of different environments and generalizes the model to unseen scenarios, which does not require collecting any data for adaptation to the new environment. AirFi extracts the common features from several training environment settings and minimizes the distribution differences among them. The feature is further augmented to be more robust to environments. Moreover, the system can be further improved by few-shot learning techniques. Compared to state-of-the-art methods, AirFi is able to work in different environment settings without acquiring any CSI data from the new environment. The experimental results demonstrate that our system remains robust in the new environment and outperforms the compared systems.

LGMay 28, 2022
Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

Jianfei Yang, Xiangyu Peng, Kai Wang et al.

Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain. It does not require access to both the source-domain data and the predictor parameters, thus addressing the data privacy and portability issues of standard domain adaptation. Existing DABP approaches mostly rely on model distillation from the black-box predictor, \emph{i.e.}, training the model with its noisy target-domain predictions, which however inevitably introduces the confirmation bias accumulated from the prediction noises. To mitigate such bias, we propose a new method, named BETA, to incorporate knowledge distillation and noisy label learning into one coherent framework. This is enabled by a new divide-to-adapt strategy. BETA divides the target domain into an easy-to-adapt subdomain with less noise and a hard-to-adapt subdomain. Then it deploys mutually-teaching twin networks to filter the predictor errors for each other and improve them progressively, from the easy to hard subdomains. As such, BETA effectively purifies the noisy labels and reduces error accumulation. We theoretically show that the target error of BETA is minimized by decreasing the noise ratio of the subdomains. Extensive experiments demonstrate BETA outperforms existing methods on all DABP benchmarks, and is even comparable with the standard domain adaptation methods that use the source-domain data.

CVMar 18, 2023
Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation

Haozhi Cao, Yuecong Xu, Jianfei Yang et al.

Continual Test-Time Adaptation (CTTA) generalizes conventional Test-Time Adaptation (TTA) by assuming that the target domain is dynamic over time rather than stationary. In this paper, we explore Multi-Modal Continual Test-Time Adaptation (MM-CTTA) as a new extension of CTTA for 3D semantic segmentation. The key to MM-CTTA is to adaptively attend to the reliable modality while avoiding catastrophic forgetting during continual domain shifts, which is out of the capability of previous TTA or CTTA methods. To fulfill this gap, we propose an MM-CTTA method called Continual Cross-Modal Adaptive Clustering (CoMAC) that addresses this task from two perspectives. On one hand, we propose an adaptive dual-stage mechanism to generate reliable cross-modal predictions by attending to the reliable modality based on the class-wise feature-centroid distance in the latent space. On the other hand, to perform test-time adaptation without catastrophic forgetting, we design class-wise momentum queues that capture confident target features for adaptation while stochastically restoring pseudo-source features to revisit source knowledge. We further introduce two new benchmarks to facilitate the exploration of MM-CTTA in the future. Our experimental results show that our method achieves state-of-the-art performance on both benchmarks.

SYFeb 10, 2018
Iterative Learning Economic Model Predictive Control

Yushen Long, Lihua Xie, Shuai Liu

An iterative learning based economic model predictive controller (ILEMPC) is proposed for repetitive tasks in this paper. Compared with existing works, the initial feasible trajectory of the proposed ILEMPC is not restricted to be convergent to an equilibrium so it can handle various types of control objectives: stabilization, tracking a periodic trajectory and even pure economic optimization. The controller can learn from the previous closed-loop trajectory, resulting in a performance which is guaranteed to be no worse than the previous one. Under some standard assumptions in model predictive control, we show that recursive feasibility is ensured. Furthermore, for stabilization problem, the convergence of each learned trajectory and the learning process are established provided the initial trajectory is convergent. Numerical examples show that the proposed control strategy works well for different types of control tasks and systems.

SYMay 31, 2018
Least-square based recursive optimization for distance-based source localization

Thien-Minh Nguyen, Lihua Xie

In this paper we study the problem of driving an agent to an unknown source whose location is estimated in real-time by a recursive optimization algorithm. The optimization criterion is subject to a least-square cost function constructed from the distance measurements to the target combined with the agent's self-odometry. In this work, two important issues concerning real world application are directly addressed, which is a discrete-time recursive algorithm for concurrent control and estimation, and consideration for input saturation. It is proven that with proper choices of the system's parameters, stability of all system states, including on-board estimator variables and the agent-target relative position can be achieved. The convergence of the agent's position to the target is also investigated via numerical simulation.

SYNov 26, 2015
Nash Equilibrium Computation in Subnetwork Zero-Sum Games with Switching Communications

Youcheng Lou, Yiguang Hong, Lihua Xie et al.

In this paper, we investigate a distributed Nash equilibrium computation problem for a time-varying multi-agent network consisting of two subnetworks, where the two subnetworks share the same objective function. We first propose a subgradient-based distributed algorithm with heterogeneous stepsizes to compute a Nash equilibrium of a zero-sum game. We then prove that the proposed algorithm can achieve a Nash equilibrium under uniformly jointly strongly connected (UJSC) weight-balanced digraphs with homogenous stepsizes. Moreover, we demonstrate that for weighted-unbalanced graphs a Nash equilibrium may not be achieved with homogenous stepsizes unless certain conditions on the objective function hold. We show that there always exist heterogeneous stepsizes for the proposed algorithm to guarantee that a Nash equilibrium can be achieved for UJSC digraphs. Finally, in two standard weight-unbalanced cases, we verify the convergence to a Nash equilibrium by adaptively updating the stepsizes along with the arc weights in the proposed algorithm.

CVAug 10, 2022
Leveraging Endo- and Exo-Temporal Regularization for Black-box Video Domain Adaptation

Yuecong Xu, Jianfei Yang, Haozhi Cao et al.

To enable video models to be applied seamlessly across video tasks in different environments, various Video Unsupervised Domain Adaptation (VUDA) methods have been proposed to improve the robustness and transferability of video models. Despite improvements made in model robustness, these VUDA methods require access to both source data and source model parameters for adaptation, raising serious data privacy and model portability issues. To cope with the above concerns, this paper firstly formulates Black-box Video Domain Adaptation (BVDA) as a more realistic yet challenging scenario where the source video model is provided only as a black-box predictor. While a few methods for Black-box Domain Adaptation (BDA) are proposed in image domain, these methods cannot apply to video domain since video modality has more complicated temporal features that are harder to align. To address BVDA, we propose a novel Endo and eXo-TEmporal Regularized Network (EXTERN) by applying mask-to-mix strategies and video-tailored regularizations: endo-temporal regularization and exo-temporal regularization, performed across both clip and temporal features, while distilling knowledge from the predictions obtained from the black-box predictor. Empirical results demonstrate the state-of-the-art performance of EXTERN across various cross-domain closed-set and partial-set action recognition benchmarks, which even surpassed most existing video domain adaptation methods with source data accessibility.

LGMay 1, 2022
A Survey on Distributed Online Optimization and Game

Xiuxian Li, Lihua Xie, Na Li

Distributed online optimization and game have been increasingly researched in the last decade, mostly motivated by its wide applications in sensor networks, robotics (e.g., distributed target tracking and formation control), smart grids, deep learning, and so forth. In these problems, there is a network of agents who may be cooperative (i.e., distributed online optimization) or noncooperative (i.e., online game) through local information exchanges. And the local cost function of each agent is often time-varying in dynamic and even adversarial environments. At each time, a decision must be made by each agent based on historical information at hand without knowing future information on cost functions. For these problems, a comprehensive survey is still lacking. This paper aims to provide a thorough overview of distributed online optimization and game from the perspective of problem settings, communication, computation, algorithms, and performances. In addition, some potential future directions are also discussed.

CVSep 29, 2023
AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi

Yunjiao Zhou, Jianfei Yang, He Huang et al.

WiFi-based pose estimation is a technology with great potential for the development of smart homes and metaverse avatar generation. However, current WiFi-based pose estimation methods are predominantly evaluated under controlled laboratory conditions with sophisticated vision models to acquire accurately labeled data. Furthermore, WiFi CSI is highly sensitive to environmental variables, and direct application of a pre-trained model to a new environment may yield suboptimal results due to domain shift. In this paper, we proposes a domain adaptation algorithm, AdaPose, designed specifically for weakly-supervised WiFi-based pose estimation. The proposed method aims to identify consistent human poses that are highly resistant to environmental dynamics. To achieve this goal, we introduce a Mapping Consistency Loss that aligns the domain discrepancy of source and target domains based on inner consistency between input and output at the mapping level. We conduct extensive experiments on domain adaptation in two different scenes using our self-collected pose estimation dataset containing WiFi CSI frames. The results demonstrate the effectiveness and robustness of AdaPose in eliminating domain shift, thereby facilitating the widespread application of WiFi-based pose estimation in smart cities.

RODec 3, 2025Code
What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models

Tianchen Deng, Yue Pan, Shenghai Yuan et al.

In this paper, we provide a comprehensive overview of existing scene representation methods for robotics, covering traditional representations such as point clouds, voxels, signed distance functions (SDF), and scene graphs, as well as more recent neural representations like Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and the emerging Foundation Models. While current SLAM and localization systems predominantly rely on sparse representations like point clouds and voxels, dense scene representations are expected to play a critical role in downstream tasks such as navigation and obstacle avoidance. Moreover, neural representations such as NeRF, 3DGS, and foundation models are well-suited for integrating high-level semantic features and language-based priors, enabling more comprehensive 3D scene understanding and embodied intelligence. In this paper, we categorized the core modules of robotics into five parts (Perception, Mapping, Localization, Navigation, Manipulation). We start by presenting the standard formulation of different scene representation methods and comparing the advantages and disadvantages of scene representation across different modules. This survey is centered around the question: What is the best 3D scene representation for robotics? We then discuss the future development trends of 3D scene representations, with a particular focus on how the 3D Foundation Model could replace current methods as the unified solution for future robotic applications. The remaining challenges in fully realizing this model are also explored. We aim to offer a valuable resource for both newcomers and experienced researchers to explore the future of 3D scene representations and their application in robotics. We have published an open-source project on GitHub and will continue to add new works and technologies to this project.

LGNov 17, 2023
SEA++: Multi-Graph-based High-Order Sensor Alignment for Multivariate Time-Series Unsupervised Domain Adaptation

Yucheng Wang, Yuecong Xu, Jianfei Yang et al.

Unsupervised Domain Adaptation (UDA) methods have been successful in reducing label dependency by minimizing the domain discrepancy between a labeled source domain and an unlabeled target domain. However, these methods face challenges when dealing with Multivariate Time-Series (MTS) data. MTS data typically consist of multiple sensors, each with its own unique distribution. This characteristic makes it hard to adapt existing UDA methods, which mainly focus on aligning global features while overlooking the distribution discrepancies at the sensor level, to reduce domain discrepancies for MTS data. To address this issue, a practical domain adaptation scenario is formulated as Multivariate Time-Series Unsupervised Domain Adaptation (MTS-UDA). In this paper, we propose SEnsor Alignment (SEA) for MTS-UDA, aiming to reduce domain discrepancy at both the local and global sensor levels. At the local sensor level, we design endo-feature alignment, which aligns sensor features and their correlations across domains. To reduce domain discrepancy at the global sensor level, we design exo-feature alignment that enforces restrictions on global sensor features. We further extend SEA to SEA++ by enhancing the endo-feature alignment. Particularly, we incorporate multi-graph-based high-order alignment for both sensor features and their correlations. Extensive empirical results have demonstrated the state-of-the-art performance of our SEA and SEA++ on public MTS datasets for MTS-UDA.

CRJul 8, 2024
A Trustworthy AIoT-enabled Localization System via Federated Learning and Blockchain

Junfei Wang, He Huang, Jingze Feng et al.

There is a significant demand for indoor localization technology in smart buildings, and the most promising solution in this field is using RF sensors and fingerprinting-based methods that employ machine learning models trained on crowd-sourced user data gathered from IoT devices. However, this raises security and privacy issues in practice. Some researchers propose to use federated learning to partially overcome privacy problems, but there still remain security concerns, e.g., single-point failure and malicious attacks. In this paper, we propose a framework named DFLoc to achieve precise 3D localization tasks while considering the following two security concerns. Particularly, we design a specialized blockchain to decentralize the framework by distributing the tasks such as model distribution and aggregation which are handled by a central server to all clients in most previous works, to address the issue of the single-point failure for a reliable and accurate indoor localization system. Moreover, we introduce an updated model verification mechanism within the blockchain to alleviate the concern of malicious node attacks. Experimental results substantiate the framework's capacity to deliver accurate 3D location predictions and its superior resistance to the impacts of single-point failure and malicious attacks when compared to conventional centralized federated learning systems.

LGSep 29, 2024
Temporal Source Recovery for Time-Series Source-Free Unsupervised Domain Adaptation

Yucheng Wang, Peiliang Gong, Min Wu et al.

Time-Series (TS) data has grown in importance with the rise of Internet of Things devices like sensors, but its labeling remains costly and complex. While Unsupervised Domain Adaptation (UDAs) offers an effective solution, growing data privacy concerns have led to the development of Source-Free UDA (SFUDAs), enabling model adaptation to target domains without accessing source data. Despite their potential, applying existing SFUDAs to TS data is challenging due to the difficulty of transferring temporal dependencies, an essential characteristic of TS data, particularly in the absence of source samples. Although prior works attempt to address this by specific source pretraining designs, such requirements are often impractical, as source data owners cannot be expected to adhere to particular pretraining schemes. To address this, we propose Temporal Source Recovery (TemSR), a framework that leverages the intrinsic properties of TS data to generate a source-like domain and recover source temporal dependencies. With this domain, TemSR enables dependency transfer to the target domain without accessing source data or relying on source-specific designs, thereby facilitating effective and practical TS-SFUDA. TemSR features a masking recovery optimization process to generate a source-like distribution with restored temporal dependencies. This distribution is further refined through local context-aware regularization to preserve local dependencies, and anchor-based recovery diversity maximization to promote distributional diversity. Together, these components enable effective temporal dependency recovery and facilitate transfer across domains using standard UDA techniques. Extensive experiments across multiple TS tasks demonstrate the effectiveness of TemSR, which even surpasses existing TS-SFUDA methods that require source-specific designs.

ROMar 18
OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms

Zhongyuang Liu, Min He, Shaonan Yu et al.

Language-guided embodied navigation requires an agent to interpret object-referential instructions, search across multiple rooms, localize the referenced target, and execute reliable motion toward it. Existing systems remain limited in real indoor environments because narrow field-of-view sensing exposes only a partial local scene at each step, often forcing repeated rotations, delaying target discovery, and producing fragmented spatial understanding; meanwhile, directly prompting LLMs with dense 3D maps or exhaustive object lists quickly exceeds the context budget. We present OmniVLN, a zero-shot visual-language navigation framework that couples omnidirectional 3D perception with token-efficient hierarchical reasoning for both aerial and ground robots. OmniVLN fuses a rotating LiDAR and panoramic vision into a hardware-agnostic mapping stack, incrementally constructs a five-layer Dynamic Scene Graph (DSG) from mesh geometry to room- and building-level structure, and stabilizes high-level topology through persistent-homology-based room partitioning and hybrid geometric/VLM relation verification. For navigation, the global DSG is transformed into an agent-centric 3D octant representation with multi-resolution spatial attention prompting, enabling the LLM to progressively filter candidate rooms, infer egocentric orientation, localize target objects, and emit executable navigation primitives while preserving fine local detail and compact long-range memory. Experiments show that the proposed hierarchical interface improves spatial referring accuracy from 77.27\% to 93.18\%, reduces cumulative prompt tokens by up to 61.7\% in cluttered multi-room settings, and improves navigation success by up to 11.68\% over a flat-list baseline. We will release the code and an omnidirectional multimodal dataset to support reproducible research.

ROApr 4
Watch Your Step: Learning Semantically-Guided Locomotion in Cluttered Environment

Denan Liang, Yuan Zhu, Ruimeng Liu et al.

Although legged robots demonstrate impressive mobility on rough terrain, using them safely in cluttered environments remains a challenge. A key issue is their inability to avoid stepping on low-lying objects, such as high-cost small devices or cables on flat ground. This limitation arises from a disconnection between high-level semantic understanding and low-level control, combined with errors in elevation maps during real-world operation. To address this, we introduce SemLoco, a Reinforcement Learning (RL) framework designed to avoid obstacles precisely in densely cluttered environments. SemLoco uses a two-stage RL approach that combines both soft and hard constraints. It performs pixel-wise foothold safety inference, which enables more accurate foot placement. Additionally, SemLoco integrates semantic map, allowing it to assign traversability costs instead of relying only on geometric data. SemLoco greatly reduces collisions and improves safety around sensitive objects, enabling reliable navigation in situations where traditional controllers would likely cause damage. Experimental results further show that SemLoco can be effectively applied to more complex, unstructured real-world environments. A demo video can be view at https://youtu.be/FSq-RSmIxOM.

ROFeb 6, 2024Code
MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats

Shenghai Yuan, Yizhuo Yang, Thien Hoang Nguyen et al.

In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which possess the potential to transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers a unique overhead aerial detection vital for addressing real-world scenarios with higher fidelity than datasets captured on specific vantage points using thermal and RGB. Additionally, MMAUD provides accurate Leica-generated ground truth data, enhancing credibility and enabling confident refinement of algorithms and models, which has never been seen in other datasets. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment and implement new UAV threat detection tools. Our dataset closely simulates real-world scenarios by incorporating ambient heavy machinery sounds. This approach enhances the dataset's applicability, capturing the exact challenges faced during proximate vehicular operations. It is expected that MMAUD can play a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond. Our dataset, codes, and designs will be available in https://github.com/ntu-aris/MMAUD.

ROApr 23
A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

Kuan Xu, Ruimeng Liu, Yizhuo Yang et al.

Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-language navigation (VLN), existing approaches often face a fundamental trade-off between strong reasoning capabilities and efficient deployment on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and robust high-level reasoning on real-world robotic platforms. To achieve this, we decouple the system into three asynchronous modules: a real-time perception module for continuous environment sensing, a memory integration module for spatial-semantic aggregation, and a reasoning module for high-level decision making. We incrementally construct a cognitive memory graph to encode scene information, which is further decomposed into subgraphs to enable reasoning with a vision-language model (VLM). To further improve navigation efficiency and accuracy, we also leverage the cognitive memory graph to formulate the exploration problem as a context-aware Weighted Traveling Repairman Problem (WTRP), which minimizes the weighted waiting time of viewpoints. Extensive experiments in both simulation and real-world robotic platforms demonstrate improved navigation success and efficiency over existing VLN approaches, while maintaining real-time performance on resource-constrained hardware.

AIMar 18
RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy

Zhenhang Yuan, Shenghai Yuan, Lihua Xie

LLM agents often fail in closed-world embodied environments because actions must satisfy strict preconditions -- such as location, inventory, and container states -- and failure feedback is sparse. We identify two structurally coupled failure modes: (P1) invalid action generation and (P2) state drift, each amplifying the other in a degenerative cycle. We present RPMS, a conflict-managed architecture that enforces action feasibility via structured rule retrieval, gates memory applicability via a lightweight belief state, and resolves conflicts between the two sources via rules-first arbitration. On ALFWorld (134 unseen tasks), RPMS achieves 59.7% single-trial success with Llama 3.1 8B (+23.9 pp over baseline) and 98.5% with Claude Sonnet 4.5 (+11.9 pp); of the 8B gain, rule retrieval alone contributes +14.9 pp (statistically significant), making it the dominant factor. A key finding is that episodic memory is conditionally useful: it harms performance on some task types when used without grounding, but becomes a stable net positive once filtered by current state and constrained by explicit action rules. Adapting RPMS to ScienceWorld with GPT-4 yields consistent gains across all ablation conditions (avg. score 54.0 vs. 44.9 for the ReAct baseline), providing transfer evidence that the core mechanisms hold across structurally distinct environments.

ROMar 29
S3KF: Spherical State-Space Kalman Filtering for Panoramic 3D Multi-Object Tracking

Zhongyuan Liu, Shaonan Yu, Jianping Li et al.

Panoramic multi-object tracking is important for industrial safety monitoring, wide-area robotic perception, and infrastructure-light deployment in large workspaces. In these settings, the sensing system must provide full-surround coverage, metric geometric cues, and stable target association under wide field-of-view distortion and occlusion. Existing image-plane trackers are tightly coupled to the camera projection and become unreliable in panoramic imagery, while conventional Euclidean 3D formulations introduce redundant directional parameters and do not naturally unify angular, scale, and depth estimation. In this paper, we present $\mathbf{S^3KF}$, a panoramic 3D multi-object tracking framework built on a motorized rotating LiDAR and a quad-fisheye camera rig. The key idea is a geometry-consistent state representation on the unit sphere $\mathbb{S}^2$, where object bearing is modeled by a two-degree-of-freedom tangent-plane parameterization and jointly estimated with box scale and depth dynamics. Based on this state, we derive an extended spherical Kalman filtering pipeline that fuses panoramic camera detections with LiDAR depth observations for multimodal tracking. We further establish a map-based ground-truth generation pipeline using wearable localization devices registered to a shared global LiDAR map, enabling quantitative evaluation without motion-capture infrastructure. Experiments on self-collected real-world sequences show decimeter-level planar tracking accuracy, improved identity continuity over a 2D panoramic baseline in dynamic scenes, and real-time onboard operation on a Jetson AGX Orin platform. These results indicate that the proposed framework is a practical solution for panoramic perception and industrial-scale multi-object tracking.The project page can be found at https://kafeiyin00.github.io/S3KF/.

ROMar 27
Line-of-Sight-Constrained Multi-Robot Mapless Navigation via Polygonal Visible Regions

Ruofei Bai, Shenghai Yuan, Xinhang Xu et al.

Multi-robot systems rely on underlying connectivity to ensure reliable communication and timely coordination. This paper studies the line-of-sight (LoS) connectivity maintenance problem in multi-robot navigation with unknown obstacles. Prior works typically assume known environment maps to formulate LoS constraints between robots, which hinders their practical deployment. To overcome this limitation, we propose an inherently distributed approach where each robot only constructs an egocentric visible region based on its real-time LiDAR scans, instead of endeavoring to build a global map online. The individual visible regions are shared through distributed communication to establish inter-robot LoS constraints, which are then incorporated into a multi-robot navigation framework to ensure LoS-connectivity. Moreover, we enhance the robustness of connectivity maintenance by proposing a more accurate LoS-distance metric, which further enables flexible topology optimization that eliminates redundant and effort-demanding connections. The proposed framework is evaluated through extensive multi-robot navigation and exploration tasks in both simulation and real-world experiments. Results show that it reliably maintains LoS-connectivity between robots in challenging environments cluttered with obstacles, even under large visible ranges and fragile minimal topologies, where existing methods consistently fail. Ablation studies also reveal that topology optimization boosts navigation efficiency by around $20\%$, demonstrating the framework's potential for efficient navigation under connectivity constraints.

CVJun 23, 2025Code
MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation

Tianchen Deng, Guole Shen, Xun Chen et al.

Neural implicit scene representations have recently shown promising results in dense visual SLAM. However, existing implicit SLAM algorithms are constrained to single-agent scenarios, and fall difficulties in large-scale scenes and long sequences. Existing NeRF-based multi-agent SLAM frameworks cannot meet the constraints of communication bandwidth. To this end, we propose the first distributed multi-agent collaborative neural SLAM framework with hybrid scene representation, distributed camera tracking, intra-to-inter loop closure, and online distillation for multiple submap fusion. A novel triplane-grid joint scene representation method is proposed to improve scene reconstruction. A novel intra-to-inter loop closure method is designed to achieve local (single-agent) and global (multi-agent) consistency. We also design a novel online distillation method to fuse the information of different submaps to achieve global consistency. Furthermore, to the best of our knowledge, there is no real-world dataset for NeRF-based/GS-based SLAM that provides both continuous-time trajectories groundtruth and high-accuracy 3D meshes groundtruth. To this end, we propose the first real-world Dense slam (DES) dataset covering both single-agent and multi-agent scenarios, ranging from small rooms to large-scale outdoor scenes, with high-accuracy ground truth for both 3D mesh and continuous-time camera trajectory. This dataset can advance the development of the research in both SLAM, 3D reconstruction, and visual foundation model. Experiments on various datasets demonstrate the superiority of the proposed method in both mapping, tracking, and communication. The dataset and code will open-source on https://github.com/dtc111111/mcnslam.

ROMay 14
FU-MPC: Frontier- and Uncertainty-Aware Model Predictive Control for Efficient and Accurate UAV Exploration with Motorized LiDAR

Jianping Li, Pengfei Wan, Zhongyuan Liu et al.

Efficient UAV exploration in unknown environments requires rapid coverage expansion while maintaining accurate and reliable localization, since safe navigation in complex scenes depends on consistent mapping and pose estimation. However, for conventional LiDAR-equipped UAVs, the observable region is tightly coupled with the UAV pose and motion. Expanding coverage often requires additional translational or rotational maneuvers, which can reduce exploration efficiency and increase the risk of localization degradation in geometrically challenging environments. Motorized rotating LiDARs provide a promising solution by actively adjusting the sensor viewing direction without changing the UAV motion, thereby introducing an additional sensing degree of freedom. Nevertheless, existing exploration systems rarely exploit this scanning freedom as an explicit decision variable linked to both exploration progress and localization quality. To address this gap, we develop a UAV platform equipped with an independently actuated rotating LiDAR and propose a hierarchical exploration framework. The global planner organizes frontiers into representative viewpoints and sequences them using topology-aware transition costs. Built upon this planner, FU-MPC serves as a local receding-horizon scan controller that optimizes LiDAR rotation along the predicted flight trajectory. The controller jointly considers frontier-aware exploration utility and direction-dependent localization uncertainty, while lightweight surrogate evaluation enables real-time onboard execution. Experiments in complex environments demonstrate that the proposed system improves exploration efficiency while maintaining robust localization performance compared with fixed-pattern scanning and uncertainty-only baselines. The project page can be found at https://kafeiyin00.github.io/FU-MPC/.

CVJul 30, 2025Code
UAVScenes: A Multi-Modal Dataset for UAVs

Sijie Wang, Siqi Li, Yawei Zhang et al.

Multi-modal perception is essential for unmanned aerial vehicle (UAV) operations, as it enables a comprehensive understanding of the UAVs' surrounding environment. However, most existing multi-modal UAV datasets are primarily biased toward localization and 3D reconstruction tasks, or only support map-level semantic segmentation due to the lack of frame-wise annotations for both camera images and LiDAR point clouds. This limitation prevents them from being used for high-level scene understanding tasks. To address this gap and advance multi-modal UAV perception, we introduce UAVScenes, a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities. Our benchmark dataset is built upon the well-calibrated multi-modal UAV dataset MARS-LVIG, originally developed only for simultaneous localization and mapping (SLAM). We enhance this dataset by providing manually labeled semantic annotations for both frame-wise images and LiDAR point clouds, along with accurate 6-degree-of-freedom (6-DoF) poses. These additions enable a wide range of UAV perception tasks, including segmentation, depth estimation, 6-DoF localization, place recognition, and novel view synthesis (NVS). Our dataset is available at https://github.com/sijieaaa/UAVScenes

ROMar 8Code
PanoDP: Learning Collision-Free Navigation with Panoramic Depth and Differentiable Physics

Hao Zhong, Pei Chi, Jiang Zhao et al.

Autonomous collision-free navigation in cluttered environments requires safe decision-making under partial observability with both static structure and dynamic obstacles. We present \textbf{PanoDP}, a communication-free learning framework that combines four-view panoramic depth perception with differentiable-physics-based training signals. PanoDP encodes panoramic depth using a lightweight CNN and optimizes policies with dense differentiable collision and motion-feasibility terms, improving training stability beyond sparse terminal collisions. We evaluate PanoDP on a controlled ring-to-center benchmark with systematic sweeps over agent count, obstacle density/layout, and dynamic behaviors, and further test out-of-distribution generalization in an external simulator (e.g., AirSim). Across settings, PanoDP increases collision-free and completion rates over single-view and non-physics-guided baselines under matched training budgets, and ablations (view masking, rotation augmentation) confirm the policy leverages 360-degree information. Code will be open source upon acceptance.

OCSep 10, 2021Code
DIRECT: A Differential Dynamic Programming Based Framework for Trajectory Generation

Kun Cao, Muqing Cao, Shenghai Yuan et al.

This paper introduces a differential dynamic programming (DDP) based framework for polynomial trajectory generation for differentially flat systems. In particular, instead of using a linear equation with increasing size to represent multiple polynomial segments as in literature, we take a new perspective from state-space representation such that the linear equation reduces to a finite horizon control system with a fixed state dimension and the required continuity conditions for consecutive polynomials are automatically satisfied. Consequently, the constrained trajectory generation problem (both with and without time optimization) can be converted to a discrete-time finite-horizon optimal control problem with inequality constraints, which can be approached by a recently developed interior-point DDP (IPDDP) algorithm. Furthermore, for unconstrained trajectory generation with preallocated time, we show that this problem is indeed a linear-quadratic tracking (LQT) problem (DDP algorithm with exact one iteration). All these algorithms enjoy linear complexity with respect to the number of segments. Both numerical comparisons with state-of-the-art methods and physical experiments are presented to verify and validate the effectiveness of our theoretical findings. The implementation code will be open-sourced,

ROFeb 7, 2021Code
Lightweight 3-D Localization and Mapping for Solid-State LiDAR

Han Wang, Chen Wang, Lihua Xie

The LIght Detection And Ranging (LiDAR) sensor has become one of the most important perceptual devices due to its important role in simultaneous localization and mapping (SLAM). Existing SLAM methods are mainly developed for mechanical LiDAR sensors, which are often adopted by large scale robots. Recently, the solid-state LiDAR is introduced and becomes popular since it provides a cost-effective and lightweight solution for small scale robots. Compared to mechanical LiDAR, solid-state LiDAR sensors have higher update frequency and angular resolution, but also have smaller field of view (FoV), which is very challenging for existing LiDAR SLAM algorithms. Therefore, it is necessary to have a more robust and computationally efficient SLAM method for this new sensing device. To this end, we propose a new SLAM framework for solid-state LiDAR sensors, which involves feature extraction, odometry estimation, and probability map building. The proposed method is evaluated on a warehouse robot and a hand-held device. In the experiments, we demonstrate both the accuracy and efficiency of our method using an Intel L515 solid-state LiDAR. The results show that our method is able to provide precise localization and high quality mapping. We made the source codes public at \url{https://github.com/wh200720041/SSL_SLAM}.

CVJul 29, 2020Code
Online Visual Place Recognition via Saliency Re-identification

Han Wang, Chen Wang, Lihua Xie

As an essential component of visual simultaneous localization and mapping (SLAM), place recognition is crucial for robot navigation and autonomous driving. Existing methods often formulate visual place recognition as feature matching, which is computationally expensive for many robotic applications with limited computing power, e.g., autonomous driving and cleaning robot. Inspired by the fact that human beings always recognize a place by remembering salient regions or landmarks that are more attractive or interesting than others, we formulate visual place recognition as saliency re-identification. In the meanwhile, we propose to perform both saliency detection and re-identification in frequency domain, in which all operations become element-wise. The experiments show that our proposed method achieves competitive accuracy and much higher speed than the state-of-the-art feature-based methods. The proposed method is open-sourced and available at https://github.com/wh200720041/SRLCD.git.

ROOct 16, 2017Code
GroundSLAM: A Robust Visual SLAM System for Warehouse Robots Using Ground Textures

Kuan Xu, Zheng Yang, Lihua Xie et al.

A robust visual localization and mapping system is essential for warehouse robot navigation, as cameras offer a more cost-effective alternative to LiDAR sensors. However, existing forward-facing camera systems often encounter challenges in dynamic environments and open spaces, leading to significant performance degradation during deployment. To address these limitations, a localization system utilizing a single downward-facing camera to capture ground textures presents a promising solution. Nevertheless, existing feature-based ground-texture localization methods face difficulties when operating on surfaces with sparse features or repetitive patterns. To address this limitation, we propose GroundSLAM, a novel feature-free and ground-texture-based simultaneous localization and mapping (SLAM) system. GroundSLAM consists of three components: feature-free visual odometry, ground-texture-based loop detection and map optimization, and map reuse. Specifically, we introduce a kernel cross-correlator (KCC) for image-level pose tracking, loop detection, and map reuse to improve localization accuracy and robustness, and incorporate adaptive pruning strategies to enhance efficiency. Due to these specific designs, GroundSLAM is able to deliver efficient and stable localization across various ground surfaces such as those with sparse features and repetitive patterns. To advance research in this area, we introduce the first ground-texture dataset with precise ground-truth poses, consisting of 131k images collected from 10 kinds of indoor and outdoor ground surfaces. Extensive experimental results show that GroundSLAM outperforms state-of-the-art methods for both indoor and outdoor localization. We release our code and dataset at https://github.com/sair-lab/GroundSLAM.