CV Nov 6, 2025
FastGS: Training 3D Gaussian Splatting in 100 Seconds
Shiwei Ren, Tianci Wen, Yongchun Fang et al.
The dominant 3D Gaussian splatting (3DGS) acceleration methods fail to properly regulate the number of Gaussians during training, causing redundant computational time overhead. In this paper, we propose FastGS, a novel, simple, and general acceleration framework that fully considers the importance of each Gaussian based on multi-view consistency, efficiently solving the trade-off between training time and rendering quality. We innovatively design a densification and pruning strategy based on multi-view consistency, dispensing with the budgeting mechanism. Extensive experiments on Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets demonstrate that our method significantly outperforms the state-of-the-art methods in training speed, achieving a 3.32$\times$ training acceleration and comparable rendering quality compared with DashGaussian on the Mip-NeRF 360 dataset and a 15.45$\times$ acceleration compared with vanilla 3DGS on the Deep Blending dataset. We demonstrate that FastGS exhibits strong generality, delivering 2-7$\times$ training acceleration across various tasks, including dynamic scene reconstruction, surface reconstruction, sparse-view reconstruction, large-scale reconstruction, and simultaneous localization and mapping. The project page is available at https://fastgs.github.io/
60.1 CV May 16
Mind the Gap: Learning Modality-Agnostic Representations with a Cross-Modality UNet
Xin Niu, Enyi Li, Jinchao Liu et al.
Cross-modality recognition has many important applications in science, law enforcement and entertainment. Popular methods to bridge the modality gap include reducing the distributional differences of representations of different modalities, learning indistinguishable representations or explicit modality transfer. The first two approaches suffer from the loss of discriminant information while removing the modality-specific variations. The third relies heavily on successful modality transfer and can face a catastrophic performance drop when explicit modality transfer is difficult or impossible. To tackle this problem, we propose a compact encoder-decoder neural module (cmUNet) to learn modality-agnostic representations while retaining identity-related information. This is achieved through cross-modality transformation and in-modality reconstruction, enhanced by an adversarial/perceptual loss which encourages indistinguishability of representations in the original sample space. For cross-modality matching, we propose MarrNet, in which cmUNet is connected to a standard feature extraction network that takes the modality-agnostic representations as inputs and outputs similarity scores for matching. We validate our method on five challenging tasks, namely Raman-infrared spectrum matching, cross-modality person re-identification and heterogeneous (photo-sketch, visible-near infrared and visible-thermal) face recognition, where MarrNet shows superior performance compared to state-of-the-art methods. Furthermore, we observe that a cross-modality matching method can be biased to extract discriminant information from partial or even wrong regions because it cannot handle the modality gap, which subsequently leads to poor generalization. We show that robustness to occlusions can indicate how well a method bridges the modality gap.
AI Feb 16, 2025 Code
Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time
Zongyuan Li, Chang Lu, Xiaojie Xu et al.
Since the emergence of Large Language Models (LLMs), they have been widely used in fields such as writing, translating, and searching. However, there is still great potential for LLM-based methods in handling complex tasks such as decision-making in the StarCraft II environment. To address problems such as lack of relevant knowledge and poor control over subtasks of varying importance, we propose a Hierarchical Expert Prompt (HEP) for LLMs. Our method improves the understanding of game situations through expert-level tactical knowledge and improves the processing quality of tasks of varying importance through a hierarchical framework. Our approach defeated the highest level (Elite) standard built-in agent in TextStarCraft II for the first time and consistently outperformed the baseline method at other difficulty levels. Our experiments suggest that the proposed method is a practical solution for tackling complex decision-making challenges. The replay video can be viewed at https://www.bilibili.com/video/BV1uz42187EF and https://youtu.be/dO3PshWLV5M, and our code has been open-sourced at https://github.com/luchang1113/HEP-LLM-play-StarCraftII.
20.1 RO Apr 18
Neural Network-Based Adaptive Event-Triggered Control for Dual-Arm Unmanned Aerial Manipulator Systems
Yang Wang, Hai Yu, Wei He et al.
This paper investigates the control problem of dual-arm unmanned aerial manipulator systems (DAUAMs). Strong coupling between the dual-arm and the multirotor platform, together with unmodeled dynamics and external disturbances, poses significant challenges to stable and accurate operation. An adaptive event-triggered control scheme with neural network-based approximation is proposed to address these issues while explicitly considering communication constraints. First, a dynamic model of the DAUAM system is derived, and a command-filter-based backstepping framework with error compensation is constructed. Then, a neural network is employed to approximate external frictions, and an event-triggered mechanism is designed to reduce the transmission frequency of control updates, thereby alleviating communication and energy burdens. Lyapunov-based analysis shows that all closed-loop signals remain bounded and that the tracking error converges to a neighborhood of the desired trajectory within a fixed time. Finally, experiments on a self-built DAUAM platform demonstrate that the proposed approach achieves accurate trajectory tracking.
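The event-triggered idea in this abstract can be illustrated with a minimal sketch. It assumes a fixed-threshold trigger and a sinusoidal stand-in for the continuous control law; the function name, the threshold value, and the signal are all hypothetical, and the paper's adaptive mechanism is not reproduced here.

```python
import math

def simulate_event_triggered(threshold=0.05, steps=200, dt=0.01):
    """Count control transmissions under a fixed-threshold trigger.

    Assumption: the actuator holds the last transmitted value, and a new
    value is sent only when the continuously computed control has drifted
    at least `threshold` away from the held one.
    """
    u_held = None      # last transmitted control value
    updates = 0        # number of transmissions
    max_gap = 0.0      # largest gap between computed and applied control
    for k in range(steps):
        t = k * dt
        u_cont = math.sin(2.0 * math.pi * t)  # stand-in for the control law
        if u_held is None or abs(u_cont - u_held) >= threshold:
            u_held = u_cont                   # trigger: transmit and hold
            updates += 1
        max_gap = max(max_gap, abs(u_cont - u_held))
    return updates, max_gap

updates, max_gap = simulate_event_triggered()
print(f"{updates} transmissions over 200 steps, max gap {max_gap:.4f}")
```

The applied control never deviates from the computed one by the threshold or more, while transmissions are skipped wherever the signal changes slowly; this is the communication saving that such a mechanism trades against tracking accuracy.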
RO Nov 1, 2020 Code
MRPB 1.0: A Unified Benchmark for the Evaluation of Mobile Robot Local Planning Approaches
Jian Wen, Xuebo Zhang, Qingchen Bi et al.
Local planning is one of the key technologies for mobile robots to achieve full autonomy and has been widely investigated. To evaluate mobile robot local planning approaches in a unified and comprehensive way, a mobile robot local planning benchmark called MRPB 1.0 is newly proposed in this paper. The benchmark facilitates both motion planning researchers who want to compare the performance of a new local planner relative to many other state-of-the-art approaches as well as end users in the mobile robotics industry who want to select a local planner that performs best on some problems of interest. We elaborately design various simulation scenarios to challenge the applicability of local planners, including large-scale, partially unknown, and dynamic complex environments. Furthermore, three types of principled evaluation metrics are carefully designed to quantitatively evaluate the performance of local planners, wherein the safety, efficiency, and smoothness of motions are comprehensively considered. We present the application of the proposed benchmark in two popular open-source local planners to show the practicality of the benchmark. In addition, some insights and guidelines about the design and selection of local planners are also provided. The benchmark website contains all data of the designed simulation scenarios, detailed descriptions of these scenarios, and example code.
3.5 SY May 5
Differentiable Optimization Layered Safety-Critical Control for Risk-Aware Navigation via Conformal Prediction
Jinyang Dong, Shizhen Wu, Yongchun Fang
Risk-aware navigation in unknown environments is a fundamental challenge for autonomous vehicles operating in complex urban systems. To address this issue, this paper presents a differentiable optimization layered safety-critical control method based on conformal prediction. First, to handle uncertainties arising from sensor noise, the conformal prediction method is employed to generate risk-aware obstacle ellipsoids around an elliptical-shaped robot. Second, two nested differentiable optimization layers are introduced to build the control barrier functions for obstacle avoidance and feasibility guarantee, respectively. Then, a quadratic program based safety-critical control law is proposed to integrate the above control barrier function constraints as well as input constraints. In the end, the effectiveness of the proposed framework is demonstrated through numerical simulations.
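As a rough illustration of a quadratic-program-based safety filter, the sketch below handles the simplest case: one control barrier function constraint and no input constraints, where the QP has a closed-form solution (a projection onto a half-space). It assumes a single-integrator robot and a circular obstacle; the function names and the barrier `h(x) = ||x - c||^2 - r^2` are illustrative choices, not the paper's ellipsoidal formulation or its differentiable optimization layers.

```python
def circle_cbf_constraint(x, center, radius, alpha):
    """Build a @ u >= b for h(x) = ||x - c||^2 - r^2 with dynamics x_dot = u.

    The CBF condition grad_h(x) @ u + alpha * h(x) >= 0 becomes
    a = 2 (x - c), b = -alpha * h(x).
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in diff) - radius ** 2
    a = [2.0 * d for d in diff]
    b = -alpha * h
    return a, b

def cbf_qp_filter(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a @ u >= b (a must be nonzero)."""
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0.0:
        return list(u_nom)               # nominal input is already safe
    a_sq = sum(ai * ai for ai in a)
    # Project u_nom onto the boundary of the half-space a @ u >= b.
    return [ui - slack * ai / a_sq for ui, ai in zip(u_nom, a)]

# Robot at (2, 0) heading straight at a unit-radius obstacle at the origin.
a, b = circle_cbf_constraint([2.0, 0.0], [0.0, 0.0], 1.0, 1.0)
u_safe = cbf_qp_filter([-5.0, 0.0], a, b)
print(u_safe)
```

With several constraints or input bounds the closed form no longer applies and a numerical QP solver would be needed.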
CV Jan 9, 2025
SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding
Tianci Wen, Zhiang Liu, Yongchun Fang
3D Gaussian splatting (3D-GS) has recently revolutionized novel view synthesis in the simultaneous localization and mapping (SLAM) problem. However, most existing algorithms fail to fully capture the underlying structure, resulting in structural inconsistency. Additionally, they struggle with abrupt appearance variations, leading to inconsistent visual quality. To address these problems, we propose SEGS-SLAM, a structure-enhanced 3D Gaussian Splatting SLAM, which achieves high-quality photorealistic mapping. Our main contributions are two-fold. First, we propose a structure-enhanced photorealistic mapping (SEPM) framework that, for the first time, leverages highly structured point cloud to initialize structured 3D Gaussians, leading to significant improvements in rendering quality. Second, we propose Appearance-from-Motion embedding (AfME), enabling 3D Gaussians to better model image appearance variations across different camera poses. Extensive experiments on monocular, stereo, and RGB-D datasets demonstrate that SEGS-SLAM significantly outperforms state-of-the-art (SOTA) methods in photorealistic mapping quality, e.g., an improvement of $19.86\%$ in PSNR over MonoGS on the TUM RGB-D dataset for monocular cameras. The project page is available at https://segs-slam.github.io/.
AI Feb 19, 2025
Reflection of Episodes: Learning to Play Game from Expert and Self Experiences
Xiaojie Xu, Zongyuan Li, Chang Lu et al.
StarCraft II is a complex and dynamic real-time strategy (RTS) game environment, which is very suitable for artificial intelligence and reinforcement learning research. To address the problem of Large Language Model (LLM) learning in complex environments through self-reflection, we propose a Reflection of Episodes (ROE) framework based on expert experience and self-experience. This framework first obtains key information in the game through a keyframe selection method, then makes decisions based on expert experience and self-experience. After a game is completed, it reflects on the previous experience to obtain new self-experience. Finally, in the experiment, our method beat the robot at the Very Hard difficulty in TextStarCraft II. We analyze the LLM's data during the game in detail and verify the method's effectiveness.
RO Jan 31, 2022
G$^2$VD Planner: Efficient Motion Planning With Grid-based Generalized Voronoi Diagrams
Jian Wen, Xuebo Zhang, Qingchen Bi et al.
In this paper, an efficient motion planning approach with grid-based generalized Voronoi diagrams (G$^2$VD) is newly proposed for mobile robots. Different from existing approaches, the novelty of this work is twofold: 1) a new state lattice-based path searching approach is proposed, in which the search space is reduced to a novel Voronoi corridor to further improve the search efficiency; 2) an efficient quadratic programming-based path smoothing approach is presented, wherein the clearance to obstacles is considered to improve the path clearance of hard-constrained path smoothing approaches. We validate the efficiency and smoothness of our approach in various challenging simulation scenarios and outdoor environments. It is shown that the computational efficiency is improved by 17.1% in the path searching stage, and path smoothing with the proposed approach is 6.6 times faster than an advanced sparse-banded structure-based path smoothing approach and 53.3 times faster than the popular timed-elastic-band planner. A video showing outdoor navigation on our campus is available at https://youtu.be/iMXGthgvp58.
RO Dec 16, 2020
E$^3$MoP: Efficient Motion Planning Based on Heuristic-Guided Motion Primitives Pruning and Path Optimization With Sparse-Banded Structure
Jian Wen, Xuebo Zhang, Haiming Gao et al.
To solve the autonomous navigation problem in complex environments, an efficient motion planning approach is newly presented in this paper. Considering the challenges from large-scale, partially unknown complex environments, a three-layer motion planning framework is elaborately designed, including global path planning, local path optimization, and time-optimal velocity planning. Compared with existing approaches, the novelty of this work is twofold: 1) a novel heuristic-guided pruning strategy of motion primitives is proposed and fully integrated into the state lattice-based global path planner to further improve the computational efficiency of graph search, and 2) a new soft-constrained local path optimization approach is proposed, wherein the sparse-banded system structure of the underlying optimization problem is fully exploited to efficiently solve the problem. We validate the safety, smoothness, flexibility, and efficiency of our approach in various complex simulation scenarios and challenging real-world tasks. It is shown that the computational efficiency is improved by 66.21% in the global planning stage and the motion efficiency of the robot is improved by 22.87% compared with the recent quintic Bézier curve-based state space sampling approach. We name the proposed motion planning framework E$^3$MoP, where the number 3 not only means our approach is a three-layer framework but also means the proposed approach is efficient in three stages.
OC Sep 9, 2020
Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization
Huan Li, Zhouchen Lin, Yongchun Fang
We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing. The proposed VR-EXTRA requires $O((κ_s+n)\log\frac{1}{ε})$ stochastic gradient evaluations and $O((κ_b+κ_c)\log\frac{1}{ε})$ communication rounds to reach precision $ε$, which are the best complexities among the non-accelerated gradient-type methods, where $κ_s$ and $κ_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, respectively, $κ_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. The proposed VR-DIGing has a slightly higher communication cost of $O((κ_b+κ_c^2)\log\frac{1}{ε})$. Our stochastic gradient computation complexities are the same as those of single-machine VR methods, such as SAG, SAGA, and SVRG, and our communication complexities match those of EXTRA and DIGing, respectively. To further speed up the convergence, we also propose accelerated VR-EXTRA and VR-DIGing, both with the optimal $O((\sqrt{nκ_s}+n)\log\frac{1}{ε})$ stochastic gradient computation complexity and $O(\sqrt{κ_bκ_c}\log\frac{1}{ε})$ communication complexity. Our stochastic gradient computation complexity is also the same as that of single-machine accelerated VR methods, such as Katyusha, and our communication complexity matches that of accelerated full batch decentralized methods, such as MSDA.
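A quick reader's check, not taken from the paper, shows why the accelerated bounds quoted above are never worse than the non-accelerated ones up to a constant factor. It uses only the AM-GM inequality on the condition numbers and the sample size; the numerical values at the end are an arbitrary illustration.

```latex
% AM-GM: \sqrt{xy} \le (x + y)/2 for x, y \ge 0.
\sqrt{n\kappa_s} \le \frac{n + \kappa_s}{2}
\;\Longrightarrow\;
\sqrt{n\kappa_s} + n \le \frac{3n + \kappa_s}{2} \le \frac{3}{2}\,(\kappa_s + n),
\qquad
\sqrt{\kappa_b\kappa_c} \le \frac{\kappa_b + \kappa_c}{2}.

% Illustration with \kappa_s = 10^4 and n = 10^2:
\kappa_s + n = 10100,
\qquad
\sqrt{n\kappa_s} + n = 10^3 + 10^2 = 1100.
```

The gap is largest when the two quantities inside each square root differ by orders of magnitude, and it vanishes when they are equal.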
RO Jan 7, 2019
CAE-RLSM: Consistent and Efficient Redundant Line Segment Merging for Online Feature Map Building
Jian Wen, Xuebo Zhang, Haiming Gao et al.
In order to obtain a compact line segment-based map representation for localization and planning of mobile robots, it is necessary to merge redundant line segments which physically represent the same part of the environment in different scans. In this paper, a consistent and efficient redundant line segment merging approach (CAE-RLSM) is proposed for online feature map building. The proposed CAE-RLSM is composed of two newly proposed modules: one-to-many incremental line segment merging (OTM-ILSM) and multi-processing global map adjustment (MP-GMA). Different from state-of-the-art offline merging approaches, the proposed CAE-RLSM can achieve real-time mapping performance, which not only reduces the redundancy of incremental merging with high efficiency, but also solves the problem of global map adjustment after loop closing to guarantee global consistency. Furthermore, a new correlation-based evaluation metric is proposed for the quality evaluation of line segment maps. This evaluation metric does not require manual measurement of the environmental metric information; instead, it makes full use of globally consistent laser scans obtained by simultaneous localization and mapping (SLAM) systems to compare the performance of different line segment-based mapping approaches in an objective and fair manner. Comparative experimental results with respect to a mean shift-based offline redundant line segment merging approach (MS-RLSM) and an offline version of the one-to-one incremental line segment merging approach (O$^2$TO-ILSM), on both public data sets and a self-recorded data set, are presented to show the superior performance of CAE-RLSM in terms of efficiency and map quality in different scenarios.
RO Dec 8, 2018
Real-time Acceleration-continuous Path-constrained Trajectory Planning With Built-in Tradability Between Cruise and Time-optimal Motions
Peiyao Shen, Xuebo Zhang, Yongchun Fang
In this paper, a novel real-time acceleration-continuous path-constrained trajectory planning algorithm is proposed with an appealing built-in tradability mechanism between cruise motion and time-optimal motion. Different from existing approaches, the proposed approach smooths time-optimal trajectories with bang-bang input structures to generate acceleration-continuous trajectories while preserving the completeness property. More importantly, a novel built-in tradability mechanism is proposed and embedded into the trajectory planning framework, so that the proportion of the cruise motion and time-optimal motion can be flexibly adjusted by changing a user-specified functional parameter. Thus, the user can easily apply the trajectory planning algorithm to various tasks with different requirements on motion efficiency and cruise proportion. Moreover, it is shown that feasible trajectories are computed more quickly than optimal trajectories. Rigorous mathematical analysis and proofs are provided for these results. Comparative simulation and experimental results on omnidirectional wheeled mobile robots demonstrate the capability of the proposed algorithm in terms of flexible tuning between cruise and time-optimal motions, as well as higher computational efficiency.
RO Oct 10, 2016
Essential Properties of Numerical Integration for Time-optimal Trajectory Planning Along a Specified Path
Peiyao Shen, Xuebo Zhang, Yongchun Fang
This letter summarizes some known properties and also presents several new properties of the Numerical Integration (NI) method for time-optimal trajectory planning along a specified path. The contribution is that rigorous mathematical proofs of these properties are presented, most of which cannot be found in the existing literature. We first give some properties regarding switch points and accelerating/decelerating curves of the NI method. Then, because the original version of NI, which only considers torque constraints, may fail to plan a trajectory when kinematic constraints are also considered, we give the concrete failure conditions with rigorous mathematical proof. Accordingly, a failure detection algorithm is given in a run-and-test manner. Some simulation results on a unicycle vehicle are provided to verify the presented properties. Note that although the known properties were not first discovered in this letter, their mathematical proofs are given here for the first time. The detailed proofs make the theory of NI more complete and help interested readers to gain a thorough understanding of the method.