CV | Apr 15, 2022
Condition-Invariant and Compact Visual Place Description by Convolutional Autoencoder
Hanjing Ye, Weinan Chen, Jingwen Yu et al.
Visual place recognition (VPR) in condition-varying environments is still an open problem. Popular solutions are CNN-based image descriptors, which have been shown to outperform traditional image descriptors based on hand-crafted visual features. However, current CNN-based descriptors have two drawbacks: a) high dimensionality and b) a lack of generalization, which lead to low efficiency and poor performance in applications. In this paper, we propose to use a convolutional autoencoder (CAE) to tackle this problem. We employ a high-level layer of a pre-trained CNN to generate features, and train a CAE to map the features to a low-dimensional space, which improves the condition invariance of the descriptor and reduces its dimension at the same time. We verify our method on three challenging datasets involving significant illumination changes, where it outperforms state-of-the-art methods. For the benefit of the community, we make the source code public.
CV | Jun 17, 2024
SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation
Zhenchao Lin, Li He, Hongqiang Yang et al.
A large-scale point cloud consists of a multitude of individual objects and therefore carries rich structural and semantic contextual information, which makes segmenting it efficiently a challenging problem. Most existing research focuses on capturing intricate local features without giving due consideration to global ones, and thus fails to leverage semantic context. In this paper, we propose a Similarity-Weighted Convolution and local-global Fusion Network, named SWCF-Net, which takes into account both local and global features. We propose a Similarity-Weighted Convolution (SWConv) to effectively extract local features, where similarity weights are incorporated into the convolution operation to enhance generalization. Then, we apply a downsampling operation to the K and V channels within the attention module, reducing the quadratic complexity to linear and enabling the Transformer to deal with large-scale point clouds. Finally, orthogonal components are extracted from the global features and aggregated with the local features, which eliminates redundant information between local and global features and consequently improves efficiency. We evaluate SWCF-Net on the large-scale outdoor datasets SemanticKITTI and Toronto3D. Our experimental results demonstrate the effectiveness of the proposed network. Our method achieves competitive results at lower computational cost, and is able to handle large-scale point clouds efficiently.
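The complexity reduction described above can be illustrated with a minimal sketch. It assumes that K and V are reduced to a fixed number `m` of rows by evenly spaced subsampling; the abstract does not say which downsampling operator SWCF-Net actually uses, so `downsample`, `attention_downsampled_kv`, and `m` are hypothetical names chosen for illustration. With `m` fixed, each of the N queries attends to only `m` keys, so the cost grows as N * m rather than N * N.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def downsample(rows, m):
    """Keep m evenly spaced rows (assumed operator, not the paper's)."""
    n = len(rows)
    if m >= n:
        return list(rows)
    return [rows[i * n // m] for i in range(m)]


def attention_downsampled_kv(q, k, v, m):
    """Scaled dot-product attention where only K and V are downsampled.

    q, k, v are lists of equal-length feature vectors. The score table
    has len(q) * m entries instead of len(q) * len(k).
    """
    k_small = downsample(k, m)
    v_small = downsample(v, m)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for query in q:
        scores = [scale * sum(a * b for a, b in zip(query, key))
                  for key in k_small]
        weights = softmax(scores)
        out.append([sum(w * value[d] for w, value in zip(weights, v_small))
                    for d in range(len(v_small[0]))])
    return out
```

Every query still produces an output row, so the per-point resolution of the result is unchanged; only the set of keys and values each point attends to is coarsened.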
RO | Oct 27, 2021
Relationship Oriented Affordance Learning through Manipulation Graph Construction
Chao Tang, Jingwen Yu, Weinan Chen et al.
In this paper, we propose the Manipulation Relationship Graph (MRG), a novel affordance representation which captures the underlying manipulation relationships of an arbitrary scene. To construct such a graph from raw visual observations, a deep neural network named AR-Net is introduced. It consists of an Attribute module and a Context module, which guide the relationship learning at the object and subgraph levels respectively. We quantitatively validate our method on a novel manipulation relationship dataset named SMRD. To evaluate the performance of the proposed model and representation, both visual perception and physical manipulation experiments are conducted. Overall, AR-Net along with MRG outperforms all baselines, achieving a success rate of 88.89% on task relationship recognition (TRR) and 73.33% on task completion (TC).
RO | Oct 13, 2021
Robotic Autonomous Trolley Collection with Progressive Perception and Nonlinear Model Predictive Control
Anxing Xiao, Hao Luan, Ziqi Zhao et al.
Autonomous mobile manipulation robots that can collect trolleys are widely used to liberate human resources and fight epidemics. Most prior robotic trolley collection solutions only detect trolleys with 2D poses or rely on specific markers, and they lack a formal design of planning algorithms. In this paper, we present a novel mobile manipulation system with applications in luggage trolley collection. The proposed system integrates a compact hardware design and a progressive perception and planning framework, enabling the system to efficiently and robustly collect trolleys in dynamic and complex environments. For perception, we first develop a 3D trolley detection method that combines object detection and keypoint estimation. Then, short-range docking is achieved with an accurate point cloud plane detection method and a novel manipulator design. On the planning side, we formulate the robot's motion planning under a nonlinear model predictive control framework with control barrier functions to improve obstacle avoidance capabilities while keeping the target in the sensors' field of view at close distances. We demonstrate our design and framework by deploying the system on actual trolley collection tasks, and their effectiveness and robustness are experimentally validated.
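For readers unfamiliar with control barrier functions, one commonly used discrete-time form of the safety constraint added to a model predictive controller is shown below. This is a generic formulation, not necessarily the one used in this paper: the abstract does not state the exact constraint, and the symbols $h$, $\gamma$, and $x_k$ are introduced here only for illustration. The safe set is taken to be the states where $h(x) \ge 0$, for example where the distance to an obstacle exceeds a margin.

```latex
% Assumed generic discrete-time CBF condition (not taken from the paper):
% x_k   : predicted robot state at step k of the horizon
% h     : barrier function, h(x) >= 0 exactly on the safe set
% gamma : decay rate; smaller values give more conservative motion
\[
  h(x_{k+1}) - h(x_k) \;\ge\; -\gamma\, h(x_k),
  \qquad 0 < \gamma \le 1 .
\]
% Equivalently h(x_{k+1}) >= (1 - gamma) h(x_k), so a state that
% starts safe (h(x_0) >= 0) stays safe at every later step.
```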
RO | Jun 3, 2021
Curiosity-based Robot Navigation under Uncertainty in Crowded Environments
Kuanqi Cai, Weinan Chen, Chaoqun Wang et al.
Mobile robots have become increasingly popular in large-scale and crowded environments such as airports and shopping malls. However, due to sparse landmarks and crowd noise, localization in such environments is a great challenge. Furthermore, it is difficult for the robot to navigate reliably and safely in crowds while considering human comfort. Thus, how to navigate safely while maintaining localization precision in such environments is a critical problem. To solve this problem, we propose a curiosity-based framework that can find an effective path while considering human comfort and crowds, localization uncertainty, and the cost-to-go to the target. The proposed framework has three parts: the distance assessment module, the Curiosity for Positive Content (CPC), namely information-rich areas, and the Curiosity for Negative Content (CNC), namely crowded areas. CPC is introduced when the real-time localization uncertainty evaluation is unsatisfactory. This factor is predicted by propagating the uncertainty along the candidate trajectory, which prompts the robot to approach landmarks used as localization references. The Human Comfort and Crowd Density Map (HCCDM), based on the Gaussian Mixture Model (GMM), is established to calculate CNC, which drives the robot to bypass the crowd and consider human comfort. The evaluation is conducted in a series of large-scale and crowded environments. The results show that our method can find a feasible path that accounts for localization uncertainty while avoiding crowded areas.
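As a rough illustration of how a GMM-based density map can penalize paths through crowds, the sketch below evaluates a 2D Gaussian mixture at each waypoint and sums the values into a path cost. It assumes isotropic components given as `(weight, mean, sigma)` tuples; the actual HCCDM construction and the way CNC enters the planner are not specified in the abstract, so `crowd_density` and `path_crowd_cost` are hypothetical helpers.

```python
import math


def gaussian_2d(point, mean, sigma):
    """Isotropic 2D Gaussian density at `point` (an assumed component shape)."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    norm = 2.0 * math.pi * sigma * sigma
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)) / norm


def crowd_density(point, components):
    """Mixture density: weighted sum over (weight, mean, sigma) components."""
    return sum(weight * gaussian_2d(point, mean, sigma)
               for weight, mean, sigma in components)


def path_crowd_cost(path, components):
    """Sum of crowd density over the waypoints of a candidate path."""
    return sum(crowd_density(waypoint, components) for waypoint in path)


# One crowd centred at the origin; the detour should be cheaper.
crowd = [(1.0, (0.0, 0.0), 1.0)]
direct = [(-2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]
detour = [(-2.0, 0.0), (0.0, 3.0), (2.0, 0.0)]
direct_cost = path_crowd_cost(direct, crowd)
detour_cost = path_crowd_cost(detour, crowd)
```

A planner in this spirit would combine such a crowd term with the localization-uncertainty and cost-to-go terms when ranking candidate trajectories.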
RO | Jul 3, 2018
Submap-based Pose-graph Visual SLAM: A Robust Visual Exploration and Localization System
Weinan Chen, Lei Zhu, Yisheng Guan et al.
For VSLAM (Visual Simultaneous Localization and Mapping), localization is a challenging task, especially in difficult situations such as textureless frames and motion blur. To build a robust exploration and localization system in a given space or environment, a submap-based VSLAM system is proposed in this paper. Our system uses a submap back-end and a visual front-end. The main advantage of our system is its robustness with respect to tracking failure, a common problem in current VSLAM algorithms. The robustness of our system is compared with the state of the art in terms of average tracking percentage. The precision of our system is also evaluated against the state of the art in terms of the RMSE (root mean square error) of the ATE (absolute trajectory error). The ability of our system to solve the "kidnapped" problem is demonstrated. Our system can improve the robustness of visual localization in challenging situations.
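The ATE RMSE metric mentioned above can be sketched as follows. This minimal version assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame; standard evaluation tools usually perform a rigid alignment first, which is omitted here. The function name `ate_rmse` is chosen for illustration.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root mean square of per-pose translational errors.

    Both arguments are equal-length sequences of positions, e.g. (x, y, z)
    tuples, assumed to be pre-aligned and matched by timestamp.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two poses: the first matches exactly, the second is off by 2 m in z.
estimate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
error = ate_rmse(estimate, reference)  # sqrt((0 + 4) / 2) = sqrt(2)
```

Because the errors are squared before averaging, a few large deviations (for example after a tracking failure) dominate the score more than many small ones.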