RO | Jun 6, 2023
Single-Shot Global Localization via Graph-Theoretic Correspondence Matching
Shigemichi Matsuzaki, Kenji Koide, Shuji Oishi et al.
This paper describes a global localization method based on graph-theoretic association of instances between a query and a prior map. The proposed framework performs correspondence matching formulated as a maximum clique problem (MCP). Thanks to this graph-based abstraction, the framework is potentially applicable to other map and query modalities, whereas many existing global localization methods require the query and the dataset to share the same modality. We implement it with a semantically labeled 3D point cloud map and a semantic segmentation image as the query. Leveraging the graph-theoretic framework, the proposed method achieves global localization using only the map and the query. The method shows promising results on multiple large-scale simulated maps of urban scenes.
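As a rough illustration of correspondence matching posed as a maximum clique problem, the sketch below builds a consistency graph over candidate correspondences and extracts its largest clique. It is not the paper's implementation: the instance representation (a label plus a 2D position), the pairwise-distance consistency test, the tolerance `tol`, and all function names are assumptions made for this example, and the paper's actual query is a segmentation image rather than a set of points.

```python
from itertools import combinations
from math import dist

def candidate_pairs(query, prior_map):
    """All (query index, map index) pairs whose semantic labels agree."""
    return [(qi, mi)
            for qi, (q_label, _) in enumerate(query)
            for mi, (m_label, _) in enumerate(prior_map)
            if q_label == m_label]

def consistent(a, b, query, prior_map, tol):
    """Assumed consistency test: two correspondences are compatible if they
    use distinct instances and preserve the distance between them."""
    (qa, ma), (qb, mb) = a, b
    if qa == qb or ma == mb:
        return False
    d_query = dist(query[qa][1], query[qb][1])
    d_map = dist(prior_map[ma][1], prior_map[mb][1])
    return abs(d_query - d_map) <= tol

def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique as a set."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the best clique found so far
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

def match(query, prior_map, tol=0.1):
    """Return the largest mutually consistent set of correspondences."""
    nodes = candidate_pairs(query, prior_map)
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if consistent(a, b, query, prior_map, tol):
            adj[a].add(b)
            adj[b].add(a)
    return sorted(max_clique(adj))

# Toy data: the query is the first three map instances shifted by (1, 1).
prior_map = [("tree", (0, 0)), ("tree", (4, 0)), ("pole", (4, 3)),
             ("tree", (10, 10)), ("pole", (-5, 7))]
query = [("tree", (1, 1)), ("tree", (5, 1)), ("pole", (5, 4))]
print(match(query, prior_map))
```

Exact maximum clique search is exponential in the worst case, so a sketch like this is only practical for small candidate sets.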
RO | Aug 13, 2022
Online Refinement of a Scene Recognition Model for Mobile Robots by Observing Human's Interaction with Environments
Shigemichi Matsuzaki, Hiroaki Masuzawa, Jun Miura
This paper describes a method for online refinement of a scene recognition model for robot navigation that considers traversable plants, i.e., flexible plant parts that a robot can push aside while moving. In scene recognition systems that consider traversable plants growing out onto paths, misclassification may cause the robot to get stuck when traversable plants are recognized as obstacles. Yet misclassification is inevitable in any estimation method. In this work, we propose a framework that allows a semantic segmentation model to be refined on the fly during the robot's operation. We introduce a few-shot segmentation method based on weight imprinting for online model refinement without fine-tuning. Training data are collected by observing a human's interaction with the plant parts. We propose a novel robust weight imprinting method to mitigate the effect of noise in the masks generated from the interaction. The proposed method was evaluated through experiments using real-world data and was shown to outperform ordinary weight imprinting and to perform competitively with fine-tuning with model distillation at lower computational cost.
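For readers unfamiliar with weight imprinting, the following minimal sketch shows the ordinary (non-robust) variant applied to pixel embeddings: the classifier weight for a new class is set to the normalized mean of the normalized embeddings under a mask, and prediction uses cosine similarity. The array shapes, function names, and the assumption of a non-empty mask are choices made for this example; the robust variant proposed in the paper is not reproduced here.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def imprint_weight(features, mask):
    """Compute a weight vector for a new class without any gradient step.

    features: (H, W, C) pixel embeddings from a frozen backbone (assumed).
    mask:     (H, W) boolean mask of pixels of the new class (non-empty).
    """
    emb = l2_normalize(features[mask])      # (N, C) unit embeddings
    return l2_normalize(emb.mean(axis=0))   # (C,) imprinted weight

def add_class(weights, new_weight):
    """Append the imprinted weight as a new row of the (K, C) classifier."""
    return np.vstack([weights, new_weight])

def predict(features, weights):
    """Label each pixel with the class of highest cosine similarity."""
    scores = l2_normalize(features) @ weights.T   # (H, W, K)
    return scores.argmax(axis=-1)

# Toy example: one existing class (index 0) and one imprinted class (index 1).
features = np.array([[[1.0, 0.0], [0.9, 0.1]],
                     [[0.0, 1.0], [0.1, 0.9]]])
mask = np.array([[True, True], [False, False]])
weights = add_class(np.array([[0.0, 1.0]]), imprint_weight(features, mask))
print(predict(features, weights))
```

Because the new weight is computed in closed form from a few masked pixels, no backpropagation is needed at refinement time.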
CV | Mar 2, 2023
Multi-Source Soft Pseudo-Label Learning with Domain Similarity-based Weighting for Semantic Segmentation
Shigemichi Matsuzaki, Hiroaki Masuzawa, Jun Miura
This paper describes a domain-adaptive training method for semantic segmentation using multiple source datasets that are not necessarily relevant to the target dataset. We propose a soft pseudo-label generation method that integrates the predicted object probabilities from multiple source models. The prediction of each source model is weighted by the estimated domain similarity between its source dataset and the target dataset, which emphasizes the contribution of models trained on sources more similar to the target and yields reasonable pseudo-labels. We also propose a training method that takes the entropy of the soft pseudo-labels into account, so as to fully exploit information from the source datasets while suppressing the influence of possibly misclassified pixels. The experiments show comparable or better performance than our previous work and another existing multi-source domain adaptation method, as well as applicability to a variety of target environments.
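The weighting idea can be sketched as a similarity-weighted average of per-pixel class probabilities, followed by an entropy-based confidence value per pixel. This is an illustration under assumptions, not the paper's formulation: how domain similarity is estimated is not covered here (the scores are simply given), and the normalized-entropy weight `1 - H / log(C)` is one plausible way to down-weight uncertain pixels.

```python
import numpy as np

def soft_pseudo_labels(probs, similarity):
    """Similarity-weighted average of source-model predictions.

    probs:      (K, H, W, C) class probabilities from K source models.
    similarity: (K,) non-negative domain-similarity scores (assumed given).
    """
    w = np.asarray(similarity, dtype=float)
    w = w / w.sum()                          # normalize weights to sum to 1
    return np.tensordot(w, probs, axes=1)    # (H, W, C)

def entropy_weight(soft, eps=1e-12):
    """Per-pixel confidence in [0, 1]: 1 for one-hot, 0 for uniform labels."""
    num_classes = soft.shape[-1]
    entropy = -(soft * np.log(soft + eps)).sum(axis=-1)
    return 1.0 - entropy / np.log(num_classes)

# Toy example: two source models, a 1x2 image, two classes.
probs = np.array([
    [[[1.0, 0.0], [0.5, 0.5]]],   # model 0
    [[[0.0, 1.0], [0.5, 0.5]]],   # model 1
])
soft = soft_pseudo_labels(probs, similarity=[3.0, 1.0])
print(soft)                  # first pixel leans toward model 0's prediction
print(entropy_weight(soft))  # the uniform second pixel gets weight near 0
```

In a training loop, the per-pixel weight could scale a cross-entropy loss against the soft labels, so that ambiguous pixels contribute less.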
CV | Feb 8, 2024
CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps
Shigemichi Matsuzaki, Takuma Sugino, Kazuhito Tanaka et al.
This paper describes a multi-modal data association method for global localization using object-based maps and camera images. In global localization, or relocalization, using object-based maps, existing methods typically resort to matching all possible combinations of detected objects and landmarks with the same object category, followed by inlier extraction using RANSAC or brute-force search. This approach becomes infeasible as the number of landmarks increases due to the exponential growth of correspondence candidates. In this paper, we propose labeling landmarks with natural language descriptions and extracting correspondences based on conceptual similarity with image observations using a Vision Language Model (VLM). By leveraging detailed text information, our approach extracts correspondences more efficiently than methods that use only object categories. Through experiments, we demonstrate that the proposed method enables more accurate global localization with fewer iterations than baseline methods, confirming its efficiency.
RO | Aug 2, 2021
Image-based scene recognition for robot navigation considering traversable plants and its manual annotation-free training
Shigemichi Matsuzaki, Hiroaki Masuzawa, Jun Miura
This paper describes a method for estimating the traversability of plant parts covering a path and navigating through them, for mobile robots operating in plant-rich environments. Conventional mobile robots rely on scene recognition methods that consider only the geometric information of the environment. Those methods therefore cannot recognize paths as traversable when they are covered by flexible plants. In this paper, we present a novel framework of image-based scene recognition to realize navigation in such plant-rich environments. Our recognition model uses a semantic segmentation branch for general object classification and a traversability estimation branch for estimating pixel-wise traversability. The semantic segmentation branch is trained using an unsupervised domain adaptation method, and the traversability estimation branch is trained with label images generated from the robot's traversal experience during the data acquisition phase, which we call traversability masks. The training procedure of the entire model is therefore free from manual annotation. In our experiments, we show that the proposed recognition framework distinguishes traversable plants more accurately than conventional semantic segmentation with traversable-plant and non-traversable-plant classes, and than an existing image-based traversability estimation method. We also conducted a real-world experiment and confirmed that a robot using the proposed recognition method successfully navigated in plant-rich environments.
CV | Feb 12, 2021
Multi-source Pseudo-label Learning of Semantic Segmentation for the Scene Recognition of Agricultural Mobile Robots
Shigemichi Matsuzaki, Jun Miura, Hiroaki Masuzawa
This paper describes a novel method of training a semantic segmentation model for scene recognition of agricultural mobile robots, exploiting publicly available datasets of outdoor scenes that differ from the target greenhouse environments. Semantic segmentation models require abundant labels obtained through tedious manual annotation. One way to work around this is unsupervised domain adaptation (UDA), which transfers knowledge from labeled source datasets to unlabeled target datasets. However, the effectiveness of existing methods is not well studied for adaptation between heterogeneous environments, such as urban scenes and greenhouses. In this paper, we propose a method to train a semantic segmentation model for greenhouse images without manually labeled greenhouse datasets. The core idea is to use multiple rich image datasets of different environments with segmentation labels to generate pseudo-labels for the target images, thereby transferring knowledge from multiple sources effectively and training the segmentation model accurately. Along with the pseudo-label generation, we introduce state-of-the-art methods for dealing with noise in the pseudo-labels to further improve performance. We demonstrate in experiments with multiple greenhouse datasets that the proposed method improves performance compared to single-source baselines and an existing approach.