Yoshinobu Hagiwara

RO
h-index30
20papers
277citations
Novelty51%
AI Score54

20 Papers

70.8ROMay 27
Whose Is This?: Context-Aware Object Ownership Inference with Uncertainty-Guided Questioning

Saki Hashimoto, Akira Taniguchi, Shoichi Hasegawa et al.

Service robots must infer object ownership to correctly interpret instructions such as "bring me my cup." However, ownership is a latent attribute that cannot be directly observed, and existing methods often rely on limited cues such as recent usage, making them unreliable in scenarios such as temporary sharing. We propose a framework for context-aware ownership inference with uncertainty-guided interaction (COIN). The method integrates user background information and object usage history using a large language model (LLM) to estimate ownership scores. To handle uncertainty, we apply conformal prediction to construct a set of plausible owners and selectively generate user queries when the prediction is uncertain. Experiments in a simulated home environment show that the proposed method consistently outperforms baseline approaches, achieving a Subset Accuracy of 0.988 and a Mean Jaccard index of 0.991. The method also maintains high performance in scenarios involving temporary use and shared ownership. The results demonstrate that combining contextual reasoning with uncertainty-aware interaction improves both estimation accuracy and robustness. The project page is available at https://emergentsystemlabstudent.github.io/COIN/.

AIMay 24, 2022
Emergent Communication through Metropolis-Hastings Naming Game with Deep Generative Models

Tadahiro Taniguchi, Yuto Yoshida, Akira Taniguchi et al.

Constructive studies on symbol emergence systems seek to investigate computational models that can better explain human language evolution, the creation of symbol systems, and the construction of internal representations. This study provides a new model for emergent communication, which is based on a probabilistic generative model (PGM) instead of a discriminative model based on deep reinforcement learning. We define the Metropolis-Hastings (MH) naming game by generalizing previously proposed models. It is not a referential game with explicit feedback, as assumed by many emergent communication studies. Instead, it is a game based on joint attention without explicit feedback. Mathematically, the MH naming game is proved to be a type of MH algorithm for an integrative PGM that combines two agents that play the naming game. From this viewpoint, symbol emergence is regarded as decentralized Bayesian inference, and semiotic communication is regarded as inter-personal cross-modal inference. This notion leads to the collective predictive coding hypothesis} regarding language evolution and, in general, the emergence of symbols. We also propose the inter-Gaussian mixture model (GMM)+ variational autoencoder (VAE), a deep generative model for emergent communication based on the MH naming game. The model has been validated on MNIST and Fruits 360 datasets. Experimental findings demonstrate that categories are formed from real images observed by agents, and signs are correctly shared across agents by successfully utilizing both of the observations of agents via the MH naming game. Furthermore, scholars verified that visual images were recalled from signs uttered by agents. Notably, emergent communication without supervision and reward feedback improved the performance of the unsupervised representation learning of agents.

RONov 20, 2022
Active Exploration based on Information Gain by Particle Filter for Efficient Spatial Concept Formation

Akira Taniguchi, Yoshiki Tabuchi, Tomochika Ishikawa et al.

Autonomous robots need to learn the categories of various places by exploring their environments and interacting with users. However, preparing training datasets with linguistic instructions from users is time-consuming and labor-intensive. Moreover, effective exploration is essential for appropriate concept formation and rapid environmental coverage. To address this issue, we propose an active inference method, referred to as spatial concept formation with information gain-based active exploration (SpCoAE) that combines sequential Bayesian inference using particle filters and information gain-based destination determination in a probabilistic generative model. This study interprets the robot's action as a selection of destinations to ask the user, `What kind of place is this?' in the context of active inference. This study provides insights into the technical aspects of the proposed method, including active perception and exploration by the robot, and how the method can enable mobile robots to learn spatial concepts through active exploration. Our experiment demonstrated the effectiveness of the SpCoAE in efficiently determining a destination for learning appropriate spatial concepts in home environments.

LGMay 24, 2022
Symbol Emergence as Inter-personal Categorization with Head-to-head Latent Word

Kazuma Furukawa, Akira Taniguchi, Yoshinobu Hagiwara et al.

In this study, we propose a head-to-head type (H2H-type) inter-personal multimodal Dirichlet mixture (Inter-MDM) by modifying the original Inter-MDM, which is a probabilistic generative model that represents the symbol emergence between two agents as multiagent multimodal categorization. A Metropolis--Hastings method-based naming game based on the Inter-MDM enables two agents to collaboratively perform multimodal categorization and share signs with a solid mathematical foundation of convergence. However, the conventional Inter-MDM presumes a tail-to-tail connection across a latent word variable, causing inflexibility of the further extension of Inter-MDM for modeling a more complex symbol emergence. Therefore, we propose herein a head-to-head type (H2H-type) Inter-MDM that treats a latent word variable as a child node of an internal variable of each agent in the same way as many prior studies of multimodal categorization. On the basis of the H2H-type Inter-MDM, we propose a naming game in the same way as the conventional Inter-MDM. The experimental results show that the H2H-type Inter-MDM yields almost the same performance as the conventional Inter-MDM from the viewpoint of multimodal categorization and sign sharing.

CLJun 27, 2023
Symbol emergence as interpersonal cross-situational learning: the emergence of lexical knowledge with combinatoriality

Yoshinobu Hagiwara, Kazuma Furukawa, Takafumi Horie et al.

We present a computational model for a symbol emergence system that enables the emergence of lexical knowledge with combinatoriality among agents through a Metropolis-Hastings naming game and cross-situational learning. Many computational models have been proposed to investigate combinatoriality in emergent communication and symbol emergence in cognitive and developmental robotics. However, existing models do not sufficiently address category formation based on sensory-motor information and semiotic communication through the exchange of word sequences within a single integrated model. Our proposed model facilitates the emergence of lexical knowledge with combinatoriality by performing category formation using multimodal sensory-motor information and enabling semiotic communication through the exchange of word sequences among agents in a unified model. Furthermore, the model enables an agent to predict sensory-motor information for unobserved situations by combining words associated with categories in each modality. We conducted two experiments with two humanoid robots in a simulated environment to evaluate our proposed model. The results demonstrated that the agents can acquire lexical knowledge with combinatoriality through interpersonal cross-situational learning based on the Metropolis-Hastings naming game and cross-situational learning. Furthermore, our results indicate that the lexical knowledge developed using our proposed model exhibits generalization performance for novel situations through interpersonal cross-modal inference.

ROAug 22, 2025
Take That for Me: Multimodal Exophora Resolution with Interactive Questioning for Ambiguous Out-of-View Instructions

Akira Oyama, Shoichi Hasegawa, Akira Taniguchi et al.

Daily life support robots must interpret ambiguous verbal instructions involving demonstratives such as ``Bring me that cup,'' even when objects or users are out of the robot's view. Existing approaches to exophora resolution primarily rely on visual data and thus fail in real-world scenarios where the object or user is not visible. We propose Multimodal Interactive Exophora resolution with user Localization (MIEL), which is a multimodal exophora resolution framework leveraging sound source localization (SSL), semantic mapping, visual-language models (VLMs), and interactive questioning with GPT-4o. Our approach first constructs a semantic map of the environment and estimates candidate objects from a linguistic query with the user's skeletal data. SSL is utilized to orient the robot toward users who are initially outside its visual field, enabling accurate identification of user gestures and pointing directions. When ambiguities remain, the robot proactively interacts with the user, employing GPT-4o to formulate clarifying questions. Experiments in a real-world environment showed results that were approximately 1.3 times better when the user was visible to the robot and 2.0 times better when the user was not visible to the robot, compared to the methods without SSL and interactive questioning. The project website is https://emergentsystemlabstudent.github.io/MIEL/.

ROSep 16, 2025
Toward Ownership Understanding of Objects: Active Question Generation with Large Language Model and Probabilistic Generative Model

Saki Hashimoto, Shoichi Hasegawa, Tomochika Ishikawa et al.

Robots operating in domestic and office environments must understand object ownership to correctly execute instructions such as ``Bring me my cup.'' However, ownership cannot be reliably inferred from visual features alone. To address this gap, we propose Active Ownership Learning (ActOwL), a framework that enables robots to actively generate and ask ownership-related questions to users. ActOwL employs a probabilistic generative model to select questions that maximize information gain, thereby acquiring ownership knowledge efficiently to improve learning efficiency. Additionally, by leveraging commonsense knowledge from Large Language Models (LLM), objects are pre-classified as either shared or owned, and only owned objects are targeted for questioning. Through experiments in a simulated home environment and a real-world laboratory setting, ActOwL achieved significantly higher ownership clustering accuracy with fewer questions than baseline methods. These findings demonstrate the effectiveness of combining active inference with LLM-guided commonsense reasoning, advancing the capability of robots to acquire ownership knowledge for practical and socially appropriate task execution.

ROSep 16, 2025
Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models

Kento Murata, Shoichi Hasegawa, Tomochika Ishikawa et al.

It is crucial to efficiently execute instructions such as "Find an apple and a banana" or "Get ready for a field trip," which require searching for multiple objects or understanding context-dependent commands. This study addresses the challenging problem of determining which robot should be assigned to which part of a task when each robot possesses different situational on-site knowledge-specifically, spatial concepts learned from the area designated to it by the user. We propose a task planning framework that leverages large language models (LLMs) and spatial concepts to decompose natural language instructions into subtasks and allocate them to multiple robots. We designed a novel few-shot prompting strategy that enables LLMs to infer required objects from ambiguous commands and decompose them into appropriate subtasks. In our experiments, the proposed method achieved 47/50 successful assignments, outperforming random (28/50) and commonsense-based assignment (26/50). Furthermore, we conducted qualitative evaluations using two actual mobile manipulators. The results demonstrated that our framework could handle instructions, including those involving ad hoc categories such as "Get ready for a field trip," by successfully performing task decomposition, assignment, sequential planning, and execution.

HCJun 18, 2025
Co-Creative Learning via Metropolis-Hastings Interaction between Humans and AI

Ryota Okumura, Tadahiro Taniguchi, Akira Taniguchi et al.

We propose co-creative learning as a novel paradigm where humans and AI, i.e., biological and artificial agents, mutually integrate their partial perceptual information and knowledge to construct shared external representations, a process we interpret as symbol emergence. Unlike traditional AI teaching based on unilateral knowledge transfer, this addresses the challenge of integrating information from inherently different modalities. We empirically test this framework using a human-AI interaction model based on the Metropolis-Hastings naming game (MHNG), a decentralized Bayesian inference mechanism. In an online experiment, 69 participants played a joint attention naming game (JA-NG) with one of three computer agent types (MH-based, always-accept, or always-reject) under partial observability. Results show that human-AI pairs with an MH-based agent significantly improved categorization accuracy through interaction and achieved stronger convergence toward a shared sign system. Furthermore, human acceptance behavior aligned closely with the MH-derived acceptance probability. These findings provide the first empirical evidence for co-creative learning emerging in human-AI dyads via MHNG-based interaction. This suggests a promising path toward symbiotic AI systems that learn with humans, rather than from them, by dynamically aligning perceptual experiences, opening a new venue for symbiotic AI alignment.

ROApr 15, 2024
Real-world Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning

Taichi Sakaguchi, Akira Taniguchi, Yoshinobu Hagiwara et al.

Improving instance-specific image goal navigation (InstanceImageNav), which locates the identical object in a real-world environment from a query image, is essential for robotic systems to assist users in finding desired objects. The challenge lies in the domain gap between low-quality images observed by the moving robot, characterized by motion blur and low-resolution, and high-quality query images provided by the user. Such domain gaps could significantly reduce the task success rate but have not been the focus of previous work. To address this, we propose a novel method called Few-shot Cross-quality Instance-aware Adaptation (CrossIA), which employs contrastive learning with an instance classifier to align features between massive low- and few high-quality images. This approach effectively reduces the domain gap by bringing the latent representations of cross-quality images closer on an instance basis. Additionally, the system integrates an object image collection with a pre-trained deblurring model to enhance the observed image quality. Our method fine-tunes the SimSiam model, pre-trained on ImageNet, using CrossIA. We evaluated our method's effectiveness through an InstanceImageNav task with 20 different types of instances, where the robot identifies the same instance in a real-world environment as a high-quality query image. Our experiments showed that our method improves the task success rate by up to three times compared to the baseline, a conventional approach based on SuperGlue. These findings highlight the potential of leveraging contrastive learning and image enhancement techniques to bridge the domain gap and improve object localization in robotic applications. The project website is https://emergentsystemlabstudent.github.io/DomainBridgingNav/.

CLMay 31, 2023
Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models

Jun Inukai, Tadahiro Taniguchi, Akira Taniguchi et al.

In the studies on symbol emergence and emergent communication in a population of agents, a computational model was employed in which agents participate in various language games. Among these, the Metropolis-Hastings naming game (MHNG) possesses a notable mathematical property: symbol emergence through MHNG is proven to be a decentralized Bayesian inference of representations shared by the agents. However, the previously proposed MHNG is limited to a two-agent scenario. This paper extends MHNG to an N-agent scenario. The main contributions of this paper are twofold: (1) we propose the recursive Metropolis-Hastings naming game (RMHNG) as an N-agent version of MHNG and demonstrate that RMHNG is an approximate Bayesian inference method for the posterior distribution over a latent variable shared by agents, similar to MHNG; and (2) we empirically evaluate the performance of RMHNG on synthetic and real image data, enabling multiple agents to develop and share a symbol system. Furthermore, we introduce two types of approximations -- one-sample and limited-length -- to reduce computational complexity while maintaining the ability to explain communication in a population of agents. The experimental findings showcased the efficacy of RMHNG as a decentralized Bayesian inference for approximating the posterior distribution concerning latent variables, which are jointly shared among agents, akin to MHNG. Moreover, the utilization of RMHNG elucidated the agents' capacity to exchange symbols. Furthermore, the study discovered that even the computationally simplified version of RMHNG could enable symbols to emerge among the agents.

AISep 15, 2021
Multiagent Multimodal Categorization for Symbol Emergence: Emergent Communication via Interpersonal Cross-modal Inference

Yoshinobu Hagiwara, Kazuma Furukawa, Akira Taniguchi et al.

This paper describes a computational model of multiagent multimodal categorization that realizes emergent communication. We clarify whether the computational model can reproduce the following functions in a symbol emergence system, comprising two agents with different sensory modalities playing a naming game. (1) Function for forming a shared lexical system that comprises perceptual categories and corresponding signs, formed by agents through individual learning and semiotic communication between agents. (2) Function to improve the categorization accuracy in an agent via semiotic communication with another agent, even when some sensory modalities of each agent are missing. (3) Function that an agent infers unobserved sensory information based on a sign sampled from another agent in the same manner as cross-modal inference. We propose an interpersonal multimodal Dirichlet mixture (Inter-MDM), which is derived by dividing an integrative probabilistic generative model, which is obtained by integrating two Dirichlet mixtures (DMs). The Markov chain Monte Carlo algorithm realizes emergent communication. The experimental results demonstrated that Inter-MDM enables agents to form multimodal categories and appropriately share signs between agents. It is shown that emergent communication improves categorization accuracy, even when some sensory modalities are missing. Inter-MDM enables an agent to predict unobserved information based on a shared sign.

ROMar 16, 2021
Map completion from partial observation using the global structure of multiple environmental maps

Yuki Katsumata, Akinori Kanechika, Akira Taniguchi et al.

Using the spatial structure of various indoor environments as prior knowledge, the robot would construct the map more efficiently. Autonomous mobile robots generally apply simultaneous localization and mapping (SLAM) methods to understand the reachable area in newly visited environments. However, conventional mapping approaches are limited by only considering sensor observation and control signals to estimate the current environment map. This paper proposes a novel SLAM method, map completion network-based SLAM (MCN-SLAM), based on a probabilistic generative model incorporating deep neural networks for map completion. These map completion networks are primarily trained in the framework of generative adversarial networks (GANs) to extract the global structure of large amounts of existing map data. We show in experiments that the proposed method can estimate the environment map 1.3 times better than the previous SLAM methods in the situation of partial observation.

ROMar 11, 2021
Hierarchical Bayesian Model for the Transfer of Knowledge on Spatial Concepts based on Multimodal Information

Yoshinobu Hagiwara, Keishiro Taguchi, Satoshi Ishibushi et al.

This paper proposes a hierarchical Bayesian model based on spatial concepts that enables a robot to transfer the knowledge of places from experienced environments to a new environment. The transfer of knowledge based on spatial concepts is modeled as the calculation process of the posterior distribution based on the observations obtained in each environment with the parameters of spatial concepts generalized to environments as prior knowledge. We conducted experiments to evaluate the generalization performance of spatial knowledge for general places such as kitchens and the adaptive performance of spatial knowledge for unique places such as `Emma's room' in a new environment. In the experiments, the accuracies of the proposed method and conventional methods were compared in the prediction task of location names from an image and a position, and the prediction task of positions from a location name. The experimental results demonstrated that the proposed method has a higher prediction accuracy of location names and positions than the conventional method owing to the transfer of knowledge.

ROFeb 18, 2020
Spatial Concept-Based Navigation with Human Speech Instructions via Probabilistic Inference on Bayesian Generative Model

Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi et al.

Robots are required to not only learn spatial concepts autonomously but also utilize such knowledge for various tasks in a domestic environment. Spatial concept represents a multimodal place category acquired from the robot's spatial experience including vision, speech-language, and self-position. The aim of this study is to enable a mobile robot to perform navigational tasks with human speech instructions, such as `Go to the kitchen', via probabilistic inference on a Bayesian generative model using spatial concepts. Specifically, path planning was formalized as the maximization of probabilistic distribution on the path-trajectory under speech instruction, based on a control-as-inference framework. Furthermore, we described the relationship between probabilistic inference based on the Bayesian generative model and control problem including reinforcement learning. We demonstrated path planning based on human instruction using acquired spatial concepts to verify the usefulness of the proposed approach in the simulator and in real environments. Experimentally, places instructed by the user's speech commands showed high probability values, and the trajectory toward the target place was correctly estimated. Our approach, based on probabilistic inference concerning decision-making, can lead to further improvement in robot autonomy.

ROFeb 10, 2020
Autonomous Planning Based on Spatial Concepts to Tidy Up Home Environments with Service Robots

Akira Taniguchi, Shota Isobe, Lotfi El Hafi et al.

Tidy-up tasks by service robots in home environments are challenging in robotics applications because they involve various interactions with the environment. In particular, robots are required not only to grasp, move, and release various home objects but also to plan the order and positions for placing the objects. In this paper, we propose a novel planning method that can efficiently estimate the order and positions of the objects to be tidied up by learning the parameters of a probabilistic generative model. The model allows a robot to learn the distributions of the co-occurrence probability of the objects and places to tidy up using the multimodal sensor information collected in a tidied environment. Additionally, we develop an autonomous robotic system to perform the tidy-up operation. We evaluate the effectiveness of the proposed method by an experimental simulation that reproduces the conditions of the Tidy Up Here task of the World Robot Summit 2018 international robotics competition. The simulation results show that the proposed method enables the robot to successively tidy up several objects and achieves the best task score among the considered baseline tidy-up methods.

CLMay 31, 2019
Symbol Emergence as an Interpersonal Multimodal Categorization

Yoshinobu Hagiwara, Hiroyoshi Kobayashi, Akira Taniguchi et al.

This study focuses on category formation for individual agents and the dynamics of symbol emergence in a multi-agent system through semiotic communication. Semiotic communication is defined, in this study, as the generation and interpretation of signs associated with the categories formed through the agent's own sensory experience or by exchange of signs with other agents. From the viewpoint of language evolution and symbol emergence, organization of a symbol system in a multi-agent system is considered as a bottom-up and dynamic process, where individual agents share the meaning of signs and categorize sensory experience. A constructive computational model can explain the mutual dependency of the two processes and has mathematical support that guarantees a symbol system's emergence and sharing within the multi-agent system. In this paper, we describe a new computational model that represents symbol emergence in a two-agent system based on a probabilistic generative model for multimodal categorization. It models semiotic communication via a probabilistic rejection based on the receiver's own belief. We have found that the dynamics by which cognitively independent agents create a symbol system through their semiotic communication can be regarded as the inference process of a hidden variable in an interpersonal multimodal categorizer, if we define the rejection probability based on the Metropolis-Hastings algorithm. The validity of the proposed model and algorithm for symbol emergence is also verified in an experiment with two agents observing daily objects in the real-world environment. The experimental results demonstrate that our model reproduces the phenomena of symbol emergence, which does not require a teacher who would know a pre-existing symbol system. Instead, the multi-agent system can form and use a symbol system without having pre-existing categories.

ROMar 9, 2018
Improved and Scalable Online Learning of Spatial Concepts and Language Models with Mapping

Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi et al.

We propose a novel online learning algorithm, called SpCoSLAM 2.0, for spatial concepts and lexical acquisition with high accuracy and scalability. Previously, we proposed SpCoSLAM as an online learning algorithm based on unsupervised Bayesian probabilistic model that integrates multimodal place categorization, lexical acquisition, and SLAM. However, our original algorithm had limited estimation accuracy owing to the influence of the early stages of learning, and increased computational complexity with added training data. Therefore, we introduce techniques such as fixed-lag rejuvenation to reduce the calculation time while maintaining an accuracy higher than that of the original algorithm. The results show that, in terms of estimation accuracy, the proposed algorithm exceeds the original algorithm and is comparable to batch learning. In addition, the calculation time of the proposed algorithm does not depend on the amount of training data and becomes constant for each step of the scalable algorithm. Our approach will contribute to the realization of long-term spatial language interactions between humans and robots.

AIApr 15, 2017
Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping

Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi et al.

In this paper, we propose an online learning algorithm based on a Rao-Blackwellized particle filter for spatial concept acquisition and mapping. We have proposed a nonparametric Bayesian spatial concept acquisition model (SpCoA). We propose a novel method (SpCoSLAM) integrating SpCoA and FastSLAM in the theoretical framework of the Bayesian generative model. The proposed method can simultaneously learn place categories and lexicons while incrementally generating an environmental map. Furthermore, the proposed method has scene image features and a language model added to SpCoA. In the experiments, we tested online learning of spatial concepts and environmental maps in a novel environment of which the robot did not have a map. Then, we evaluated the results of online learning of spatial concepts and lexical acquisition. The experimental results demonstrated that the robot was able to more accurately learn the relationships between words and the place in the environmental map incrementally by using the proposed method.

RODec 1, 2016
Bayesian Body Schema Estimation using Tactile Information obtained through Coordinated Random Movements

Tomohiro Mimura, Yoshinobu Hagiwara, Tadahiro Taniguchi et al.

This paper describes a computational model, called the Dirichlet process Gaussian mixture model with latent joints (DPGMM-LJ), that can find latent tree structure embedded in data distribution in an unsupervised manner. By combining DPGMM-LJ and a pre-existing body map formation method, we propose a method that enables an agent having multi-link body structure to discover its kinematic structure, i.e., body schema, from tactile information alone. The DPGMM-LJ is a probabilistic model based on Bayesian nonparametrics and an extension of Dirichlet process Gaussian mixture model (DPGMM). In a simulation experiment, we used a simple fetus model that had five body parts and performed structured random movements in a womb-like environment. It was shown that the method could estimate the number of body parts and kinematic structures without any pre-existing knowledge in many cases. Another experiment showed that the degree of motor coordination in random movements affects the result of body schema formation strongly. It is confirmed that the accuracy rate for body schema estimation had the highest value 84.6% when the ratio of motor coordination was 0.9 in our setting. These results suggest that kinematic structure can be estimated from tactile information obtained by a fetus moving randomly in a womb without any visual information even though its accuracy was not so high. They also suggest that a certain degree of motor coordination in random movements and the sufficient dimension of state space that represents the body map are important to estimate body schema correctly.