Takato Horii

MA
h-index16
15papers
104citations
Novelty41%
AI Score50

15 Papers

CLSep 14, 2024
Constructive Approach to Bidirectional Influence between Qualia Structure and Language Emergence

Tadahiro Taniguchi, Masafumi Oizumi, Noburo Saji et al.

This perspective paper explores the bidirectional influence between language emergence and the relational structure of subjective experiences, termed qualia structure, and lays out a constructive approach to the intricate dependency between the two. We hypothesize that the emergence of languages with distributional semantics (e.g., syntactic-semantic structures) is linked to the coordination of internal representations shaped by experience, potentially facilitating more structured language through reciprocal influence. This hypothesized mutual dependency connects to recent advancements in AI and symbol emergence robotics, and is explored within this paper through theoretical frameworks such as the collective predictive coding. Computational studies show that neural network-based language models form systematically structured internal representations, and multimodal language models can share representations between language and perceptual information. This perspective suggests that language emergence serves not only as a mechanism creating a communication tool but also as a mechanism for allowing people to realize shared understanding of qualitative experiences. The paper discusses the implications of this bidirectional influence in the context of consciousness studies, linguistics, and cognitive science, and outlines future constructive research directions to further explore this dynamic relationship between language emergence and qualia structure.

ROMay 16
SADP: Subgoal-Aware Diffusion Policy for Explainable Robots Learned from Foundation Model Generated Demonstrations

Site Hu, Takato Horii

Explainable robots require not only successful task execution but also the ability to expose internal decision-making process in a user-friendly manner. However, most imitation learning methods are trained solely on task-level demonstrations, without explicitly modeling subgoal structure or execution progress. This limitation is further exacerbated by the scarcity of subgoal-level supervision in standard robot learning datasets, which restricts the development of robots that can convey the subtasks they are executing during long-horizon manipulation. To address this issue, this paper proposes Subgoal-Aware Diffusion Policy (SADP), a framework that leverages foundation models to autonomously generate subgoal-annotated demonstrations and trains diffusion policies on these datasets. SADP structures policy execution around human-interpretable subgoals by conditioning action generation on both task-level and subgoal-level descriptions. A lightweight auxiliary head further predicts subgoal completion states, allowing the robot to expose its current execution stage and monitor subgoal progression. Experiments in RLBench simulations and real-world evaluations on a UR5e robot demonstrate that SADP achieves higher task success rates than strong task-conditioned diffusion baselines, while providing subgoal-level execution signals for monitoring progress and diagnosing failures. These results highlight that built-in, rather than post-hoc, interpretability can coexist with high task performance.

MAApr 10
Social Reality Construction via Active Inference: Modeling the Dialectic of Conformity and Creativity

Kentaro Nomura, Takato Horii

Social agents both internalize collective norms and reshape them through creative action, yet computational models have not captured this bidirectional process within a unified framework. We propose a multi-agent simulation model grounded in active inference that formalizes the dialectical constitution of social reality on a structured social network. Each agent maintains an internal generative model, communicates with neighbors to form social priors, creates novel observations, and selectively incorporates others' creations into memory. Simulation experiments demonstrate three main findings. First, informationally cohesive social groups emerge endogenously, with representational alignment mirroring the cluster topology of the underlying network. Second, a circular mutual constitution arises between social representations and the observation distribution, maintained through agents' creative acts that project representational structure onto the external world. Third, the propagation of creations exhibits selective, heterogeneous patterns distinct from the stable diffusion of social representations, indicating that agents construct cultural niches through local interaction dynamics. These results suggest that the interplay between social conformity and creative deviation can give rise to the endogenous formation and differentiation of shared social reality.

MAMay 10
Emergent Communication for Co-constructed Emotion Between Embodied Agents via Collective Predictive Coding

Zehang Zhang, Nguyen Le Hoang, Tadahiro Taniguchi et al.

According to the theory of constructed emotion, the brain actively forms emotion categories by integrating multimodal bodily signals, and constructs emotional experiences by using these categories to predict and interpret sensory inputs. While research has advanced in modeling individual emotion construction, the social process of co-construction-how a shared understanding of emotions emerges between individuals-remains computationally underexplored. This study investigates this process by modeling emergent communication between two embodied agents using the Metropolis-Hastings Naming Game (MHNG), grounded in the Collective Predictive Coding (CPC) framework. Our experiments, using visual, auditory, and simulated interoceptive inputs, yield two main findings. First, MHNG-based communication significantly improves the alignment, clarity, and inter-agent agreement of the learned emotion categories compared to non-communicative and non-selective baselines, with the alignment effect concentrated at the symbolic layer rather than the perceptual latent representation. Second, even when the two agents have systematically divergent interoceptive dynamics, communication still produces robust categorical alignment, with distinct, category-specific reshaping patterns of each agent's emotion categories-consistent with the constructed-emotion view that interoceptive heterogeneity is constitutive of, rather than an obstacle to, shared emotional meaning. These findings provide computational support for the co-constructionist view of emotion and extend the CPC framework from physical to socially-grounded domains.

MAMay 8
Emergence of Social Reality of Emotion through a Social Allostasis Model with Dynamic Interpretants

Kentaro Nomura, Yushi Tsubamoto, Takato Horii

The theory of constructed emotion defines social reality as the community-level consensus on emotion concepts assigned to interoceptive sensations arising from bodily allostasis and social interaction. In this study, we simulate this emergence process using a computational model that integrates symbol emergence with degrees of freedom in symbol interpretation and active inference. Two agents receive interoceptive signals, exchange inferred symbols, and simultaneously adapt their bodily control goals and symbol interpretations to each other. Experimental results show that the interoceptive prior preferences and symbol probability distributions of the two agents converge, confirming the emergence of social reality grounded in social consensus.

MAMay 8
Synchronizing Minds through Collective Predictive Coding: A Computational Model of Parent-Infant Homeostatic Co-Regulation

Yushi Tsubamoto, Takato Horii

Inter-brain synchrony (IBS) observed in real-time dyadic interactions, including parent--infant exchanges, suggests that two agents come to share aligned latent representations through interaction. Yet computational accounts of how such alignment can arise between agents that have only local sensory access and asymmetric internal knowledge remain underdeveloped. We propose a constructive model of parent--infant homeostatic co-regulation that integrates a POMDP formulation of active interoceptive inference with the Metropolis--Hastings Naming Game (MHNG) derived from the Collective Predictive Coding (CPC) hypothesis. In our model, the parent observes the infant only through an exteroceptive signal while the infant directly senses its own interoceptive state; the two agents agree on regulatory actions through a shared communicative variable whose acceptance is determined by a locally computable Metropolis--Hastings probability. The agents are further endowed with asymmetric generative-model knowledge: the parent knows how actions transform visceral states but must learn what the infant's body is communicating, whereas the infant perceives its visceral state directly but must learn how actions affect it. In a $6 \times 6$ visceral-state grid world, MHNG-mediated interaction regulated the infant's visceral state more adaptively than one-sided control conditions, and the two posteriors became rapidly aligned. Notably, this latent-state alignment emerged far earlier than the convergence of the learned generative matrices, indicating that representational synchrony does not presuppose fully shared world models. These results offer a minimal constructive account of latent-state alignment compatible with IBS reported in hyperscanning studies and support CPC as a candidate computational basis for inter-brain alignment.

MAApr 4, 2025
Decentralized Collective World Model for Emergent Communication and Coordination

Kentaro Nomura, Tatsuya Aoki, Tadahiro Taniguchi et al.

We propose a fully decentralized multi-agent world model that enables both symbol emergence for communication and coordinated behavior through temporal extension of collective predictive coding. Unlike previous research that focuses on either communication or coordination separately, our approach achieves both simultaneously. Our method integrates world models with communication channels, enabling agents to predict environmental dynamics, estimate states from partial observations, and share critical information through bidirectional message exchange with contrastive learning for message alignment. Using a two-agent trajectory drawing task, we demonstrate that our communication-based approach outperforms non-communicative models when agents have divergent perceptual capabilities, achieving the second-best coordination after centralized models. Importantly, our decentralized approach with constraints preventing direct access to other agents' internal states facilitates the emergence of more meaningful symbol systems that accurately reflect environmental states. These findings demonstrate the effectiveness of decentralized communication for supporting coordination while developing shared representations of the environment.

AIMar 8, 2025
System 0/1/2/3: Quad-process theory for multi-timescale embodied collective cognitive systems

Tadahiro Taniguchi, Yasushi Hirai, Masahiro Suzuki et al.

This paper introduces the System 0/1/2/3 framework as an extension of dual-process theory, employing a quad-process model of cognition. Expanding upon System 1 (fast, intuitive thinking) and System 2 (slow, deliberative thinking), we incorporate System 0, which represents pre-cognitive embodied processes, and System 3, which encompasses collective intelligence and symbol emergence. We contextualize this model within Bergson's philosophy by adopting multi-scale time theory to unify the diverse temporal dynamics of cognition. System 0 emphasizes morphological computation and passive dynamics, illustrating how physical embodiment enables adaptive behavior without explicit neural processing. Systems 1 and 2 are explained from a constructive perspective, incorporating neurodynamical and AI viewpoints. In System 3, we introduce collective predictive coding to explain how societal-level adaptation and symbol emergence operate over extended timescales. This comprehensive framework ranges from rapid embodied reactions to slow-evolving collective intelligence, offering a unified perspective on cognition across multiple timescales, levels of abstraction, and forms of human intelligence. The System 0/1/2/3 model provides a novel theoretical foundation for understanding the interplay between adaptive and cognitive processes, thereby opening new avenues for research in cognitive science, AI, robotics, and collective intelligence.

AIMay 19, 2025
Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs

Haruka Asanuma, Naoko Koide-Majima, Ken Nakamura et al.

Recent studies have revealed that human emotions exhibit a high-dimensional, complex structure. A full capturing of this complexity requires new approaches, as conventional models that disregard high dimensionality risk overlooking key nuances of human emotions. Here, we examined the extent to which the latest generation of rapidly evolving Multimodal Large Language Models (MLLMs) capture these high-dimensional, intricate emotion structures, including capabilities and limitations. Specifically, we compared self-reported emotion ratings from participants watching videos with model-generated estimates (e.g., Gemini or GPT). We evaluated performance not only at the individual video level but also from emotion structures that account for inter-video relationships. At the level of simple correlation between emotion structures, our results demonstrated strong similarity between human and model-inferred emotion structures. To further explore whether the similarity between humans and models is at the signle item level or the coarse-categorical level, we applied Gromov Wasserstein Optimal Transport. We found that although performance was not necessarily high at the strict, single-item level, performance across video categories that elicit similar emotions was substantial, indicating that the model could infer human emotional experiences at the category level. Our results suggest that current state-of-the-art MLLMs broadly capture the complex high-dimensional emotion structures at the category level, as well as their apparent limitations in accurately capturing entire structures at the single-item level.

MANov 26, 2024
Creative Agents: Simulating the Systems Model of Creativity with Generative Agents

Naomi Imasato, Kazuki Miyazawa, Takayuki Nagai et al.

With the growing popularity of generative AI for images, video, and music, we witnessed models rapidly improve in quality and performance. However, not much attention is paid towards enabling AI's ability to "be creative". In this study, we implemented and simulated the systems model of creativity (proposed by Csikszentmihalyi) using virtual agents utilizing large language models (LLMs) and text prompts. For comparison, the simulations were conducted with the "virtual artists" being: 1)isolated and 2)placed in a multi-agent system. Both scenarios were compared by analyzing the variations and overall "creativity" in the generated artifacts (measured via a user study and LLM). Our results suggest that the generative agents may perform better in the framework of the systems model of creativity.

AIMay 6, 2021
A Framework of Explanation Generation toward Reliable Autonomous Robots

Tatsuya Sakai, Kazuki Miyazawa, Takato Horii et al.

To realize autonomous collaborative robots, it is important to increase the trust that users have in them. Toward this goal, this paper proposes an algorithm which endows an autonomous agent with the ability to explain the transition from the current state to the target state in a Markov decision process (MDP). According to cognitive science, to generate an explanation that is acceptable to humans, it is important to present the minimum information necessary to sufficiently understand an event. To meet this requirement, this study proposes a framework for identifying important elements in the decision-making process using a prediction model for the world and generating explanations based on these elements. To verify the ability of the proposed method to generate explanations, we conducted an experiment using a grid environment. It was inferred from the result of a simulation experiment that the explanation generated using the proposed method was composed of the minimum elements important for understanding the transition from the current state to the target state. Furthermore, subject experiments showed that the generated explanation was a good summary of the process of state transition, and that a high evaluation was obtained for the explanation of the reason for an action.

LGApr 15, 2020
lamBERT: Language and Action Learning Using Multimodal BERT

Kazuki Miyazawa, Tatsuya Aoki, Takato Horii et al.

Recently, the bidirectional encoder representations from transformers (BERT) model has attracted much attention in the field of natural language processing, owing to its high performance in language understanding-related tasks. The BERT model learns language representation that can be adapted to various tasks via pre-training using a large corpus in an unsupervised manner. This study proposes the language and action learning using multimodal BERT (lamBERT) model that enables the learning of language and actions by 1) extending the BERT model to multimodal representation and 2) integrating it with reinforcement learning. To verify the proposed model, an experiment is conducted in a grid environment that requires language understanding for the agent to act properly. As a result, the lamBERT model obtained higher rewards in multitask settings and transfer settings when compared to other models, such as the convolutional neural network-based model and the lamBERT model without pre-training.

LGNov 1, 2019
Situated GAIL: Multitask imitation using task-conditioned adversarial inverse reinforcement learning

Kyoichiro Kobayashi, Takato Horii, Ryo Iwaki et al.

Generative adversarial imitation learning (GAIL) has attracted increasing attention in the field of robot learning. It enables robots to learn a policy to achieve a task demonstrated by an expert while simultaneously estimating the reward function behind the expert's behaviors. However, this framework is limited to learning a single task with a single reward function. This study proposes an extended framework called situated GAIL (S-GAIL), in which a task variable is introduced to both the discriminator and generator of the GAIL framework. The task variable has the roles of discriminating different contexts and making the framework learn different reward functions and policies for multiple tasks. To achieve the early convergence of learning and robustness during reward estimation, we introduce a term to adjust the entropy regularization coefficient in the generator's objective function. Our experiments using two setups (navigation in a discrete grid world and arm reaching in a continuous space) demonstrate that the proposed framework can acquire multiple reward functions and policies more effectively than existing frameworks. The task variable enables our framework to differentiate contexts while sharing common knowledge among multiple tasks.

LGOct 20, 2019
Neuro-SERKET: Development of Integrative Cognitive System through the Composition of Deep Probabilistic Generative Models

Tadahiro Taniguchi, Tomoaki Nakamura, Masahiro Suzuki et al.

This paper describes a framework for the development of an integrative cognitive system based on probabilistic generative models (PGMs) called Neuro-SERKET. Neuro-SERKET is an extension of SERKET, which can compose elemental PGMs developed in a distributed manner and provide a scheme that allows the composed PGMs to learn throughout the system in an unsupervised way. In addition to the head-to-tail connection supported by SERKET, Neuro-SERKET supports tail-to-tail and head-to-head connections, as well as neural network-based modules, i.e., deep generative models. As an example of a Neuro-SERKET application, an integrative model was developed by composing a variational autoencoder (VAE), a Gaussian mixture model (GMM), latent Dirichlet allocation (LDA), and automatic speech recognition (ASR). The model is called VAE+GMM+LDA+ASR. The performance of VAE+GMM+LDA+ASR and the validity of Neuro-SERKET were demonstrated through a multimodal categorization task using image data and a speech signal of numerical digits.

AIAug 25, 2018
Deep Emotion: A Computational Model of Emotion Using Deep Neural Networks

Chie Hieida, Takato Horii, Takayuki Nagai

Emotions are very important for human intelligence. For example, emotions are closely related to the appraisal of the internal bodily state and external stimuli. This helps us to respond quickly to the environment. Another important perspective in human intelligence is the role of emotions in decision-making. Moreover, the social aspect of emotions is also very important. Therefore, if the mechanism of emotions were elucidated, we could advance toward the essential understanding of our natural intelligence. In this study, a model of emotions is proposed to elucidate the mechanism of emotions through the computational model. Furthermore, from the viewpoint of partner robots, the model of emotions may help us to build robots that can have empathy for humans. To understand and sympathize with people's feelings, the robots need to have their own emotions. This may allow robots to be accepted in human society. The proposed model is implemented using deep neural networks consisting of three modules, which interact with each other. Simulation results reveal that the proposed model exhibits reasonable behavior as the basic mechanism of emotion.