Yuji Kanagawa

LG
h-index4
4papers
21citations
Novelty46%
AI Score32

4 Papers

PEJul 14, 2025
Evolution of Fear and Social Rewards in Prey-Predator Relationship

Yuji Kanagawa, Kenji Doya

Fear is a critical brain function for detecting danger and learning to avoid specific stimuli that can lead to danger. While fear is believed to have evolved under pressure from predators, experimentally reproducing the evolution is challenging. To investigate the relationship between environmental conditions, the evolution of fear, and the evolution of other rewards, such as food reward and social reward, we developed a distributed evolutionary simulation. In our simulation, prey and predator agents co-evolve their innate reward functions, including a possibly fear-like term for observing predators, and learn behaviors via reinforcement learning. Surprisingly, our simulation revealed that social reward for observing the same species is more important for prey to survive, and fear-like negative reward for observing predators evolves only after acquiring social reward. We also found that the predator with increased hunting ability (larger mouth) amplified fear emergence, but also that fear evolution is more stable with non-evolving predators that are bad at chasing prey. Additionally, unlike for predators, we found that positive rewards evolve in opposition to fear for stationary threats, as areas with abundant leftover food develop around them. These findings suggest that fear and social reward have had a complex interplay with each other through evolution, along with the nature of predators and threats.

NEJun 21, 2024
Evolution of Rewards for Food and Motor Action by Simulating Birth and Death

Yuji Kanagawa, Kenji Doya

The reward system is one of the fundamental drivers of animal behaviors and is critical for survival and reproduction. Despite its importance, the problem of how the reward system has evolved is underexplored. In this paper, we try to replicate the evolution of biologically plausible reward functions and investigate how environmental conditions affect evolved rewards' shape. For this purpose, we developed a population-based decentralized evolutionary simulation framework, where agents maintain their energy level to live longer and produce more children. Each agent inherits its reward function from its parent subject to mutation and learns to get rewards via reinforcement learning throughout its lifetime. Our results show that biologically reasonable positive rewards for food acquisition and negative rewards for motor action can evolve from randomly initialized ones. However, we also find that the rewards for motor action diverge into two modes: largely positive and slightly negative. The emergence of positive motor action rewards is surprising because it can make agents too active and inefficient in foraging. In environments with poor and poisonous foods, the evolution of rewards for less important foods tends to be unstable, while rewards for normal foods are still stable. These results demonstrate the usefulness of our simulation environment and energy-dependent birth and death model for further studies of the origin of reward systems.

LGOct 6, 2020
Learning Diverse Options via InfoMax Termination Critic

Yuji Kanagawa, Tomoyuki Kaneko

We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning. While options can speed up transfer learning by serving as reusable building blocks, learning reusable options for unknown task distribution remains challenging. Motivated by the recent success of mutual information (MI) based skill learning, we hypothesize that more diverse options are more reusable. To this end, we propose a method for learning termination conditions of options by maximizing MI between options and corresponding state transitions. We derive a scalable approximation of this MI maximization via gradient ascent, yielding the InfoMax Termination Critic (IMTC) algorithm. Our experiments demonstrate that IMTC significantly improves the diversity of learned options without extrinsic rewards combined with an intrinsic option learning method. Moreover, we test the reusability of learned options by transferring options into various tasks, confirming that IMTC helps quick adaptation, especially in complex domains where an agent needs to manipulate objects.

LGApr 17, 2019
Rogue-Gym: A New Challenge for Generalization in Reinforcement Learning

Yuji Kanagawa, Tomoyuki Kaneko

In this paper, we propose Rogue-Gym, a simple and classic style roguelike game built for evaluating generalization in reinforcement learning (RL). Combined with the recent progress of deep neural networks, RL has successfully trained human-level agents without human knowledge in many games such as those for Atari 2600. However, it has been pointed out that agents trained with RL methods often overfit the training environment, and they work poorly in slightly different environments. To investigate this problem, some research environments with procedural content generation have been proposed. Following these studies, we propose the use of roguelikes as a benchmark for evaluating the generalization ability of RL agents. In our Rogue-Gym, agents need to explore dungeons that are structured differently each time they start a new game. Thanks to the very diverse structures of the dungeons, we believe that the generalization benchmark of Rogue-Gym is sufficiently fair. In our experiments, we evaluate a standard reinforcement learning method, PPO, with and without enhancements for generalization. The results show that some enhancements believed to be effective fail to mitigate the overfitting in Rogue-Gym, although others slightly improve the generalization ability.