RO AI LG MAJul 20, 2021

Using reinforcement learning to autonomously identify sources of error for agents in group missions

Keishu Utimula, Ken-taro Hayaschi, Trevor J. Bihl, Kenta Hongo, Ryo Maezono

arXiv:2107.09232v43.0

Originality Incremental advance

AI Analysis

This addresses the challenge of diagnosing failures in multi-agent systems, though it is incremental as it applies existing reinforcement learning methods to a specific problem.

The study tackled the problem of autonomously identifying whether agent failures in swarm missions are due to actuators or sensors by generating action plans that induce collisions to differentiate between hypotheses, and demonstrated that Q-table reinforcement learning can produce human-like solutions for this task.

When agents swarm to execute a mission, some of them frequently exhibit sudden failure, as observed from the command base. It is generally difficult to determine whether a failure is caused by actuators (hypothesis, $h_a$) or sensors (hypothesis, $h_s$) by solely relying on the communication between the command base and concerning agent. However, by instigating collusion between the agents, the cause of failure can be identified; in other words, we expect to detect corresponding displacements for $h_a$ but not for $h_s$. In this study, we considered the question as to whether artificial intelligence can autonomously generate an action plan $\boldsymbol{g}$ to pinpoint the cause as aforedescribed. Because the expected response to $\boldsymbol{g}$ generally depends upon the adopted hypothesis [let the difference be denoted by $D(\boldsymbol{g})$], a formulation that uses $D\left(\boldsymbol{g}\right)$ to pinpoint the cause can be made. Although a $\boldsymbol{g}^*$ that maximizes $D(\boldsymbol{g})$ would be a suitable action plan for this task, such an optimization is difficult to achieve using the conventional gradient method, as $D(\boldsymbol{g})$ becomes nonzero in rare events such as collisions with other agents, and most swarm actions $\boldsymbol{g}$ give $D(\boldsymbol{g})=0$. In other words, throughout almost the entire space of $\boldsymbol{g}$, $D(\boldsymbol{g})$ has zero gradient, and the gradient method is not applicable. To overcome this problem, we formulated an action plan using Q-table reinforcement learning. Surprisingly, the optimal action plan generated via reinforcement learning presented a human-like solution to pinpoint the problem by colliding other agents with the failed agent. Using this simple prototype, we demonstrated the potential of applying Q-table reinforcement learning methods to plan autonomous actions to pinpoint the causes of failure.

View on arXiv PDF

Similar