ROMar 7, 2021Code
When Being Soft Makes You Tough: A Collision-Resilient Quadcopter Inspired by Arthropods' ExoskeletonsRicardo de Azambuja, Hassan Fouad, Yann Bouteiller et al.
Flying robots are usually rather delicate and require protective enclosures when facing the risk of collision, while high complexity and reduced payload are recurrent problems with collision-resilient flying robots. Inspired by arthropods' exoskeletons, we design a simple, open source, easily manufactured, semi-rigid structure with soft joints that can withstand high-velocity impacts. With an exoskeleton, the protective shell becomes part of the main robot structure, thereby minimizing its loss in payload capacity. Our design is simple to build and customize using cheap components (e.g. bamboo skewers) and consumer-grade 3D printers. The result is CogniFly, a sub-250g autonomous quadcopter that survives multiple collisions at speeds up to 7m/s. In addition to its collision-resiliency, CogniFly is easy to program using Python or Buzz, carries sensors that allow it to fly for approx. 17min without the need of GPS or an external motion capture system, has enough computing power to run deep neural network models on-board and was designed to facilitate integration with an automated battery swapping system. This structure becomes an ideal platform for high-risk activities (such as flying in a cluttered environment or reinforcement learning training) by dramatically reducing the risks of damaging its own hardware or the environment. Source code, 3D files, instructions and videos are available through the project's website (https://thecognifly.github.io).
MAOct 23, 2024
The Hive Mind is a Single Reinforcement Learning AgentKarthik Soma, Yann Bouteiller, Heiko Hamann et al.
Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to optimal strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and individual trial-and-error. This paper establishes an equivalence between these two paradigms by drawing from the well-established collective decision-making model of nest-hunting in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the $\textit{hive mind}$) arising from individual bees following simple, local imitation-based rules is that of a single online reinforcement learning (RL) agent interacting with many parallel environments. The update rule through which this macro-agent learns is a bandit algorithm that we coin $\textit{Maynard-Cross Learning}$. Our analysis implies that a group of cognition-limited organisms can be equivalent to a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature. From a biological perspective, this analysis suggests how such imitation strategies evolved: they constitute a scalable form of reinforcement learning at the group level, aligning with theories of kin and group selection. Beyond biology, the framework offers new tools for analyzing economic and social systems where individuals imitate successful strategies, effectively participating in a collective learning process. In swarm intelligence, our findings will inform the design of scalable collective systems in artificial domains, enabling RL-inspired mechanisms for coordination and adaptability at scale.
LGOct 22, 2024
Evolution of Societies via Reinforcement LearningYann Bouteiller, Karthik Soma, Giovanni Beltrame
The universe involves many independent co-learning agents as an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are typically constrained to small, homogeneous populations and remain computationally intensive. We propose a methodology that enables simulating populations of Reinforcement Learning agents at evolutionary scale. More specifically, we derive a fast, parallelizable implementation of Policy Gradient (PG) and Opponent-Learning Awareness (LOLA), tailored for evolutionary simulations where agents undergo random pairwise interactions in stateless normal-form games. We demonstrate our approach by simulating the evolution of very large populations made of heterogeneous co-learning agents, under both naive and advanced learning strategies. In our experiments, 200,000 PG or LOLA agents evolve in the classic games of Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors. Each game provides distinct insights into how populations evolve under both naive and advanced MARL rules, including compelling ways in which Opponent-Learning Awareness affects social evolution.
SPJul 28, 2021
The Portiloop: a deep learning-based open science tool for closed-loop brain stimulationNicolas Valenchon, Yann Bouteiller, Hugo R. Jourde et al.
Closed-loop brain stimulation refers to capturing neurophysiological measures such as electroencephalography (EEG), quickly identifying neural events of interest, and producing auditory, magnetic or electrical stimulation so as to interact with brain processes precisely. It is a promising new method for fundamental neuroscience and perhaps for clinical applications such as restoring degraded memory function; however, existing tools are expensive, cumbersome, and offer limited experimental flexibility. In this article, we propose the Portiloop, a deep learning-based, portable and low-cost closed-loop stimulation system able to target specific brain oscillations. We first document open-hardware implementations that can be constructed from commercially available components. We also provide a fast, lightweight neural network model and an exploration algorithm that automatically optimizes the model hyperparameters to the desired brain oscillation. Finally, we validate the technology on a challenging test case of real-time sleep spindle detection, with results comparable to off-line expert performance on the Massive Online Data Annotation spindle dataset (MODA; group consensus). Software and plans are available to the community as an open science initiative to encourage further development and advance closed-loop neuroscience research.
LGOct 6, 2020
Reinforcement Learning with Random DelaysSimon Ramstedt, Yann Bouteiller, Giovanni Beltrame et al.
Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.