Fabien C. Y. Benureau

LG
5 papers
27 citations
Novelty 55%
AI Score 39

5 Papers

LG May 5, 2022
Morphological Wobbling Can Help Robots Learn

Fabien C. Y. Benureau, Jun Tani

We propose to make the physical characteristics of a robot oscillate while it learns to improve its behavioral performance. We consider quantities such as mass, actuator strength, and size that are usually fixed in a robot, and show that when those quantities oscillate at the beginning of the learning process on a simulated 2D soft robot, the performance on a locomotion task can be significantly improved. We investigate the dynamics of the phenomenon and conclude that in our case, surprisingly, a high-frequency oscillation with a large amplitude for a large portion of the learning duration leads to the highest performance benefits. Furthermore, we show that morphological wobbling significantly increases exploration of the search space.
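As a rough illustration of the idea in this abstract, the sketch below makes one physical quantity oscillate around its nominal value during an early portion of learning and then holds it fixed. The sinusoidal form, the function name `wobbled_value`, and all parameter names and defaults are assumptions made for illustration; the abstract does not specify the waveform or the values used.

```python
import math


def wobbled_value(nominal, step, total_steps,
                  amplitude=0.5, period=4, wobble_fraction=0.5):
    """Return a physical quantity (e.g. mass) for a given learning step.

    Hypothetical schedule: the quantity oscillates around `nominal` for the
    first `wobble_fraction` of learning, then stays at `nominal`.
    """
    if step >= wobble_fraction * total_steps:
        # Wobbling phase is over: use the fixed, nominal morphology.
        return nominal
    # Relative oscillation of +/- `amplitude` with the given period (in steps).
    phase = 2.0 * math.pi * step / period
    return nominal * (1.0 + amplitude * math.sin(phase))


if __name__ == "__main__":
    for step in range(0, 12):
        print(step, round(wobbled_value(2.0, step, total_steps=16), 3))
```

In this sketch, a smaller `period` corresponds to a higher-frequency oscillation, and a larger `wobble_fraction` extends wobbling over more of the learning duration.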

57.8 CL Mar 16
A Family of LLMs Liberated from Static Vocabularies

Aleph Alpha, Adnen Abdessaied, Artur Baranowski et al.

Tokenization is a central component of natural language processing in current large language models (LLMs), enabling models to convert raw text into processable units. Although learned tokenizers are widely adopted, they exhibit notable limitations, including their large, fixed vocabulary sizes and poor adaptability to new domains or languages. We present a family of models with up to 70 billion parameters based on the hierarchical autoregressive transformer (HAT) architecture. In HAT, an encoder transformer aggregates bytes into word embeddings and then feeds them to the backbone, a classical autoregressive transformer. The outputs of the backbone are then cross-attended by the decoder and converted back into bytes. We show that we can reuse available pre-trained models by converting the Llama 3.1 8B and 70B models into the HAT architecture: Llama-3.1-8B-TFree-HAT and Llama-3.1-70B-TFree-HAT are byte-level models whose encoder and decoder are trained from scratch, but where we adapt the pre-trained Llama backbone, i.e., the transformer blocks with the embedding matrix and head removed, to handle word embeddings instead of the original tokens. We also provide a 7B HAT model, Llama-TFree-HAT-Pretrained, trained entirely from scratch on nearly 4 trillion words. The HAT architecture improves text compression by reducing the number of required sequence positions and enhances robustness to intra-word variations, e.g., spelling differences. Through pre-training, as well as subsequent supervised fine-tuning and direct preference optimization in English and German, we show strong proficiency in both languages, improving on the original Llama 3.1 in most benchmarks. We release our models (including 200 pre-training checkpoints) on Hugging Face.
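To make the byte-to-word aggregation concrete, the following minimal sketch groups the UTF-8 bytes of a text into word-level chunks, so that a backbone would see one position per chunk rather than one per byte. The whitespace-based splitting rule and the function name `split_into_word_bytes` are assumptions for illustration only; the abstract does not describe the actual splitting rule, and no encoder or backbone is modeled here.

```python
def split_into_word_bytes(text):
    """Group the UTF-8 bytes of `text` into word-level chunks.

    Assumed rule (not from the abstract): a chunk ends after each space byte.
    """
    chunks = []
    current = bytearray()
    for b in text.encode("utf-8"):
        current.append(b)
        if b == 0x20:  # space closes the current chunk
            chunks.append(bytes(current))
            current = bytearray()
    if current:
        chunks.append(bytes(current))
    return chunks


if __name__ == "__main__":
    text = "byte level models"
    chunks = split_into_word_bytes(text)
    print(chunks)
    print("byte positions:", len(text.encode("utf-8")))
    print("word positions:", len(chunks))
```

Concatenating the chunks recovers the original byte sequence, so no vocabulary lookup is involved at any point.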

RO Feb 21, 2022
Goal-directed Planning and Goal Understanding by Active Inference: Evaluation Through Simulated and Physical Robot Experiments

Takazumi Matsumoto, Wataru Ohata, Fabien C. Y. Benureau et al.

We show that goal-directed action planning and generation in a teleological framework can be formulated using the free energy principle. The proposed model, which is built on a variational recurrent neural network model, is characterized by three essential features: (1) goals can be specified both for static sensory states, e.g., goal images to be reached, and for dynamic processes, e.g., moving around an object; (2) the model can not only generate goal-directed action plans, but can also understand goals by sensory observation; and (3) the model generates future action plans for given goals based on the best estimate of the current state, inferred using past sensory observations. The proposed model is evaluated by conducting experiments on a simulated mobile agent as well as on a real humanoid robot performing object manipulation.

NE Oct 28, 2020
Morphological Development at the Evolutionary Timescale: Robotic Developmental Evolution

Fabien C. Y. Benureau, Jun Tani

Evolution and development operate at different timescales: generations for the one, a lifetime for the other. These two processes, the basis of much of life on earth, interact in many non-trivial ways, but their temporal hierarchy -- evolution overarching development -- is observed for most multicellular lifeforms. When designing robots, however, this tenet lifts: it becomes -- however natural -- a design choice. We propose to invert this temporal hierarchy and design a developmental process happening at the phylogenetic timescale. Over a classic evolutionary search aimed at finding good gaits for tentacle 2D robots, we add a developmental process over the robots' morphologies. Within a generation, the morphology of the robots does not change. But from one generation to the next, the morphology develops. Much like we become bigger, stronger, and heavier as we age, our robots are bigger, stronger, and heavier with each passing generation. Our robots start with baby morphologies and, a few thousand generations later, end up with adult ones. We show that this produces better and qualitatively different gaits than an evolutionary search with only adult robots, and that it prevents premature convergence by fostering exploration. In addition, we validate our method on voxel lattice 3D robots from the literature and compare it to a recent evolutionary developmental approach. Our method is conceptually simple, can be effective on small or large populations of robots, and is intrinsic to the robot and its morphology, not the task or environment. Furthermore, by recasting the evolutionary search as a learning process, these results can be viewed in the context of developmental learning robotics.
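One way to picture development at the generational timescale is a schedule that maps a generation index to a morphology, fixed within a generation and growing from "baby" to "adult" across generations. The sketch below assumes a linear interpolation; the function name `morphology_at`, the parameter names, and the example values are hypothetical and are not taken from the paper.

```python
def morphology_at(generation, dev_generations, baby, adult):
    """Return the morphology shared by all robots of a given generation.

    Hypothetical linear schedule: parameters grow from `baby` to `adult`
    over `dev_generations` generations, then stay at `adult`.
    """
    # Developmental progress, clamped to [0, 1].
    t = min(max(generation / dev_generations, 0.0), 1.0)
    return {key: baby[key] + t * (adult[key] - baby[key]) for key in adult}


if __name__ == "__main__":
    baby = {"size": 0.5, "mass": 0.25, "strength": 0.5}
    adult = {"size": 1.0, "mass": 1.0, "strength": 1.0}
    for gen in (0, 1000, 2000, 4000):
        print(gen, morphology_at(gen, 2000, baby, adult))
```

An evolutionary loop would call such a function once per generation and evaluate every individual's gait on the resulting body.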

LG Aug 23, 2018
Diversity-Driven Selection of Exploration Strategies in Multi-Armed Bandits

Fabien C. Y. Benureau, Pierre-Yves Oudeyer

We consider a scenario where an agent has multiple available strategies to explore an unknown environment. For each new interaction with the environment, the agent must select which exploration strategy to use. We provide a new strategy-agnostic method that treats the situation as a Multi-Armed Bandit problem where the reward signal is the diversity of effects that each strategy produces. We test the method empirically on a simulated planar robotic arm, and establish that the method is able to discriminate between strategies of dissimilar quality, even when the differences are tenuous, and that the resulting performance is competitive with the best fixed mixture of strategies.
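The selection scheme described here can be sketched as a bandit whose arms are exploration strategies and whose reward measures how much new ground an observed effect covers. The sketch below is not the authors' algorithm: it assumes an epsilon-greedy bandit and a simple grid-based novelty reward (1 if the effect lands in a previously unvisited cell, 0 otherwise). The class name `DiversityBandit` and all parameters are hypothetical.

```python
import random


class DiversityBandit:
    """Epsilon-greedy choice among strategies, rewarded by effect novelty."""

    def __init__(self, n_strategies, cell_size=0.1, epsilon=0.1, seed=0):
        self.counts = [0] * n_strategies    # times each strategy was used
        self.values = [0.0] * n_strategies  # running mean reward per strategy
        self.visited = set()                # grid cells already reached
        self.cell_size = cell_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        """Pick a strategy index: explore with probability epsilon."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, strategy, effect):
        """Record an observed effect and return its diversity reward."""
        cell = tuple(int(x // self.cell_size) for x in effect)
        reward = 0.0 if cell in self.visited else 1.0
        self.visited.add(cell)
        self.counts[strategy] += 1
        n = self.counts[strategy]
        self.values[strategy] += (reward - self.values[strategy]) / n
        return reward


if __name__ == "__main__":
    bandit = DiversityBandit(n_strategies=2)
    env = random.Random(1)
    for _ in range(200):
        s = bandit.select()
        # Toy strategies: 0 always repeats one effect, 1 samples broadly.
        effect = (0.05, 0.05) if s == 0 else (env.random(), env.random())
        bandit.update(s, effect)
    print(bandit.counts, [round(v, 2) for v in bandit.values])
```

Because the reward depends only on the effects produced, the bandit needs no knowledge of how each strategy works internally, which matches the strategy-agnostic framing of the abstract.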