Cedric Colas

AI
h-index 66
4 papers
57 citations
Novelty 40%
AI Score 34

4 Papers

AI | Nov 1, 2023
A Definition of Open-Ended Learning Problems for Goal-Conditioned Agents

Olivier Sigaud, Gianluca Baldassarre, Cedric Colas et al.

A lot of recent machine learning research papers have "open-ended learning" in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute to fixing this situation. After illustrating the genealogy of the concept and more recent perspectives about what it truly means, we outline that open-ended learning is generally conceived as a composite notion encompassing a set of diverse properties. In contrast with previous approaches, we propose to isolate a key elementary property of open-ended processes, which is to produce elements from time to time (e.g., observations, options, reward functions, and goals), over an infinite horizon, that are considered novel from an observer's perspective. From there, we build the notion of open-ended learning problems and focus in particular on the subset of open-ended goal-conditioned reinforcement learning problems in which agents can learn a growing repertoire of goal-driven skills. Finally, we highlight the work that remains to be performed to fill the gap between our elementary definition and the more involved notions of open-ended learning that developmental AI researchers may have in mind.

LG | May 7, 2024 | Code
Policy Learning with a Language Bottleneck

Megha Srivastava, Cedric Colas, Dorsa Sadigh et al.

Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance, but often lack human-like generalization, interpretability, and inter-operability with human users. Inspired by the rich interactions between language and decision-making in humans, we introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules that capture the high-level strategies underlying rewarding behaviors. PLLB alternates between a *rule generation* step guided by language models, and an *update* step where agents learn new policies guided by rules, even when a rule is insufficient to describe an entire complex policy. Across five diverse tasks, including a two-player signaling game, maze navigation, image reconstruction, and robot grasp planning, we show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users, enabling more effective human-AI coordination. We provide source code for our experiments at https://github.com/meghabyte/bottleneck.

AI | Jul 17, 2025
Assessing Adaptive World Models in Machines with Novel Games

Lance Ying, Katherine M. Collins, Prafull Sharma et al.

Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. However, current understanding and evaluation of world models in artificial intelligence (AI) remain narrow, often focusing on static representations learned from training on massive corpora of data, instead of the efficiency and efficacy in learning these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction drawing on decades of research in cognitive science on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures; we refer to this class of games as novel games. We detail key desiderata for constructing these games and propose appropriate metrics to explicitly challenge and evaluate the agent's ability for rapid world model induction. We hope that this new evaluation framework will inspire future evaluation efforts on world models in AI and provide a crucial step towards developing AI systems capable of human-like rapid adaptation and robust generalization, a critical component of artificial general intelligence.

HC | Jul 31, 2018
Compact Convolutional Neural Networks for Multi-Class, Personalised, Closed-Loop EEG-BCI

Pablo Ortega, Cedric Colas, Aldo Faisal

For many people suffering from motor disabilities, assistive devices controlled with only brain activity are the only way to interact with their environment. Natural tasks often require different kinds of interactions, involving different controllers the user should be able to select in a self-paced way. We developed a Brain-Computer Interface (BCI) allowing users to switch between four control modes in a self-paced way in real time. Since the system is devised to be used in domestic environments in a user-friendly way, we selected non-invasive electroencephalographic (EEG) signals and convolutional neural networks (CNNs), known for their ability to find the optimal features in classification tasks. We tested our system using the Cybathlon BCI computer game, which embodies all the challenges inherent to real-time control. Our preliminary results show that an efficient architecture (SmallNet), with only one convolutional layer, can classify 4 mental activities chosen by the user. The BCI system is run and validated online. It is kept up to date using signals newly collected during play, reaching an online accuracy of 47.6%, whereas most approaches report only results obtained offline. We found that models trained with data collected online better predicted the behaviour of the system in real time. This suggests that similar (CNN-based) offline classification methods found in the literature might experience a drop in performance when applied online. Compared to our previous decoder of physiological signals relying on blinks, we doubled the number of states between which the user can transition, bringing the opportunity for finer control of the specific subtasks composing natural grasping in a self-paced way. Our results are comparable to those shown at the Cybathlon's BCI Race, but further improvements in accuracy are required.