Bart Dhoedt

LG
h-index28
45papers
982citations
Novelty47%
AI Score46

45 Papers

71.2ROApr 22Code
Online Structure Learning and Planning for Autonomous Robot Navigation using Active Inference

Daria de tinguy, Tim Verbelen, Emilio Gamba et al.

Autonomous navigation in unfamiliar environments requires robots to simultaneously explore, localise, and plan under uncertainty, without relying on predefined maps or extensive training. We present Active Inference MAPping and Planning (AIMAPP), a framework unifying mapping, localisation, and decision-making within a single generative model, drawing on cognitive-mapping concepts from animal navigation (topological organisation, discrete spatial representations and predictive belief updating) as design inspiration. The agent builds and updates a sparse topological map online, learns state transitions dynamically, and plans actions by minimising Expected Free Energy. This allows it to balance goal-directed and exploratory behaviours. We implemented AIMAPP as a ROS-compatible system that is sensor and robot-agnostic and integrates with diverse hardware configurations. It operates in a fully self-supervised manner, is resilient to sensor failure, continues operating under odometric drift, and supports both exploration and goal-directed navigation without any pre-training. We evaluate the system in large-scale real and simulated environments against state-of-the-art planning baselines, demonstrating its adaptability to ambiguous observations, environmental changes, and sensor noise. The model offers a modular, self-supervised solution to scalable navigation in unstructured settings. AIMAPP is available at https://github.com/decide-ugent/aimapp.

LGJul 13, 2022
The Free Energy Principle for Perception and Action: A Deep Learning Perspective

Pietro Mazzaglia, Tim Verbelen, Ozan Çatal et al.

The free energy principle, and its corollary active inference, constitute a bio-inspired theory that assumes biological agents act to remain in a restricted set of preferred states of the world, i.e., they minimize their free energy. Under this principle, biological agents learn a generative model of the world and plan actions in the future that will maintain the agent in an homeostatic state that satisfies its preferences. This framework lends itself to being realized in silico, as it comprehends important aspects that make it computationally affordable, such as variational inference and amortized planning. In this work, we investigate the tool of deep learning to design and realize artificial agents based on active inference, presenting a deep-learning oriented presentation of the free energy principle, surveying works that are relevant in both machine learning and active inference areas, and discussing the design choices that are involved in the implementation process. This manuscript probes newer perspectives for the active inference framework, grounding its theoretical aspects into more pragmatic affairs, offering a practical guide to active inference newcomers and a starting point for deep learning practitioners that would like to investigate implementations of the free energy principle.

AINov 23, 2022
Choreographer: Learning and Adapting Skills in Imagination

Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt et al.

Unsupervised skill learning aims to learn a rich repertoire of behaviors without external supervision, providing artificial agents with the ability to control and influence the environment. However, without appropriate knowledge and exploration, skills may provide control only over a restricted area of the environment, limiting their applicability. Furthermore, it is unclear how to leverage the learned skill behaviors for adapting to downstream tasks in a data-efficient manner. We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. Our method decouples the exploration and skill learning processes, being able to discover skills in the latent state space of the model. During adaptation, the agent uses a meta-controller to evaluate and adapt the learned skills efficiently by deploying them in parallel in imagination. Choreographer is able to learn skills both from offline data, and by collecting data simultaneously with an exploration policy. The skills can be used to effectively adapt to downstream tasks, as we show in the URL benchmark, where we outperform previous approaches from both pixels and states inputs. The learned skills also explore the environment thoroughly, finding sparse rewards more frequently, as shown in goal-reaching tasks from the DMC Suite and Meta-World. Website and code: https://skillchoreographer.github.io/

AISep 24, 2022
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Sai Rajeswar, Pietro Mazzaglia, Tim Verbelen et al.

Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent, and a task-aware fine-tuning strategy combined with a new proposed hybrid planner, Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/

ROJul 5, 2023
FOCUS: Object-Centric World Models for Robotics Manipulation

Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen et al.

Understanding the world in terms of objects and the possible interplays with them is an important cognition ability, especially in robotics manipulation, where many tasks require robot-object interactions. However, learning such a structured world model, which specifically captures entities and relationships, remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus that stems from the object-centric representation, FOCUS can be deployed on robotics manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and enable consistent exploration of robot-object interactions. Using a Franka Emika robot arm, we also showcase how FOCUS could be adopted in real-world settings.

AIAug 16, 2023
Integrating cognitive map learning and active inference for planning in ambiguous environments

Toon Van de Maele, Bart Dhoedt, Tim Verbelen et al.

Living organisms need to acquire both cognitive maps for learning the structure of the world and planning mechanisms able to deal with the challenges of navigating ambiguous environments. Although significant progress has been made in each of these areas independently, the best way to integrate them is an open research question. In this paper, we propose the integration of a statistical model of cognitive map formation within an active inference agent that supports planning under uncertainty. Specifically, we examine the clone-structured cognitive graph (CSCG) model of cognitive map formation and compare a naive clone graph agent with an active inference-driven clone graph agent, in three spatial navigation scenarios. Our findings demonstrate that while both agents are effective in simple scenarios, the active inference agent is more effective when planning in challenging scenarios, in which sensory observations provide ambiguous information about location.

CVApr 14, 2023
Symmetry and Complexity in Object-Centric Deep Active Inference Models

Stefano Ferraro, Toon Van de Maele, Tim Verbelen et al.

Humans perceive and interact with hundreds of objects every day. In doing so, they need to employ mental models of these objects and often exploit symmetries in the object's shape and appearance in order to learn generalizable and transferable skills. Active inference is a first principles approach to understanding and modeling sentient agents. It states that agents entertain a generative model of their environment, and learn and act by minimizing an upper bound on their surprisal, i.e. their Free Energy. The Free Energy decomposes into an accuracy and complexity term, meaning that agents favor the least complex model, that can accurately explain their sensory observations. In this paper, we investigate how inherent symmetries of particular objects also emerge as symmetries in the latent state space of the generative model learnt under deep active inference. In particular, we focus on object-centric representations, which are trained from pixels to predict novel object views as the agent moves its viewpoint. First, we investigate the relation between model complexity and symmetry exploitation in the state space. Second, we do a principal component analysis to demonstrate how the model encodes the principal axis of symmetry of the object in the latent space. Finally, we also demonstrate how more symmetrical representations can be exploited for better generalization in the context of manipulation.

CVSep 16, 2022
Disentangling Shape and Pose for Object-Centric Deep Active Inference Models

Stefano Ferraro, Toon Van de Maele, Pietro Mazzaglia et al.

Active inference is a first principles approach for understanding the brain in particular, and sentient agents in general, with the single imperative of minimizing free energy. As such, it provides a computational account for modelling artificial intelligent agents, by defining the agent's generative model and inferring the model parameters, actions and hidden state beliefs. However, the exact specification of the generative model and the hidden state space structure is left to the experimenter, whose design choices influence the resulting behaviour of the agent. Recently, deep learning methods have been proposed to learn a hidden state space structure purely from data, alleviating the experimenter from this tedious design task, but resulting in an entangled, non-interpreteable state space. In this paper, we hypothesize that such a learnt, entangled state space does not necessarily yield the best model in terms of free energy, and that enforcing different factors in the state space can yield a lower model complexity. In particular, we consider the problem of 3D object representation, and focus on different instances of the ShapeNet dataset. We propose a model that factorizes object shape, pose and category, while still learning a representation for each factor using a deep neural network. We show that models, with best disentanglement properties, perform best when adopted by an active agent in reaching preferred observations.

AIAug 12, 2024
Exploring and Learning Structure: Active Inference Approach in Navigational Agents

Daria de Tinguy, Tim Verbelen, Bart Dhoedt

Drawing inspiration from animal navigation strategies, we introduce a novel computational model for navigation and mapping, rooted in biologically inspired principles. Animals exhibit remarkable navigation abilities by efficiently using memory, imagination, and strategic decision-making to navigate complex and aliased environments. Building on these insights, we integrate traditional cognitive mapping approaches with an Active Inference Framework (AIF) to learn an environment structure in a few steps. Through the incorporation of topological mapping for long-term memory and AIF for navigation planning and structure learning, our model can dynamically apprehend environmental structures and expand its internal map with predicted beliefs during exploration. Comparative experiments with the Clone-Structured Graph (CSCG) model highlight our model's ability to rapidly learn environmental structures in a single episode, with minimal navigation overlap. this is achieved without prior knowledge of the dimensions of the environment or the type of observations, showcasing its robustness and effectiveness in navigating ambiguous environments.

ROFeb 7, 2023
Object-Centric Scene Representations using Active Inference

Toon Van de Maele, Tim Verbelen, Pietro Mazzaglia et al.

Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment. In this paper, we propose a novel approach for scene understanding, leveraging a hierarchical object-centric generative model that enables an agent to infer object category and pose in an allocentric reference frame using active inference, a neuro-inspired framework for action and perception. For evaluating the behavior of an active vision agent, we also propose a new benchmark where, given a target viewpoint of a particular object, the agent needs to find the best matching viewpoint given a workspace with randomly positioned objects in 3D. We demonstrate that our active inference agent is able to balance epistemic foraging and goal-driven behavior, and outperforms both supervised and reinforcement learning baselines by a large margin.

LGAug 19, 2022
Home Run: Finding Your Way Home by Imagining Trajectories

Daria de Tinguy, Pietro Mazzaglia, Tim Verbelen et al.

When studying unconstrained behaviour and allowing mice to leave their cage to navigate a complex labyrinth, the mice exhibit foraging behaviour in the labyrinth searching for rewards, returning to their home cage now and then, e.g. to drink. Surprisingly, when executing such a ``home run'', the mice do not follow the exact reverse path, in fact, the entry path and home path have very little overlap. Recent work proposed a hierarchical active inference model for navigation, where the low level model makes inferences about hidden states and poses that explain sensory inputs, whereas the high level model makes inferences about moving between locations, effectively building a map of the environment. However, using this ``map'' for planning, only allows the agent to find trajectories that it previously explored, far from the observed mice's behaviour. In this paper, we explore ways of incorporating before-unvisited paths in the planning algorithm, by using the low level generative model to imagine potential, yet undiscovered paths. We demonstrate a proof of concept in a grid-world environment, showing how an agent can accurately predict a new, shorter path in the map leading to its starting point, using a generative model learnt from pixel-based observations.

LGAug 18, 2022
Learning Generative Models for Active Inference using Tensor Networks

Samuel T. Wauthier, Bram Vanhecke, Tim Verbelen et al.

Active inference provides a general framework for behavior and learning in autonomous agents. It states that an agent will attempt to minimize its variational free energy, defined in terms of beliefs over observations, internal states and policies. Traditionally, every aspect of a discrete active inference model must be specified by hand, i.e. by manually defining the hidden state space structure, as well as the required distributions such as likelihood and transition probabilities. Recently, efforts have been made to learn state space representations automatically from observations using deep neural networks. In this paper, we present a novel approach of learning state spaces using quantum physics-inspired tensor networks. The ability of tensor networks to represent the probabilistic nature of quantum states as well as to reduce large state spaces makes tensor networks a natural candidate for active inference. We show how tensor networks can be used as a generative model for sequential data. Furthermore, we show how one can obtain beliefs from such a generative model and how an active inference agent can use these to compute the expected free energy. Finally, we demonstrate our method on the classic T-maze environment.

SDJul 14, 2022
Audio-guided Album Cover Art Generation with Genetic Algorithms

James Marien, Sam Leroux, Bart Dhoedt et al.

Over 60,000 songs are released on Spotify every day, and the competition for the listener's attention is immense. In that regard, the importance of captivating and inviting cover art cannot be underestimated, because it is deeply entangled with a song's character and the artist's identity, and remains one of the most important gateways to lead people to discover music. However, designing cover art is a highly creative, lengthy and sometimes expensive process that can be daunting, especially for non-professional artists. For this reason, we propose a novel deep-learning framework to generate cover art guided by audio features. Inspired by VQGAN-CLIP, our approach is highly flexible because individual components can easily be replaced without the need for any retraining. This paper outlines the architectural details of our models and discusses the optimization challenges that emerge from them. More specifically, we will exploit genetic algorithms to overcome bad local minima and adversarial examples. We find that our framework can generate suitable cover art for most genres, and that the visual features adapt themselves to audio feature changes. Given these results, we believe that our framework paves the road for extensions and more advanced applications in audio-guided visual generation tasks.

ROAug 10, 2025
Bio-Inspired Topological Autonomous Navigation with Active Inference in Robotics

Daria de Tinguy, Tim Verbelen, Emilio Gamba et al.

Achieving fully autonomous exploration and navigation remains a critical challenge in robotics, requiring integrated solutions for localisation, mapping, decision-making and motion planning. Existing approaches either rely on strict navigation rules lacking adaptability or on pre-training, which requires large datasets. These AI methods are often computationally intensive or based on static assumptions, limiting their adaptability in dynamic or unknown environments. This paper introduces a bio-inspired agent based on the Active Inference Framework (AIF), which unifies mapping, localisation, and adaptive decision-making for autonomous navigation, including exploration and goal-reaching. Our model creates and updates a topological map of the environment in real-time, planning goal-directed trajectories to explore or reach objectives without requiring pre-training. Key contributions include a probabilistic reasoning framework for interpretable navigation, robust adaptability to dynamic changes, and a modular ROS2 architecture compatible with existing navigation systems. Our method was tested in simulated and real-world environments. The agent successfully explores large-scale simulated environments and adapts to dynamic obstacles and drift, proving to be comparable to other exploration strategies such as Gbplanner, FAEL and Frontiers. This approach offers a scalable and transparent approach for navigating complex, unstructured environments.

ROSep 18, 2024
Representing Positional Information in Generative World Models for Object Manipulation

Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen et al.

Object manipulation capabilities are essential skills that set apart embodied agents engaging with the world, especially in the realm of robotics. The ability to predict outcomes of interactions with objects is paramount in this setting. While model-based control methods have started to be employed for tackling manipulation tasks, they have faced challenges in accurately manipulating objects. As we analyze the causes of this limitation, we identify the cause of underperformance in the way current world models represent crucial positional information, especially about the target's goal specification for object positioning tasks. We introduce a general approach that empowers world model-based agents to effectively solve object-positioning tasks. We propose two declinations of this approach for generative world models: position-conditioned (PCP) and latent-conditioned (LCP) policy learning. In particular, LCP employs object-centric latent representations that explicitly capture object positional information for goal specification. This naturally leads to the emergence of multimodal capabilities, enabling the specification of goals through spatial coordinates or a visual goal. Our methods are rigorously evaluated across several manipulation environments, showing favorable performance compared to current model-based control approaches.

AISep 18, 2023
Learning Spatial and Temporal Hierarchies: Hierarchical Active Inference for navigation in Multi-Room Maze Environments

Daria de Tinguy, Toon Van de Maele, Tim Verbelen et al.

Cognitive maps play a crucial role in facilitating flexible behaviour by representing spatial and conceptual relationships within an environment. The ability to learn and infer the underlying structure of the environment is crucial for effective exploration and navigation. This paper introduces a hierarchical active inference model addressing the challenge of inferring structure in the world from pixel-based observations. We propose a three-layer hierarchical model consisting of a cognitive map, an allocentric, and an egocentric world model, combining curiosity-driven exploration with goal-oriented behaviour at the different levels of reasoning from context to place to motion. This allows for efficient exploration and goal-directed search in room-structured mini-grid environments.

AIJun 23, 2023
Inferring Hierarchical Structure in Multi-Room Maze Environments

Daria de Tinguy, Toon Van de Maele, Tim Verbelen et al.

Cognitive maps play a crucial role in facilitating flexible behaviour by representing spatial and conceptual relationships within an environment. The ability to learn and infer the underlying structure of the environment is crucial for effective exploration and navigation. This paper introduces a hierarchical active inference model addressing the challenge of inferring structure in the world from pixel-based observations. We propose a three-layer hierarchical model consisting of a cognitive map, an allocentric, and an egocentric world model, combining curiosity-driven exploration with goal-oriented behaviour at the different levels of reasoning from context to place to motion. This allows for efficient exploration and goal-directed search in room-structured mini-grid environments.

RONov 13, 2024
Learning Dynamic Cognitive Map with Autonomous Navigation

Daria de Tinguy, Tim Verbelen, Bart Dhoedt

Inspired by animal navigation strategies, we introduce a novel computational model to navigate and map a space rooted in biologically inspired principles. Animals exhibit extraordinary navigation prowess, harnessing memory, imagination, and strategic decision-making to traverse complex and aliased environments adeptly. Our model aims to replicate these capabilities by incorporating a dynamically expanding cognitive map over predicted poses within an Active Inference framework, enhancing our agent's generative model plasticity to novelty and environmental changes. Through structure learning and active inference navigation, our model demonstrates efficient exploration and exploitation, dynamically expanding its model capacity in response to anticipated novel un-visited locations and updating the map given new evidence contradicting previous beliefs. Comparative analyses in mini-grid environments with the Clone-Structured Cognitive Graph model (CSCG), which shares similar objectives, highlight our model's ability to rapidly learn environmental structures within a single episode, with minimal navigation overlap. Our model achieves this without prior knowledge of observation and world dimensions, underscoring its robustness and efficacy in navigating intricate environments.

AIJun 26, 2024
GenRL: Multimodal-foundation world models for generalization in embodied agents

Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt et al.

Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be adopted in embodied contexts, due to the significant domain gap. However, the lack of multimodal data in such domains represents an obstacle to developing foundation models for embodied applications. In this work, we overcome these problems by presenting multimodal-foundation world models, able to connect and align the representation of foundation VLMs with the latent space of generative world models for RL, without any language annotations. The resulting agent learning framework, GenRL, allows one to specify tasks through vision and/or language prompts, ground them in the embodied domain's dynamics, and learn the corresponding behaviors in imagination. As assessed through large-scale multi-task benchmarking in locomotion and manipulation domains, GenRL enables multi-task generalization from language and visual prompts. Furthermore, by introducing a data-free policy learning strategy, our approach lays the groundwork for foundational policy learning using generative world models. Website, code and data: https://mazpie.github.io/genrl/

LGOct 19, 2021
Contrastive Active Inference

Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt

Active inference is a unifying theory for perception and action resting upon the idea that the brain maintains an internal model of the world by minimizing free energy. From a behavioral perspective, active inference agents can be seen as self-evidencing beings that act to fulfill their optimistic predictions, namely preferred outcomes or goals. In contrast, reinforcement learning requires human-designed rewards to accomplish any desired outcome. Although active inference could provide a more natural self-supervised objective for control, its applicability has been limited because of the shortcomings in scaling the approach to complex environments. In this work, we propose a contrastive objective for active inference that strongly reduces the computational burden in learning the agent's generative model and planning future actions. Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train. We compare to reinforcement learning agents that have access to human-designed reward functions, showing that our approach closely matches their performance. Finally, we also show that contrastive methods perform significantly better in the case of distractors in the environment and that our method is able to generalize goals to variations in the background. Website and code: https://contrastive-aif.github.io/

AIAug 26, 2021
Disentangling What and Where for 3D Object-Centric Representations Through Active Inference

Toon Van de Maele, Tim Verbelen, Ozan Catal et al.

Although modern object detection and classification models achieve high accuracy, these are typically constrained in advance on a fixed train set and are therefore not flexible to deal with novel, unseen object categories. Moreover, these models most often operate on a single frame, which may yield incorrect classifications in case of ambiguous viewpoints. In this paper, we propose an active inference agent that actively gathers evidence for object classifications, and can learn novel object categories over time. Drawing inspiration from the human brain, we build object-centric generative models composed of two information streams, a what- and a where-stream. The what-stream predicts whether the observed object belongs to a specific category, while the where-stream is responsible for representing the object in its internal 3D reference frame. We show that our agent (i) is able to learn representations for many object categories in an unsupervised way, (ii) achieves state-of-the-art classification accuracies, actively resolving ambiguity when required and (iii) identifies novel object categories. Furthermore, we validate our system in an end-to-end fashion where the agent is able to search for an object at a given pose from a pixel-based rendering. We believe that this is a first step towards building modular, intelligent systems that can be used for a wide range of tasks involving three dimensional objects.

ROJun 17, 2021
Towards bio-inspired unsupervised representation learning for indoor aerial navigation

Ni Wang, Ozan Catal, Tim Verbelen et al.

Aerial navigation in GPS-denied, indoor environments, is still an open challenge. Drones can perceive the environment from a richer set of viewpoints, while having more stringent compute and energy constraints than other autonomous platforms. To tackle that problem, this research displays a biologically inspired deep-learning algorithm for simultaneous localization and mapping (SLAM) and its application in a drone navigation system. We propose an unsupervised representation learning method that yields low-dimensional latent state descriptors, that mitigates the sensitivity to perceptual aliasing, and works on power-efficient, embedded hardware. The designed algorithm is evaluated on a dataset collected in an indoor warehouse environment, and initial results show the feasibility for robust indoor aerial navigation.

ROMay 7, 2021
LatentSLAM: unsupervised multi-sensor representation learning for localization and mapping

Ozan Çatal, Wouter Jansen, Tim Verbelen et al.

Biologically inspired algorithms for simultaneous localization and mapping (SLAM) such as RatSLAM have been shown to yield effective and robust robot navigation in both indoor and outdoor environments. One drawback however is the sensitivity to perceptual aliasing due to the template matching of low-dimensional sensory templates. In this paper, we propose an unsupervised representation learning method that yields low-dimensional latent state descriptors that can be used for RatSLAM. Our method is sensor agnostic and can be applied to any sensor modality, as we illustrate for camera images, radar range-doppler maps and lidar scans. We also show how combining multiple sensors can increase the robustness, by reducing the number of false matches. We evaluate on a dataset captured with a mobile robot navigating in a warehouse-like environment, moving through different aisles with similar appearance, making it hard for the SLAM algorithms to disambiguate locations.

LGApr 22, 2021
A learning gap between neuroscience and reinforcement learning

Samuel T. Wauthier, Pietro Mazzaglia, Ozan Çatal et al.

Historically, artificial intelligence has drawn much inspiration from neuroscience to fuel advances in the field. However, current progress in reinforcement learning is largely focused on benchmark problems that fail to capture many of the aspects that are of interest in neuroscience today. We illustrate this point by extending a T-maze task from neuroscience for use with reinforcement learning algorithms, and show that state-of-the-art algorithms are not capable of solving this problem. Finally, we point out where insights from neuroscience could help explain some of the issues encountered.

LGApr 15, 2021
Curiosity-Driven Exploration via Latent Bayesian Surprise

Pietro Mazzaglia, Ozan Catal, Tim Verbelen et al.

The human intrinsic desire to pursue knowledge, also known as curiosity, is considered essential in the process of skill acquisition. With the aid of artificial curiosity, we could equip current techniques for control, such as Reinforcement Learning, with more natural exploration capabilities. A promising approach in this respect has consisted of using Bayesian surprise on model parameters, i.e. a metric for the difference between prior and posterior beliefs, to favour exploration. In this contribution, we propose to apply Bayesian surprise in a latent space representing the agent's current understanding of the dynamics of the system, drastically reducing the computational costs. We extensively evaluate our method by measuring the agent's performance in terms of environment exploration, for continuous tasks, and looking at the game scores achieved, for video games. Our model is computationally cheap and compares positively with current state-of-the-art methods on several problems. We also investigate the effects caused by stochasticity in the environment, which is often a failure case for curiosity-driven agents. In this regime, the results suggest that our approach is resilient to stochastic transitions.

LGMar 24, 2020
Dynamic Narrowing of VAE Bottlenecks Using GECO and L0 Regularization

Cedric De Boom, Samuel Wauthier, Tim Verbelen et al.

When designing variational autoencoders (VAEs) or other types of latent space models, the dimensionality of the latent space is typically defined upfront. In this process, it is possible that the number of dimensions is under- or overprovisioned for the application at hand. In case the dimensionality is not predefined, this parameter is usually determined using time- and resource-consuming cross-validation. For these reasons we have developed a technique to shrink the latent space dimensionality of VAEs automatically and on-the-fly during training using Generalized ELBO with Constrained Optimization (GECO) and the $L_0$-Augment-REINFORCE-Merge ($L_0$-ARM) gradient estimator. The GECO optimizer ensures that we are not violating a predefined upper bound on the reconstruction error. This paper presents the algorithmic details of our method along with experimental results on five different datasets. We find that our training procedure is stable and that the latent space can be pruned effectively without violating the GECO constraints.

AIMar 6, 2020
Deep Active Inference for Autonomous Robot Navigation

Ozan Çatal, Samuel Wauthier, Tim Verbelen et al.

Active inference is a theory that underpins the way biological agent's perceive and act in the real world. At its core, active inference is based on the principle that the brain is an approximate Bayesian inference engine, building an internal generative model to drive agents towards minimal surprise. Although this theory has shown interesting results with grounding in cognitive neuroscience, its application remains limited to simulations with small, predefined sensor and state spaces. In this paper, we leverage recent advances in deep learning to build more complex generative models that can work without a predefined states space. State representations are learned end-to-end from real-world, high-dimensional sensory data such as camera frames. We also show that these generative models can be used to engage in active inference. To the best of our knowledge this is the first application of deep active inference for a real-world robot navigation task.

SDFeb 21, 2020
Rhythm, Chord and Melody Generation for Lead Sheets using Recurrent Neural Networks

Cedric De Boom, Stephanie Van Laere, Tim Verbelen et al.

Music that is generated by recurrent neural networks often lacks a sense of direction and coherence. We therefore propose a two-stage LSTM-based model for lead sheet generation, in which the harmonic and rhythmic templates of the song are produced first, after which, in a second stage, a sequence of melody notes is generated conditioned on these templates. A subjective listening test shows that our approach outperforms the baselines and increases perceived musical coherence.

LGJan 30, 2020
Learning Perception and Planning with Deep Active Inference

Ozan Çatal, Tim Verbelen, Johannes Nauta et al.

Active inference is a process theory of the brain that states that all living organisms infer actions in order to minimize their (expected) free energy. However, current experiments are limited to predefined, often discrete, state spaces. In this paper we use recent advances in deep learning to learn the state space and approximate the necessary probability distributions to engage in active inference.

ROJan 28, 2020
Learning to Catch Piglets in Flight

Ozan Çatal, Lawrence De Mol, Tim Verbelen et al.

Catching objects in-flight is an outstanding challenge in robotics. In this paper, we present a closed-loop control system fusing data from two sensor modalities: an RGB-D camera and a radar. To develop and test our method, we start with an easy to identify object: a stuffed Piglet. We implement and compare two approaches to detect and track the object, and to predict the interception point. A baseline model uses colour filtering for locating the thrown object in the environment, while the interception point is predicted using a least squares regression over the physical ballistic trajectory equations. A deep learning based method uses artificial neural networks for both object detection and interception point prediction. We show that we are able to successfully catch Piglet in 80% of the cases with our deep learning approach.

LGApr 17, 2019
Bayesian policy selection using active inference

Ozan Çatal, Johannes Nauta, Tim Verbelen et al.

Learning to take actions based on observations is a core requirement for artificial agents to be able to be successful and robust at their task. Reinforcement Learning (RL) is a well-known technique for learning such policies. However, current RL algorithms often have to deal with reward shaping, have difficulties generalizing to other environments and are most often sample inefficient. In this paper, we explore active inference and the free energy principle, a normative theory from neuroscience that explains how self-organizing biological systems operate by maintaining a model of the world and casting action selection as an inference problem. We apply this concept to a typical problem known to the RL community, the mountain car problem, and show how active inference encompasses both RL and learning from demonstrations.

LGNov 12, 2018
Improving Generalization for Abstract Reasoning Tasks Using Disentangled Feature Representations

Xander Steenbrugge, Sam Leroux, Tim Verbelen et al.

In this work we explore the generalization characteristics of unsupervised representation learning by leveraging disentangled VAE's to learn a useful latent space on a set of relational reasoning problems derived from Raven Progressive Matrices. We show that the latent representations, learned by unsupervised training using the right objective function, significantly outperform the same architectures trained with purely supervised learning, especially when it comes to generalization.

CVSep 11, 2018
Visualizing Convolutional Neural Networks to Improve Decision Support for Skin Lesion Classification

Pieter Van Molle, Miguel De Strooper, Tim Verbelen et al.

Because of their state-of-the-art performance in computer vision, CNNs are becoming increasingly popular in a variety of fields, including medicine. However, as neural networks are black box function approximators, it is difficult, if not impossible, for a medical expert to reason about their output. This could potentially result in the expert distrusting the network when he or she does not agree with its output. In such a case, explaining why the CNN makes a certain decision becomes valuable information. In this paper, we try to open the black box of the CNN by inspecting and visualizing the learned feature maps, in the field of dermatology. We show that, to some extent, CNNs focus on features similar to those used by dermatologists to make a diagnosis. However, more research is required for fully explaining their output.

CVJun 9, 2018
Learning to Grasp from a Single Demonstration

Pieter Van Molle, Tim Verbelen, Elias De Coninck et al.

Learning-based approaches for robotic grasping using visual sensors typically require collecting a large size dataset, either manually labeled or by many trial and errors of a robotic manipulator in the real or simulated world. We propose a simpler learning-from-demonstration approach that is able to detect the object to grasp from merely a single demonstration using a convolutional neural network we call GraspNet. In order to increase robustness and decrease the training time even further, we leverage data from previous demonstrations to quickly fine-tune a GrapNet for each new demonstration. We present some preliminary results on a grasping experiment with the Franka Panda cobot for which we can train a GraspNet with only hundreds of train iterations.

LGMay 30, 2018
Privacy Aware Offloading of Deep Neural Networks

Sam Leroux, Tim Verbelen, Pieter Simoens et al.

Deep neural networks require large amounts of resources which makes them hard to use on resource constrained devices such as Internet-of-things devices. Offloading the computations to the cloud can circumvent these constraints but introduces a privacy risk since the operator of the cloud is not necessarily trustworthy. We propose a technique that obfuscates the data before sending it to the remote computation node. The obfuscated data is unintelligible for a human eavesdropper but can still be classified with a high accuracy by a neural network trained on unobfuscated images.

CVApr 26, 2018
IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification

Sam Leroux, Pavlo Molchanov, Pieter Simoens et al.

Deep residual networks (ResNets) made a recent breakthrough in deep learning. The core idea of ResNets is to have shortcut connections between layers that allow the network to be much deeper while still being easy to optimize avoiding vanishing gradients. These shortcut connections have interesting side-effects that make ResNets behave differently from other typical network architectures. In this work we use these properties to design a network based on a ResNet but with parameter sharing and with adaptive computation time. The resulting network is much smaller than the original network and can adapt the computational cost to the complexity of the input image.

LGJan 2, 2018
Character-level Recurrent Neural Networks in Practice: Comparing Training and Sampling Schemes

Cedric De Boom, Thomas Demeester, Bart Dhoedt

Recurrent neural networks are nowadays successfully used in an abundance of applications, going from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm that is commonly used to train these networks on specific tasks. Many deep learning frameworks have their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other possibilities to choose from and other parameters to tune. In existing literature this is very often overlooked or ignored. In this paper we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these different schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states for correctly initializing the model on subsequences often leads to unstable training behavior depending on the dataset.

NENov 29, 2017
Transfer Learning with Binary Neural Networks

Sam Leroux, Steven Bohez, Tim Verbelen et al.

Previous work has shown that it is possible to train deep neural networks with low precision weights and activations. In the extreme case it is even possible to constrain the network to binary values. The costly floating point multiplications are then reduced to fast logical operations. High end smart phones such as Google's Pixel 2 and Apple's iPhone X are already equipped with specialised hardware for image processing and it is very likely that other future consumer hardware will also have dedicated accelerators for deep neural networks. Binary neural networks are attractive in this case because the logical operations are very fast and efficient when implemented in hardware. We propose a transfer learning based architecture where we first train a binary network on Imagenet and then retrain part of the network for different tasks while keeping most of the network fixed. The fixed binary part could be implemented in a hardware accelerator while the last layers of the network are evaluated in software. We show that a single binary neural network trained on the Imagenet dataset can indeed be used as a feature extractor for other datasets.

IRAug 22, 2017
Large-Scale User Modeling with Recurrent Neural Networks for Music Discovery on Multiple Time Scales

Cedric De Boom, Rohan Agrawal, Samantha Hansen et al.

The amount of content on online music streaming platforms is immense, and most users only access a tiny fraction of this content. Recommender systems are the application of choice to open up the collection to these users. Collaborative filtering has the disadvantage that it relies on explicit ratings, which are often unavailable, and generally disregards the temporal nature of music consumption. On the other hand, item co-occurrence algorithms, such as the recently introduced word2vec-based recommenders, are typically left without an effective user representation. In this paper, we present a new approach to model users through recurrent neural networks by sequentially processing consumed items, represented by any type of embeddings and other context features. This way we obtain semantically rich user representations, which capture a user's musical taste over time. Our experimental analysis on large-scale user data shows that our model can be used to predict future songs a user will likely listen to, both in the short and long term.

AIAug 9, 2017
Decoupled Learning of Environment Characteristics for Safe Exploration

Pieter Van Molle, Tim Verbelen, Steven Bohez et al.

Reinforcement learning is a proven technique for an agent to learn a task. However, when learning a task using reinforcement learning, the agent cannot distinguish the characteristics of the environment from those of the task. This makes it harder to transfer skills between tasks in the same environment. Furthermore, this does not reduce risk when training for a new task. In this paper, we introduce an approach to decouple the environment characteristics from the task-specific ones, allowing an agent to develop a sense of survival. We evaluate our approach in an environment where an agent must learn a sequence of collection tasks, and show that decoupled learning allows for a safer utilization of prior knowledge.

ROMar 13, 2017
Sensor Fusion for Robot Control through Deep Reinforcement Learning

Steven Bohez, Tim Verbelen, Elias De Coninck et al.

Deep reinforcement learning is becoming increasingly popular for robot control algorithms, with the aim for a robot to self-learn useful feature representations from unstructured sensory input leading to the optimal actuation policy. In addition to sensors mounted on the robot, sensors might also be deployed in the environment, although these might need to be accessed via an unreliable wireless connection. In this paper, we demonstrate deep neural network architectures that are able to fuse information coming from multiple sensors and are robust to sensor failures at runtime. We evaluate our method on a search and pick task for a robot both in simulation and the real world.

IRJul 2, 2016
Representation learning for very short texts using weighted word embedding aggregation

Cedric De Boom, Steven Van Canneyt, Thomas Demeester et al.

Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with the experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box.

CVMay 27, 2016
Lazy Evaluation of Convolutional Filters

Sam Leroux, Steven Bohez, Cedric De Boom et al.

In this paper we propose a technique which avoids the evaluation of certain convolutional filters in a deep neural network. This allows to trade-off the accuracy of a deep neural network with the computational and memory requirements. This is especially important on a constrained device unable to hold all the weights of the network in memory.

NEMay 9, 2016
Efficiency Evaluation of Character-level RNN Training Schedules

Cedric De Boom, Sam Leroux, Steven Bohez et al.

We present four training and prediction schedules from the same character-level recurrent neural network. The efficiency of these schedules is tested in terms of model effectiveness as a function of training time and amount of training data seen. We show that the choice of training and prediction schedule potentially has a considerable impact on the prediction effectiveness for a given training budget.

IRDec 2, 2015
Learning Semantic Similarity for Very Short Texts

Cedric De Boom, Steven Van Canneyt, Steven Bohez et al.

Levering data on social media, such as Twitter and Facebook, requires information retrieval algorithms to become able to relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine-similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is little or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level. In order to pair short text fragments - as a concatenation of separate words - an adequate distributed sentence representation is needed, in existing literature often obtained by naively combining the individual word representations. We therefore investigated several text representations as a combination of word embeddings in the context of semantic pair matching. This paper investigates the effectiveness of several such naive techniques, as well as traditional tf-idf similarity, for fragments of different lengths. Our main contribution is a first step towards a hybrid method that combines the strength of dense distributed representations - as opposed to sparse term matching - with the strength of tf-idf based methods to automatically reduce the impact of less informative terms. Our new approach outperforms the existing techniques in a toy experimental set-up, leading to the conclusion that the combination of word embeddings and tf-idf information might lead to a better model for semantic content within very short text fragments.