Simon Brodeur

6 papers

124 citations

Novelty: 33%

AI Score: 22

Ranked #189,258 of 205,806 authors (top 92%); #30,800 in CL (top 95%)

6 Papers

AI · Nov 29, 2017 · Code

HoME: a Household Multimodal Environment

Simon Brodeur, Ethan Perez, Ankesh Anand et al.

We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.
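
As a rough illustration of what "OpenAI Gym-compatible" means in practice, the sketch below runs the classic Gym-style `reset()` / `step(action)` loop against a toy environment. `ToyRoomEnv` and `run_episode` are hypothetical names invented for this example; they are not part of HoME, and the code does not reproduce HoME's actual API.

```python
class ToyRoomEnv:
    """Hypothetical stand-in environment (not part of HoME).

    The agent walks along a 1-D corridor and is rewarded on reaching the far end.
    """

    def __init__(self, size=5, max_steps=50):
        self.size = size
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right); clamp to the corridor.
        self.position = min(max(self.position + action, 0), self.size - 1)
        self.steps += 1
        reached = self.position == self.size - 1
        done = reached or self.steps >= self.max_steps
        reward = 1.0 if reached else 0.0
        # Classic Gym convention: (observation, reward, done, info).
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

Any agent written against this interface, for example `run_episode(ToyRoomEnv(), lambda obs: 1)`, could in principle be pointed at another environment that exposes the same two methods.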

CL · Mar 30, 2020

AriEL: volume coding for sentence generation

Luca Celotti, Simon Brodeur, Jean Rouat

Mapping sequences of discrete data to a point in a continuous space makes it difficult to retrieve those sequences via random sampling. Mapping the input to a volume would make it easier to retrieve at test time, and that is the strategy followed by the family of approaches based on the Variational Autoencoder (VAE). However, because these approaches optimize for prediction and for smoothness of representation at the same time, they are forced to trade off between the two. We improve on the performance of some standard deep learning methods for generating sentences by uniformly sampling a continuous space. We do so by proposing AriEL, which constructs volumes in a continuous space without needing to encourage the creation of volumes through the loss function. We first benchmark on a toy grammar, which allows the language learned and generated by the models to be evaluated automatically. Then, we benchmark on a real dataset of human dialogues. Our results indicate that random access to the stored information is dramatically improved, and that AriEL is able to generate a wider variety of correct language by randomly sampling the latent space. VAE follows in performance on the toy dataset, while AE and Transformer follow on the real dataset. This partially supports the hypothesis that encoding information into volumes instead of points can lead to improved retrieval of learned information with random sampling. This can lead to better generators, and we also discuss potential disadvantages.

CL · Nov 5, 2019

Language coverage and generalization in RNN-based continuous sentence embeddings for interacting agents

Luca Celotti, Simon Brodeur, Jean Rouat

Continuous sentence embeddings using recurrent neural networks (RNNs), where variable-length sentences are encoded into fixed-dimensional vectors, are often the main building blocks of architectures applied to language tasks such as dialogue generation. While it is known that those embeddings are able to learn some structures of language (e.g. grammar) in a purely data-driven manner, there is very little work on the objective evaluation of their ability to cover the whole language space and to generalize to sentences outside the language bias of the training data. Using a manually designed context-free grammar (CFG) to generate a large-scale dataset of sentences related to the content of realistic 3D indoor scenes, we evaluate the language coverage and generalization abilities of the most common continuous sentence embeddings based on RNNs. We also propose a new embedding method based on arithmetic coding, AriEL, that is not data-driven and that efficiently encodes in continuous space any sentence from the CFG. We find that RNN-based embeddings underfit the training data and cover only a small subset of the language defined by the CFG. They also fail to learn the underlying CFG and to generalize to unbiased sentences from that same CFG. We find that AriEL provides an insightful baseline.
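
The idea of encoding a sentence by arithmetic coding can be sketched in one dimension: each sentence is assigned its own sub-interval of [0, 1), and any point inside that interval decodes back to the sentence. The sketch below is a simplified illustration only, not the authors' implementation. It assumes a hypothetical six-token vocabulary and gives every token an equal share of the interval, whereas the abstract describes an encoding driven by a CFG; `VOCAB`, `encode`, and `decode` are names invented for this example.

```python
from fractions import Fraction

# Hypothetical toy vocabulary; "<eos>" marks the end of a sentence.
VOCAB = ["<eos>", "the", "robot", "sees", "a", "chair"]


def encode(sentence):
    """Return the [low, high) interval assigned to a list of tokens."""
    low, width = Fraction(0), Fraction(1)
    for token in sentence + ["<eos>"]:
        width /= len(VOCAB)                # every token gets an equal slice
        low += width * VOCAB.index(token)  # move to that token's slice
    return low, low + width


def decode(point, max_len=20):
    """Recover the sentence whose interval contains point (point in [0, 1))."""
    tokens = []
    low, width = Fraction(0), Fraction(1)
    # Stop after max_len tokens, since some points never reach "<eos>".
    for _ in range(max_len):
        width /= len(VOCAB)
        index = int((point - low) / width)  # which slice the point falls in
        if VOCAB[index] == "<eos>":
            break
        tokens.append(VOCAB[index])
        low += width * index
    return tokens
```

Because every point of an interval maps back to the same sentence, a uniformly sampled point always decodes to some token sequence, which is the volume-based property the abstract contrasts with point-based embeddings.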

SP · Apr 27, 2018

Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir

Marc-Antoine Moinnereau, Thomas Brienne, Simon Brodeur et al.

The use of electroencephalogram (EEG) as the main input signal in brain-machine interfaces has been widely proposed due to the non-invasive nature of the EEG. Here we are specifically interested in interfaces that extract information from the auditory system and more specifically in the task of classifying heard speech from EEGs. To do so, we propose to limit the preprocessing of the EEGs and use machine learning approaches to automatically extract their meaningful characteristics. More specifically, we use a regulated recurrent neural network (RNN) reservoir, which has been shown to outperform classic machine learning approaches when applied to several different bio-signals, and we compare it with a deep neural network approach. Moreover, we also investigate the classification performance as a function of the number of EEG electrodes. A set of 8 subjects were presented randomly with 3 different auditory stimuli (English vowels a, i and u). We obtained an excellent classification rate of 83.2% with the RNN when considering all 64 electrodes. A rate of 81.7% was achieved with only 10 electrodes.
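
For readers unfamiliar with reservoir approaches, the sketch below shows a generic recurrent reservoir state update: a fixed, randomly weighted recurrent layer is driven by a multichannel signal, and its final state would then be passed to a trained readout classifier (not shown). This is a plain, unregulated reservoir under assumed settings, not the regulated reservoir evaluated in the paper. The function names, weight scale, and sizes are assumptions made for illustration.

```python
import math
import random


def make_reservoir(n_inputs, n_units, scale=0.5, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_units)]
    # Recurrent weights are divided by n_units to keep the dynamics small.
    w_rec = [[rng.uniform(-scale, scale) / n_units for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_rec


def run_reservoir(w_in, w_rec, signal):
    """Drive the reservoir with a list of samples; return the final state."""
    state = [0.0] * len(w_rec)
    for sample in signal:
        # New state: tanh(W_in * sample + W_rec * previous state).
        state = [
            math.tanh(
                sum(w * u for w, u in zip(w_in[i], sample))
                + sum(w * s for w, s in zip(w_rec[i], state))
            )
            for i in range(len(w_rec))
        ]
    return state
```

With 10 input channels, for instance, `run_reservoir(*make_reservoir(10, 32), signal)` yields a 32-value feature vector per recording that a simple classifier could be trained on.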

RO · Jan 30, 2018

CREATE: Multimodal Dataset for Unsupervised Learning, Generative Modeling and Prediction of Sensory Data from a Mobile Robot in Indoor Environments

Simon Brodeur, Simon Carrier, Jean Rouat

The CREATE database is composed of 14 hours of multimodal recordings from a mobile robotic platform based on the iRobot Create. The various sensors cover vision, audition, motors and proprioception. The dataset has been designed in the context of a mobile robot that can learn multimodal representations of its environment, thanks to its ability to navigate the environment. This ability can also be used to learn the dependencies and relationships between the different modalities of the robot (e.g. vision, audition), as they reflect both the external environment and the internal state of the robot. The provided multimodal dataset is expected to have multiple usages, such as multimodal unsupervised object learning, multimodal prediction and egomotion/causality detection.

SD · Nov 22, 2013

Objets Sonores: Une Représentation Bio-Inspirée Hiérarchique Parcimonieuse À Très Grandes Dimensions Utilisable En Reconnaissance; Auditory Objects: Bio-Inspired Hierarchical Sparse High Dimensional Representation for Recognition

Simon Brodeur, Jean Rouat

The emphasis is put on the hierarchical structure, independence and sparseness aspects of auditory signal representations in high-dimensional spaces, so as to define the components of auditory objects. The concept of an auditory object and its neural representation is introduced. An illustrative application then follows, consisting of the analysis of various auditory signals: speech, music and natural outdoor environments. A new automatic speech recognition (ASR) system is then proposed and compared to a conventional statistical system. The proposed system clearly shows that an object-based analysis introduces great flexibility and robustness for the task of speech recognition. The integration of knowledge from neuroscience and acoustic signal processing brings new ways of thinking to the field of classification of acoustic signals.