Ana Paiva

RO
h-index: 6
17 papers
223 citations
Novelty: 46%
AI Score: 36

17 Papers

RO · Oct 20, 2022
From Modelling to Understanding Children's Behaviour in the Context of Robotics and Social Artificial Intelligence

Serge Thill, Vicky Charisi, Tony Belpaeme et al.

Understanding and modelling children's cognitive processes and their behaviour in the context of their interaction with robots and social artificial intelligence systems is a fundamental prerequisite for meaningful and effective robot interventions. However, children's development involves complex faculties such as exploration, creativity and curiosity, which are challenging to model. Children also often express themselves in a playful way that differs from typical adult behaviour. Different children also have different needs, and it remains a challenge in the current state of the art that those of neurodiverse children are under-addressed. With this workshop, we aim to promote a common ground among different disciplines such as developmental sciences, artificial intelligence and social robotics and discuss cutting-edge research in the area of user modelling and adaptive systems for children.

RO · Sep 19, 2022
"Guess what I'm doing": Extending legibility to sequential decision tasks

Miguel Faria, Francisco S. Melo, Ana Paiva

In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoL-MDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several simulated scenarios of different complexity. We also showcase the use of our legible policies as demonstrations for an inverse reinforcement learning agent, establishing their superiority against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.
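The general idea of legibility can be illustrated with a toy, one-step, deterministic example. The sketch below is not PoL-MDP and does not handle uncertainty; it only assumes an observer who infers the goal by Bayes' rule from a Boltzmann-rational action model. The grid, the `beta` parameter, and all function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical action set on a 2-D grid (no walls, no bounds).
ACTIONS = {"up": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state, action):
    """Deterministic transition: move one cell in the chosen direction."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def action_likelihood(state, action, goal, beta=1.0):
    """P(action | state, goal) for an assumed Boltzmann-rational agent."""
    scores = {a: math.exp(-beta * manhattan(step(state, a), goal)) for a in ACTIONS}
    return scores[action] / sum(scores.values())


def goal_posterior(state, action, goals, beta=1.0):
    """P(goal | state, action) under a uniform prior over goals."""
    weights = {g: action_likelihood(state, action, g, beta) for g in goals}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def most_legible_action(state, true_goal, goals, beta=1.0):
    """Action that makes the observer most confident in the true goal."""
    return max(ACTIONS, key=lambda a: goal_posterior(state, a, goals, beta)[true_goal])
```

With goals at `(0, 4)` and `(4, 4)` and the agent at `(2, 0)`, moving up is as short a route as moving right towards `(4, 4)`, but it leaves the observer undecided between the two goals, whereas moving right does not. Under these assumptions, `most_legible_action((2, 0), (4, 4), [(0, 4), (4, 4)])` selects `"right"`.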

LG · Oct 12, 2022
Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning

Pedro P. Santos, Diogo S. Carvalho, Miguel Vasco et al.

We introduce hybrid execution in multi-agent reinforcement learning (MARL), a new paradigm in which agents aim to successfully complete cooperative tasks with arbitrary communication levels at execution time by taking advantage of information-sharing among the agents. Under hybrid execution, the communication level can range from a setting in which no communication is allowed between agents (fully decentralized), to a setting featuring full communication (fully centralized), but the agents do not know beforehand which communication level they will encounter at execution time. To formalize our setting, we define a new class of multi-agent partially observable Markov decision processes (POMDPs) that we name hybrid-POMDPs, which explicitly model a communication process between the agents. We contribute MARO, an approach that makes use of an auto-regressive predictive model, trained in a centralized manner, to estimate missing agents' observations at execution time. We evaluate MARO on standard scenarios and extensions of previous benchmarks tailored to emphasize the negative impact of partial observability in MARL. Experimental results show that our method consistently outperforms relevant baselines, allowing agents to act with faulty communication while successfully exploiting shared information.
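The execution-time setting described above can be sketched in a few lines: at each step an agent receives only some of its teammates' observations and fills the gaps with estimates. This is a minimal illustration, not MARO. In particular, `LastValuePredictor` is a hypothetical stand-in that simply repeats the last value received, whereas the abstract describes a learned auto-regressive predictive model; the independent per-link communication probability `p_comm` is also an assumption.

```python
import random


def sample_comm_mask(n_agents, p_comm, rng, receiver):
    """Which agents' observations reach `receiver` this step.

    The receiver always has its own observation; every other observation
    arrives independently with probability `p_comm` (an assumed model).
    """
    return [i == receiver or rng.random() < p_comm for i in range(n_agents)]


def fill_missing(observations, mask, predict):
    """Keep received observations; replace missing ones with estimates."""
    return [obs if ok else predict(i)
            for i, (obs, ok) in enumerate(zip(observations, mask))]


class LastValuePredictor:
    """Hypothetical placeholder for a learned predictive model."""

    def __init__(self, n_agents, default=0.0):
        self.last = [default] * n_agents

    def update(self, observations, mask):
        # Remember only the observations that were actually received.
        for i, (obs, ok) in enumerate(zip(observations, mask)):
            if ok:
                self.last[i] = obs

    def __call__(self, agent):
        return self.last[agent]
```

Setting `p_comm=0.0` corresponds to the fully decentralized end of the range and `p_comm=1.0` to the fully centralized end; the same policy consumes the output of `fill_missing` in both cases.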

RO · Mar 2, 2022
Avant-Satie! Using ERIK to encode task-relevant expressivity into the animation of autonomous social robots

Tiago Ribeiro, Ana Paiva

ERIK is an expressive inverse kinematics technique that has been previously presented and evaluated both algorithmically and in a limited user-interaction scenario. It allows autonomous social robots to convey posture-based expressive information while gaze-tracking users. We have developed a new scenario aimed at further validating some of the unsupported claims from the previous scenario. Our experiment features a fully autonomous Adelino robot, and concludes that ERIK can be used to direct a user's choice of actions during execution of a given task, entirely through its non-verbal expressive cues.

MA · Mar 4, 2021 · Code
FAtiMA Toolkit -- Toward an effective and accessible tool for the development of intelligent virtual agents and social robots

Samuel Mascarenhas, Manuel Guimarães, Pedro A. Santos et al.

More than a decade has passed since the development of FearNot!, an application designed to help children deal with bullying through role-playing with virtual characters. It was also the application that led to the creation of FAtiMA, an affective agent architecture for creating autonomous characters that can evoke empathic responses. In this paper, we describe FAtiMA Toolkit, a collection of open-source tools that is designed to help researchers, game developers and roboticists incorporate a computational model of emotion and decision-making in their work. The toolkit was developed with the goal of making FAtiMA more accessible, easier to incorporate into different projects and more flexible in its capabilities for human-agent interaction, based upon the experience gathered over the years across different virtual environments and human-robot interaction scenarios. As a result, this work makes several different contributions to the field of Agent-Based Architectures. More precisely, FAtiMA Toolkit's library-based design allows developers to easily integrate it with other frameworks, its meta-cognitive model affords different internal reasoners and affective components, and its explicit dialogue structure gives control to the author even within highly complex scenarios. To demonstrate the use of FAtiMA Toolkit, several different use cases where the toolkit was successfully applied are described and discussed.

AI · Jul 29, 2025
"Teammates, Am I Clear?": Analysing Legible Behaviours in Teams

Miguel Faria, Francisco S. Melo, Ana Paiva

In this paper we investigate the notion of legibility in sequential decision-making in the context of teams and teamwork. There have been works that extend the notion of legibility to sequential decision-making, for deterministic and for stochastic scenarios. However, these works focus on one agent interacting with one human, forgoing the benefits of having legible decision-making in teams of agents or in team configurations with humans. In this work we propose an extension of legible decision-making to multi-agent settings that improves the performance of agents working in collaboration. We showcase the performance of legible decision-making in team scenarios using our proposed extension in multi-agent benchmark scenarios. We show that a team with a legible agent is able to outperform a team composed solely of agents with standard optimal behaviour.

LG · Feb 7, 2022
Geometric Multimodal Contrastive Representation Learning

Petra Poklukar, Miguel Vasco, Hang Yin et al.

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, which process an arbitrary number of modalities into an intermediate representation of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.
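To make the notion of a contrastive loss that encourages geometric alignment concrete, the following is a minimal sketch of a generic InfoNCE-style loss over paired embeddings. It is not claimed to be the exact GMC loss; the cosine similarity, the `temperature` value, and the function names are assumptions chosen for illustration, and inputs are assumed to be non-zero vectors.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i], not positives[j != i]."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)
```

In a multimodal setting, `anchors` could hold embeddings of one modality and `positives` embeddings of another view of the same samples; the loss is small when matching pairs point in the same direction and mismatched pairs do not.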

LG · Oct 7, 2021
How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents

Miguel Vasco, Hang Yin, Francisco S. Melo et al.

This work addresses the problem of sensing the world: how to learn a multimodal representation of a reinforcement learning agent's environment that allows the execution of tasks under incomplete perceptual conditions. To address this problem, we argue for hierarchy in the design of representation models and contribute a novel multimodal representation model, MUSE. The proposed model learns hierarchical representations: low-level modality-specific representations, encoded from raw observation data, and a high-level multimodal representation, encoding joint-modality information to allow robust state estimation. We employ MUSE as the sensory representation model of deep reinforcement learning agents provided with multimodal observations in Atari games. We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss. Finally, we evaluate the performance of MUSE in literature-standard multimodal scenarios with a larger number of more complex modalities, showing that it outperforms state-of-the-art multimodal variational autoencoders in single and cross-modality generation.

LG · Jun 4, 2020
MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning

Miguel Vasco, Francisco S. Melo, Ana Paiva

Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE is able to learn modality-specific distributions, of an arbitrary number of modalities, and a joint-modality distribution, responsible for cross-modality inference. We formally derive the model's evidence lower bound and propose a novel methodology to approximate the joint-modality posterior based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models regarding joint-modality reconstruction from arbitrary input modalities and cross-modality inference.

RO · Mar 11, 2020
Explainable Agents Through Social Cues: A Review

Sebastian Wallkotter, Silvia Tulli, Ginevra Castellano et al.

The issue of how to make embodied agents explainable has experienced a surge of interest over the last three years, and there are many terms that refer to this concept, e.g., transparency or legibility. One reason for this high variance in terminology is the unique array of social cues that embodied agents can access in contrast to that accessed by non-embodied agents. Another reason is that different authors use these terms in different ways. Hence, we review the existing literature on explainability and organize it by (1) providing an overview of existing definitions, (2) showing how explainability is implemented and how it exploits different social cues, and (3) showing how the impact of explainability is measured. Additionally, we present a list of open questions and challenges that highlight areas that require further investigation by the community. This provides the interested reader with an overview of the current state of the art.

AI · Nov 28, 2019
Playing Games in the Dark: An approach for cross-modality transfer in reinforcement learning

Rui Silva, Miguel Vasco, Francisco S. Melo et al.

In this work we explore the use of latent representations obtained from multiple input sensory modalities (such as images or sounds) in allowing an agent to learn and exploit policies over different subsets of input modalities. We propose a three-stage architecture that allows a reinforcement learning agent trained over a given sensory modality to execute its task on a different sensory modality: for example, learning a visual policy over image inputs and then executing that policy when only sound inputs are available. We show that the generalized policies achieve better out-of-the-box performance when compared to different baselines. Moreover, we show this holds in different OpenAI gym and video game environments, even when using different multimodal generative models and reinforcement learning algorithms.

RO · Sep 30, 2019
Expressive Inverse Kinematics Solving in Real-time for Virtual and Robotic Interactive Characters

Tiago Ribeiro, Ana Paiva

With new advancements in interaction techniques, character animation also requires new methods to support fields such as robotics and VR/AR. Interactive characters in such fields are increasingly driven by AI, which opens up the possibility of non-linear and open-ended narratives that may even include interaction with the real, physical world. This paper presents and describes ERIK, an expressive inverse kinematics technique aimed at such applications. Our technique allows an arbitrary kinematic chain, such as an arm, snake, or robotic manipulator, to exhibit an expressive posture while aiming its end-point towards a given target orientation. The technique runs in interactive time and does not require any pre-processing step, such as training in machine learning techniques, in order to support new embodiments or new postures. That allows it to be integrated in an artist-friendly workflow, bringing artists closer to the development of such AI-driven expressive characters, by allowing them to use their typical animation tools of choice, and to properly pre-visualize the animation during design-time, even on a real robot. The full algorithmic specification is presented and described so that it can be implemented and used throughout the communities of the various fields we address. We demonstrate ERIK on different virtual kinematic structures, and also on a low-fidelity robot that was crafted using wood and hobby-grade servos, to show how well the technique performs even on a low-grade robot. Our evaluation shows how well the technique performs, i.e., how well the character is able to point at the target orientation, while minimally disrupting its target expressive posture, and respecting its mechanical rotation limits.

RO · Sep 24, 2019
Software architecture for YOLO, a creativity-stimulating robot

Patrícia Alves-Oliveira, Samuel Gomes, Ankita Chandak et al.

YOLO is a social robot designed and developed to stimulate creativity in children through storytelling activities. Children use it as a character in their stories. This article details the artificial intelligence software developed for YOLO. The implemented software schedules through several Creativity Behaviors to find the ones that stimulate creativity more effectively. YOLO can choose between convergent and divergent thinking techniques, two important processes of creative thought. These techniques were developed based on the psychological theories of creativity development and on research from creativity experts who work with children. Additionally, this software allows the creation of Social Behaviors that enable the robot to behave as a believable character. On top of our framework, we built 3 main social behavior parameters: Exuberant, Aloof, and Harmonious. These behaviors are meant to ease immersive play and the process of character creation. The 3 social behaviors were based on psychological theories of personality and developed using children's input during co-design studies. Overall, this work presents an attempt to design, develop, and deploy social robots that nurture intrinsic human abilities, such as the ability to be creative.

RO · Apr 5, 2019
Nutty-based Robot Animation -- Principles and Practices

Tiago Ribeiro, Ana Paiva

Robot animation is a new form of character animation that extends the traditional process by allowing the animated motion to become more interactive and adaptable during interaction with users in real-world settings. This paper reviews how this new type of character animation has evolved and been shaped from character animation principles and practices. We outline some new paradigms that aim at allowing character animators to become robot animators, and to properly take part in the development of social robots. One such paradigm consists of the 12 principles of robot animation, which describe general concepts that both animators and robot developers should consider in order to properly understand each other. We also introduce the concept of Kinematronics, for specifying the controllable and programmable expressive abilities of robots, and the Nutty Workflow and Pipeline. The Nutty Pipeline introduces the concept of the Programmable Robot Animation Engine, which makes it possible to generate, compose and blend various types of animation sources into a final, interaction-enabled motion that can be rendered on robots in real-time during real-world interactions. The Nutty Motion Filter is described and exemplified as a technique that allows an open-loop motion controller to apply physical limits to the motion while still allowing the shape and expressivity of the resulting motion to be tweaked. Additionally, we describe some types of tools that can be developed and integrated into Nutty-based workflows and pipelines, which allow animation artists to perform an integral part of the expressive behaviour development within social robots, and thus to evolve from standard (3D) character animators, towards a full-stack type of robot animators.
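As a rough illustration of what applying physical limits in an open-loop controller can mean, the sketch below caps the per-step change of a single joint and clamps it to its range. It is not the Nutty Motion Filter, whose formulation is not given in this abstract and which additionally allows the shape of the motion to be tuned; the function name and all parameters are hypothetical.

```python
def limit_motion(targets, start, max_step, lower, upper):
    """Follow `targets` while capping per-step change and clamping to joint limits.

    targets  : desired joint positions, one per control step
    start    : joint position before the first step
    max_step : largest allowed change per step (a simple velocity limit)
    lower, upper : mechanical range of the joint
    """
    position = start
    out = []
    for target in targets:
        # Move towards the target, but never by more than max_step.
        delta = max(-max_step, min(max_step, target - position))
        # Never leave the mechanical range of the joint.
        position = max(lower, min(upper, position + delta))
        out.append(position)
    return out
```

For example, `limit_motion([10.0, 10.0, 10.0], 0.0, 2.0, -5.0, 5.0)` approaches the unreachable target in bounded steps and stops at the upper limit, giving `[2.0, 4.0, 5.0]`.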

CV · Mar 6, 2019
Learning multimodal representations for sample-efficient recognition of human actions

Miguel Vasco, Francisco S. Melo, David Martins de Matos et al.

Humans interact in rich and diverse ways with the environment. However, the representation of such behavior by artificial agents is often limited. In this work we present motion concepts, a novel multimodal representation of human actions in a household environment. A motion concept encompasses a probabilistic description of the kinematics of the action along with its contextual background, namely the location and the objects held during the performance. Furthermore, we present Online Motion Concept Learning (OMCL), a new algorithm which learns novel motion concepts from action demonstrations and recognizes previously learned motion concepts. The algorithm is evaluated on a virtual-reality household environment with the presence of a human avatar. OMCL outperforms standard motion recognition algorithms on a one-shot recognition task, attesting to its potential for sample-efficient recognition of human actions.

HC · Feb 5, 2019
Empathic Robot for Group Learning: A Field Study

Patricia Alves-Oliveira, Pedro Sequeira, Francisco S. Melo et al.

This work explores a group learning scenario with an autonomous empathic robot. We address two research questions: (1) Can an autonomous robot designed with empathic competencies foster collaborative learning in a group context? (2) Can an empathic robot sustain positive educational outcomes in long-term collaborative learning interactions with groups of students? To answer these questions, we developed an autonomous robot with empathic competencies that is able to interact with a group of students in a learning activity about sustainable development. Two studies were conducted. The first study compares learning outcomes in children across 3 conditions: learning with an empathic robot; learning with a robot without empathic capabilities; and learning without a robot. The results show that the autonomous robot with empathy fosters meaningful discussions about sustainability, which is a learning outcome in sustainability education. The second study features groups of students who interact with the robot in a school classroom for two months. The long-term educational interaction did not seem to provide significant learning gains, although there was a change in game-actions to achieve more sustainability during game-play. This result reflects the need to perform more long-term research in the field of educational robots for group learning.

RO · Feb 22, 2016
Cognitive Architecture for Mutual Modelling

Alexis Jacq, Wafa Johal, Pierre Dillenbourg et al.

In social robotics, robots need to be understood by humans, especially in collaborative tasks where they have to share mutual knowledge. For instance, in an educative scenario, learners share their knowledge and they must adapt their behaviour in order to make sure they are understood by others. Learners display behaviours in order to show their understanding and teachers adapt in order to make sure that the learners' knowledge is the required one. This ability requires a model of one's own mental states as perceived by others: "Has the human understood that I (the robot) need this object for the task, or should I explain it once again?" In this paper, we discuss the importance of a cognitive architecture enabling second-order Mutual Modelling for Human-Robot Interaction in educative contexts.