Guillem Alenyà

h-index35

14papers

942citations

Novelty55%

AI Score51

Ranked #18,167 of 194,257 authors (top 9%)#448 in RO (top 7%)

14 Papers

19.6CVMay 12, 2022

Learned Vertex Descent: A New Direction for 3D Human Model Fitting

Enric Corona, Gerard Pons-Moll, Guillem Alenyà et al.

We propose a novel optimization-based paradigm for 3D human model fitting on images and scans. In contrast to existing approaches that directly regress the parameters of a low-dimensional statistical body model (e.g. SMPL) from input images, we train an ensemble of per-vertex neural fields network. The network predicts, in a distributed manner, the vertex descent direction towards the ground truth, based on neural features extracted at the current vertex projection. At inference, we employ this network, dubbed LVD, within a gradient-descent optimization pipeline until its convergence, which typically occurs in a fraction of a second even when initializing all vertices into a single point. An exhaustive evaluation demonstrates that our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement compared to state-of-the-art. LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.

6.1ROMay 20

Temporal Counterfactual Explanations of Behaviour Tree Decisions

Tamlin Love, Antonio Andriella, Guillem Alenyà

Explainability, in particular, the ability for robots to explain why they have made a decision or behaved in a certain way, is a critical tool in helping users understand the robots they interact and coexist with. Behaviour trees are a popular framework for controlling the decision-making of robots, and thus a natural question to ask is whether or not a system driven by a behaviour tree is capable of answering "why" questions. While explainability for behaviour tree-driven robots has seen some prior attention, no existing methods are capable of generating causal, counterfactual explanations which detail the reasons for robot decisions and behaviour. Therefore, in this work, we introduce a novel approach which automatically generates counterfactual explanations in response to contrastive "why" questions. Our method achieves this by first automatically building a causal model from the structure of the behaviour tree as well as domain knowledge about the state and individual behaviour tree nodes. The resultant causal model is then queried and searched to find a set of diverse counterfactual explanations. We demonstrate that our approach is able to correctly explain the behaviour of a wide range of behaviour tree structures and states in real time, unlike previous methods which are either unable to answer contrastive questions with causal explanations, or are not guaranteed to provide consistent and accurate explanations. By being able to answer a wide range of causal queries, our approach represents a step towards more transparent, understandable, and ultimately safe and trustworthy robotic systems.

3.2RONov 5, 2025

Multi-User Personalisation in Human-Robot Interaction: Using Quantitative Bipolar Argumentation Frameworks for Preferences Conflict Resolution

Aniol Civit, Antonio Andriella, Carles Sierra et al.

While personalisation in Human-Robot Interaction (HRI) has advanced significantly, most existing approaches focus on single-user adaptation, overlooking scenarios involving multiple stakeholders with potentially conflicting preferences. To address this, we propose the Multi-User Preferences Quantitative Bipolar Argumentation Framework (MUP-QBAF), a novel multi-user personalisation framework based on Quantitative Bipolar Argumentation Frameworks (QBAFs) that explicitly models and resolves multi-user preference conflicts. Unlike prior work in Argumentation Frameworks, which typically assumes static inputs, our approach is tailored to robotics: it incorporates both users' arguments and the robot's dynamic observations of the environment, allowing the system to adapt over time and respond to changing contexts. Preferences, both positive and negative, are represented as arguments whose strength is recalculated iteratively based on new information. The framework's properties and capabilities are presented and validated through a realistic case study, where an assistive robot mediates between the conflicting preferences of a caregiver and a care recipient during a frailty assessment task. This evaluation further includes a sensitivity analysis of argument base scores, demonstrating how preference outcomes can be shaped by user input and contextual observations. By offering a transparent, structured, and context-sensitive approach to resolving competing user preferences, this work advances the field of multi-user HRI. It provides a principled alternative to data-driven methods, enabling robots to navigate conflicts in real-world environments.

4.0ROMar 22, 2022Code

Semantic State Estimation in Cloth Manipulation Tasks

Georgies Tzelepis, Eren Erdal Aksoy, Júlia Borràs et al.

Understanding of deformable object manipulations such as textiles is a challenge due to the complexity and high dimensionality of the problem. Particularly, the lack of a generic representation of semantic states (e.g., \textit{crumpled}, \textit{diagonally folded}) during a continuous manipulation process introduces an obstacle to identify the manipulation type. In this paper, we aim to solve the problem of semantic state estimation in cloth manipulation tasks. For this purpose, we introduce a new large-scale fully-annotated RGB image dataset showing various human demonstrations of different complicated cloth manipulations. We provide a set of baseline deep networks and benchmark them on the problem of semantic state estimation using our proposed dataset. Furthermore, we investigate the scalability of our semantic state estimation framework in robot monitoring tasks of long and complex cloth manipulations.

10.4RONov 2, 2021

Household Cloth Object Set: Fostering Benchmarking in Deformable Object Manipulation

Irene Garcia-Camacho, Júlia Borràs, Berk Calli et al.

Benchmarking of robotic manipulations is one of the open issues in robotic research. An important factor that has enabled progress in this area in the last decade is the existence of common object sets that have been shared among different research groups. However, the existing object sets are very limited when it comes to cloth-like objects that have unique particularities and challenges. This paper is a first step towards the design of a cloth object set to be distributed among research groups from the robotics cloth manipulation community. We present a set of household cloth objects and related tasks that serve to expose the challenges related to gathering such an object set and propose a roadmap to the design of common benchmarks in cloth manipulation tasks, with the intention to set the grounds for a future debate in the community that will be necessary to foster benchmarking for the manipulation of cloth-like objects. Some RGB-D and object scans are also collected as examples for the objects in relevant configurations. More details about the cloth set are shared in http://www.iri.upc.edu/groups/perception/ClothObjectSet/HouseholdClothSet.html.

30.6CVMar 11, 2021Code

SMPLicit: Topology-aware Generative Model for Clothed People

Enric Corona, Albert Pumarola, Guillem Alenyà et al.

In this paper we introduce SMPLicit, a novel generative model to jointly represent body pose, shape and clothing geometry. In contrast to existing learning-based approaches that require training specific models for each type of garment, SMPLicit can represent in a unified manner different garment topologies (e.g. from sleeveless tops to hoodies and to open jackets), while controlling other properties like the garment size or tightness/looseness. We show our model to be applicable to a large variety of garments including T-shirts, hoodies, jackets, shorts, pants, skirts, shoes and even hair. The representation flexibility of SMPLicit builds upon an implicit model conditioned with the SMPL human body parameters and a learnable latent space which is semantically interpretable and aligned with the clothing attributes. The proposed model is fully differentiable, allowing for its use into larger end-to-end trainable systems. In the experimental section, we demonstrate SMPLicit can be readily used for fitting 3D scans and for 3D reconstruction in images of dressed people. In both cases we are able to go beyond state of the art, by retrieving complex garment geometries, handling situations with multiple clothing layers and providing a tool for easy outfit editing. To stimulate further research in this direction, we will make our code and model publicly available at http://www.iri.upc.edu/people/ecorona/smplicit/.

7.1AIDec 14, 2020Code

Online Action Recognition

Alejandro Suárez-Hernández, Javier Segovia-Aguas, Carme Torras et al.

Recognition in planning seeks to find agent intentions, goals or activities given a set of observations and a knowledge library (e.g. goal states, plans or domain theories). In this work we introduce the problem of Online Action Recognition. It consists in recognizing, in an open world, the planning action that best explains a partially observable state transition from a knowledge library of first-order STRIPS actions, which is initially empty. We frame this as an optimization problem, and propose two algorithms to address it: Action Unification (AU) and Online Action Recognition through Unification (OARU). The former builds on logic unification and generalizes two input actions using weighted partial MaxSAT. The latter looks for an action within the library that explains an observed transition. If there is such action, it generalizes it making use of AU, building in this way an AU hierarchy. Otherwise, OARU inserts a Trivial Grounded Action (TGA) in the library that explains just that transition. We report results on benchmarks from the International Planning Competition and PDDLGym, where OARU recognizes actions accurately with respect to expert knowledge, and shows real-time performance.

8.3ROSep 30, 2020

Encoding cloth manipulations using a graph of states and transitions

Júlia Borràs, Guillem Alenyà, Carme Torras

Cloth manipulation is very relevant for domestic robotic tasks, but it presents many challenges due to the complexity of representing, recognizing and predicting the behaviour of cloth under manipulation. In this work, we propose a generic, compact and simplified representation of the states of cloth manipulation that allows for representing tasks as sequences of states and transitions. We also define a Cloth Manipulation Graph that encodes all the strategies to accomplish a task. Our novel representation is used to encode two different cloth manipulation tasks, learned from an experiment with human subjects with video and motion data. We show how our simplified representation allows to obtain a map of meaningful motion primitives.

4.1ROSep 18, 2020

Leveraging Multiple Environments for Learning and Decision Making: a Dismantling Use Case

Alejandro Suárez-Hernández, Thierry Gaugry, Javier Segovia-Aguas et al.

Learning is usually performed by observing real robot executions. Physics-based simulators are a good alternative for providing highly valuable information while avoiding costly and potentially destructive robot executions. We present a novel approach for learning the probabilities of symbolic robot action outcomes. This is done leveraging different environments, such as physics-based simulators, in execution time. To this end, we propose MENID (Multiple Environment Noise Indeterministic Deictic) rules, a novel representation able to cope with the inherent uncertainties present in robotic tasks. MENID rules explicitly represent each possible outcomes of an action, keep memory of the source of the experience, and maintain the probability of success of each outcome. We also introduce an algorithm to distribute actions among environments, based on previous experiences and expected gain. Before using physics-based simulations, we propose a methodology for evaluating different simulation settings and determining the least time-consuming model that could be used while still producing coherent results. We demonstrate the validity of the approach in a dismantling use case, using a simulation with reduced quality as simulated system, and a simulation with full resolution where we add noise to the trajectories and some physical parameters as a representation of the real system.

28.6LGJul 8, 2020Code

Self-Supervised Policy Adaptation during Deployment

Nicklas Hansen, Rishabh Jangir, Yu Sun et al.

In most real world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse simulation environments from DeepMind Control suite and ViZDoom, as well as real robotic manipulation tasks in continuously changing environments, taking observations from an uncalibrated camera. Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments.

9.5AIJan 30, 2020

STRIPS Action Discovery

Alejandro Suárez-Hernández, Javier Segovia-Aguas, Carme Torras et al.

The problem of specifying high-level knowledge bases for planning becomes a hard task in realistic environments. This knowledge is usually handcrafted and is hard to keep updated, even for system experts. Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing. These approaches can synthesize action schemas in Planning Domain Definition Language (PDDL) from a set of execution traces each consisting, at least, of an initial and final state. In this paper, we propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown. In addition, we contribute with a compilation to classical planning that mitigates the problem of learning static predicates in the action model preconditions, exploits the capabilities of SAT planners with parallel encodings to compute action schemas and validate all instances. Our system is flexible in that it supports the inclusion of partial input information that may speed up the search. We show through several experiments how learned action models generalize over unseen planning instances.

22.9ROOct 31, 2019

Dynamic Cloth Manipulation with Deep Reinforcement Learning

Rishabh Jangir, Guillem Alenya, Carme Torras

In this paper we present a Deep Reinforcement Learning approach to solve dynamic cloth manipulation tasks. Differing from the case of rigid objects, we stress that the followed trajectory (including speed and acceleration) has a decisive influence on the final state of cloth, which can greatly vary even if the positions reached by the grasped points are the same. We explore how goal positions for non-grasped points can be attained through learning adequate trajectories for the grasped points. Our approach uses few demonstrations to improve control policy learning, and a sparse reward approach to avoid engineering complex reward functions. Since perception of textiles is challenging, we also study different state representations to assess the minimum observation space required for learning to succeed. Finally, we compare different combinations of control policy encodings, demonstrations, and sparse reward learning techniques, and show that our proposed approach can learn dynamic cloth manipulation in an efficient way, i.e., using a reduced observation space, a few demonstrations, and a sparse reward.

20.3ROJun 19, 2019

A Grasping-centered Analysis for Cloth Manipulation

Júlia Borràs, Guillem Alenya, Carme Torras

Compliant and soft hands have gained a lot of attention in the past decade because of their ability to adapt to the shape of the objects, increasing their effectiveness for grasping. However, when it comes to grasping highly flexible objects such as textiles, we face the dual problem: it is the object that will adapt to the shape of the hand or gripper. In this context, the classic grasp analysis or grasping taxonomies are not suitable for describing textile objects grasps. This work proposes a novel definition of textile object grasps that abstracts from the robotic embodiment or hand shape and recovers concepts from the early neuroscience literature on hand prehension skills. This framework enables us to identify what grasps have been used in literature until now to perform robotic cloth manipulation, and allows for a precise definition of all the tasks that have been tackled in terms of manipulation primitives based on regrasps. In addition, we also review what grippers have been used. Our analysis shows how the vast majority of cloth manipulations have relied only on one type of grasp, and at the same time we identify several tasks that need more variety of grasp types to be executed successfully. Our framework is generic, provides a classification of cloth manipulation primitives and can inspire gripper design and benchmark construction for cloth manipulation.

22.5CVApr 6, 2019

Context-aware Human Motion Prediction

Enric Corona, Albert Pumarola, Guillem Alenyà et al.

The problem of predicting human motion given a sequence of past observations is at the core of many applications in robotics and computer vision. Current state-of-the-art formulate this problem as a sequence-to-sequence task, in which a historical of 3D skeletons feeds a Recurrent Neural Network (RNN) that predicts future movements, typically in the order of 1 to 2 seconds. However, one aspect that has been obviated so far, is the fact that human motion is inherently driven by interactions with objects and/or other humans in the environment. In this paper, we explore this scenario using a novel context-aware motion prediction architecture. We use a semantic-graph model where the nodes parameterize the human and objects in the scene and the edges their mutual interactions. These interactions are iteratively learned through a graph attention layer, fed with the past observations, which now include both object and human body motions. Once this semantic graph is learned, we inject it to a standard RNN to predict future movements of the human/s and object/s. We consider two variants of our architecture, either freezing the contextual interactions in the future of updating them. A thorough evaluation in the "Whole-Body Human Motion Database" shows that in both cases, our context-aware networks clearly outperform baselines in which the context information is not considered.