RO · Oct 4, 2022
Robotic Learning the Sequence of Packing Irregular Objects from Human Demonstrations
André Santos, Nuno Ferreira Duarte, Atabak Dehban et al.
We tackle the challenge of robotic bin packing with irregular objects, such as groceries. Given the diverse physical attributes of these objects and the complex constraints governing their placement and manipulation, preprogrammed strategies become infeasible. Our approach is to learn directly from expert demonstrations in order to extract implicit task knowledge and strategies that ensure safe object positioning, efficient use of space, and human-like behaviors that enhance human-robot trust. We rely on human demonstrations to learn a Markov chain that predicts the object packing sequence for a given set of items, and we then compare it with human performance. Our experimental results show that the model outperforms humans: human evaluators classify its predicted sequences as human-like more often than they classify human-generated sequences as such. The human demonstrations were collected using our proposed VR platform, BoxED, a box-packing environment that simulates real-world objects and scenarios to enable fast, streamlined data collection for teaching robots. We collected data from 43 participants packing a total of 263 boxes with supermarket-like objects, yielding 4644 object manipulations. Our VR platform can be easily adapted to new scenarios and objects, and it is publicly available, alongside our dataset, at https://github.com/andrejfsantos4/BoxED.
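The following minimal Python sketch illustrates the sequence-prediction idea. It fits a first-order Markov chain to demonstrated packing orders. It assumes that each state is simply the label of the last item packed, that a hypothetical `START` state represents the empty box, and that decoding is greedy (always pick the most probable item still unpacked). The paper's actual state definition, smoothing, and decoding may differ. The names `fit_transitions` and `predict_sequence` are illustrative only.

```python
from collections import defaultdict

START = "<start>"  # hypothetical sentinel state: the box is still empty


def fit_transitions(demonstrations):
    """Count item-to-item transitions in demonstrated packing sequences and
    normalise them into first-order Markov transition probabilities.
    Assumption: a state is just the label of the last item packed."""
    counts = defaultdict(lambda: defaultdict(int))
    for sequence in demonstrations:
        previous = START
        for item in sequence:
            counts[previous][item] += 1
            previous = item
    probs = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        probs[state] = {item: n / total for item, n in nexts.items()}
    return probs


def predict_sequence(probs, items):
    """Greedily choose the most probable next item among those not yet packed."""
    remaining = list(items)
    sequence = []
    state = START
    while remaining:
        options = probs.get(state, {})
        # Unseen transitions score 0.0; max() keeps the first item on ties.
        best = max(remaining, key=lambda item: options.get(item, 0.0))
        sequence.append(best)
        remaining.remove(best)
        state = best
    return sequence


demos = [["milk", "cereal", "eggs"], ["milk", "cereal", "eggs"], ["cereal", "eggs"]]
print(predict_sequence(fit_transitions(demos), ["eggs", "cereal", "milk"]))
# ['milk', 'cereal', 'eggs']
```

Greedy decoding keeps the sketch short. Beam search or sampling over the same transition table would be natural alternatives when several packing orders are plausible.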
RO · Oct 22, 2025
GRASPLAT: Enabling dexterous grasping through novel view synthesis
Matteo Bortolon, Nuno Ferreira Duarte, Plinio Moreno et al.
Achieving dexterous robotic grasping with multi-fingered hands remains a significant challenge. Existing methods rely on complete 3D scans to predict grasp poses, and these approaches are limited by the difficulty of acquiring high-quality 3D data in real-world scenarios. In this paper, we introduce GRASPLAT, a novel grasping framework that leverages consistent 3D information while being trained solely on RGB images. Our key insight is that by synthesizing physically plausible images of a hand grasping an object, we can regress the corresponding hand joints for a successful grasp. To achieve this, we use 3D Gaussian Splatting to generate high-fidelity novel views of real hand-object interactions, which enables end-to-end training with RGB data. Unlike prior methods, our approach incorporates a photometric loss that refines grasp predictions by minimizing discrepancies between rendered and real images. We conduct extensive experiments on both synthetic and real-world grasping datasets, demonstrating that GRASPLAT improves grasp success rates by up to 36.9% over existing image-based methods. Project page: https://mbortolon97.github.io/grasplat/
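A plain per-pixel L1 error between a rendered and a real image can illustrate the photometric loss. This is a hypothetical sketch. The function `photometric_l1` and the nested-list image format `[row][col][channel]` are assumptions, and GRASPLAT's actual loss may differ (for example, through weighting or an added structural-similarity term). In training, the rendered image would presumably come from a differentiable Gaussian Splatting renderer, so that the loss can propagate back to the grasp prediction. The sketch only computes the loss value.

```python
def photometric_l1(rendered, real):
    """Mean absolute per-channel difference between two equally sized images,
    each given as nested lists [row][col][channel] of floats in [0, 1].
    Hypothetical stand-in for a photometric loss, not GRASPLAT's exact one."""
    if len(rendered) != len(real):
        raise ValueError("images must have the same height")
    total = 0.0
    count = 0
    for row_a, row_b in zip(rendered, real):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for px_a, px_b in zip(row_a, row_b):
            if len(px_a) != len(px_b):
                raise ValueError("pixels must have the same number of channels")
            for a, b in zip(px_a, px_b):
                total += abs(a - b)
                count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


rendered = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
real = [[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]]
print(photometric_l1(rendered, real))  # 0.25
```

An L1 penalty is a common choice for photometric terms because, unlike a squared error, it does not let a few badly rendered pixels dominate the loss.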
RO · Apr 23, 2025
HERB: Human-augmented Efficient Reinforcement learning for Bin-packing
Gojko Perovic, Nuno Ferreira Duarte, Atabak Dehban et al.
Packing objects efficiently is a fundamental problem in logistics, warehouse automation, and robotics. Traditional packing solutions focus on geometric optimization, but packing irregular 3D objects presents significant challenges due to variations in shape and stability. Reinforcement Learning (RL) has gained popularity in robotic packing tasks, but training purely in simulation can be inefficient and computationally expensive. In this work, we propose HERB, a human-augmented RL framework for packing irregular objects. We first leverage human demonstrations to learn the best sequence in which to pack the objects, incorporating latent factors such as space optimization, stability, and object relationships that are difficult to model explicitly. Next, we train a placement algorithm that uses visual information to determine the optimal position of each object inside a packing container. We validate our approach through extensive performance evaluations that analyze both packing efficiency and latency. Finally, we demonstrate the real-world feasibility of our method on a robotic system. Experimental results show that, by leveraging human intuition, our method outperforms geometric and purely RL-based approaches, improving both packing robustness and adaptability. This work highlights the potential of combining human expertise with RL to tackle complex real-world packing challenges in robotic systems.
RO · May 10, 2019
Learning Motor Resonance in Human-Human and Human-Robot Interaction with Coupled Dynamical System
Nuno Ferreira Duarte, Mirko Raković, José Santos-Victor
Human interaction involves very sophisticated non-verbal communication skills, such as understanding the goals and actions of others and coordinating our own actions accordingly. Neuroscience refers to this mechanism as motor resonance: perceiving another person's actions and sensory experiences activates the observer's brain as if (s)he were performing the same actions and having the same experiences. We analyze and model the non-verbal cues (arm movements) exchanged between two humans who interact and execute handover actions. The contributions of this paper are the following: (i) computational models, built from recorded motion data, that describe the motor behaviour of each actor in action-in-interaction situations; (ii) a computational model that captures the behaviour of the "giver" and the "receiver" during an object handover action by coupling the arm motion of both actors; and (iii) the embedding of these models in the iCub robot for both action execution and recognition. Our results show that: (i) the robot can interpret human arm motion and recognize handover actions; and (ii) the robot can behave in a "human-like" manner when receiving the object of a recognized handover action.
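One way to illustrate the coupling of the two actors' arm motion is to Euler-integrate two linear attractor dynamics. The giver's hand converges to a handover point, and the receiver's hand is attracted to wherever the giver's hand currently is. This is a hypothetical simplification, not the paper's learned model. The linear form, the gains `alpha` and `beta`, the time step, and the function `simulate_handover` are all assumptions chosen for illustration.

```python
def simulate_handover(giver_start, receiver_start, handover_point,
                      alpha=2.0, beta=3.0, dt=0.01, steps=500):
    """Euler-integrate two coupled linear dynamical systems (illustrative only):
        giver:    dx_g/dt = -alpha * (x_g - h)
        receiver: dx_r/dt = -beta  * (x_r - x_g)
    The receiver's target is the giver's current hand position, not h itself."""
    giver = list(giver_start)
    receiver = list(receiver_start)
    for _ in range(steps):
        # Compute both velocities from the current state before updating either.
        giver_vel = [-alpha * (g - h) for g, h in zip(giver, handover_point)]
        receiver_vel = [-beta * (r - g) for r, g in zip(receiver, giver)]
        giver = [g + dt * v for g, v in zip(giver, giver_vel)]
        receiver = [r + dt * v for r, v in zip(receiver, receiver_vel)]
    return giver, receiver


giver_end, receiver_end = simulate_handover(
    (0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.5, 0.2, 0.3))
print([round(x, 3) for x in giver_end], [round(x, 3) for x in receiver_end])
```

Because the receiver tracks the giver rather than a fixed target, its motion would adapt online if the giver's trajectory changed. That adaptation is the intuition behind coupling the two systems.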
RO · Feb 8, 2018
Action Anticipation: Reading the Intentions of Humans and Robots
Nuno Ferreira Duarte, Jovica Tasevski, Moreno Coco et al.
Humans have the fascinating capacity to process non-verbal visual cues in order to understand and anticipate the actions of other humans. This "intention reading" ability is underpinned by shared motor repertoires and action models, which we use to interpret the intentions of others as if they were our own. We investigate how the different cues contribute to the legibility of human actions during interpersonal interactions. Our first contribution is a publicly available dataset of human body-motion and eye-gaze recordings, acquired in an experimental scenario with an actor interacting with three subjects. From these data, we conducted a human study to analyse the importance of the different non-verbal cues for action perception. As our second contribution, we used the motion and gaze recordings to build a computational model describing the interaction between two persons. As our third contribution, we embedded this model in the controller of an iCub humanoid robot and conducted a second human study, in the same scenario with the robot as the actor, to validate the model's "intention reading" capability. Our results show that it is possible to model the (non-verbal) signals exchanged by humans during interaction and to incorporate such a mechanism in robotic systems, with the twin goal of: (i) being able to "read" human action intentions, and (ii) acting in a way that is legible to humans.