CVJul 15, 2022
Human keypoint detection for close proximity human-robot interactionJan Docekal, Jakub Rozlivek, Jiri Matas et al.
We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction. The detection in this scenario is specific in that only a subset of body parts such as hands and torso are in the field of view. In particular, (i) we survey existing datasets with human pose annotation from the perspective of close proximity images and prepare and make publicly available a new Human in Close Proximity (HiCP) dataset; (ii) we quantitatively and qualitatively compare state-of-the-art human whole-body 2D keypoint detection methods (OpenPose, MMPose, AlphaPose, Detectron2) on this dataset; (iii) since accurate detection of hands and fingers is critical in applications with handovers, we evaluate the performance of the MediaPipe hand detector; (iv) we deploy the algorithms on a humanoid robot with an RGB-D camera on its head and evaluate the performance in 3D human keypoint detection. A motion capture system is used as reference. The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection. Thus, we propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection. We also analyse the failure modes of individual detectors -- for example, to what extent the absence of the head of the person in the image degrades performance. Finally, we demonstrate the framework in a scenario where a humanoid robot interacting with a person uses the detected 3D keypoints for whole-body avoidance maneuvers.
ROMar 17, 2022
Active Visuo-Haptic Object Shape CompletionLukas Rustler, Jens Lundell, Jan Kristof Behrens et al.
Recent advancements in object shape completion have enabled impressive object reconstructions using only visual input. However, due to self-occlusion, the reconstructions have high uncertainty in the occluded object parts, which negatively impacts the performance of downstream robotic tasks such as grasping. In this work, we propose an active visuo-haptic shape completion method called Act-VH that actively computes where to touch the objects based on the reconstruction uncertainty. Act-VH reconstructs objects from point clouds and calculates the reconstruction uncertainty using IGR, a recent state-of-the-art implicit surface deep neural network. We experimentally evaluate the reconstruction accuracy of Act-VH against five baselines in simulation and in the real world. We also propose a new simulation environment for this purpose. The results show that Act-VH outperforms all baselines and that an uncertainty-driven haptic exploration policy leads to higher reconstruction accuracy than a random policy and a policy driven by Gaussian Process Implicit Surfaces. As a final experiment, we evaluate Act-VH and the best reconstruction baseline on grasping 10 novel objects. The results show that Act-VH reaches a significantly higher grasp success rate than the baseline on all objects. Together, this work opens up the door for using active visuo-haptic shape completion in more complex cluttered scenes.
RONov 6, 2022
Learning body models: from humans to humanoidsMatej Hoffmann
Humans and animals excel in combining information from multiple sensory modalities, controlling their complex bodies, adapting to growth, failures, or using tools. These capabilities are also highly desirable in robots. They are displayed by machines to some extent. Yet, the artificial creatures are lagging behind. The key foundation is an internal representation of the body that the agent - human, animal, or robot - has developed. The mechanisms of operation of body models in the brain are largely unknown and even less is known about how they are constructed from experience after birth. In collaboration with developmental psychologists, we conducted targeted experiments to understand how infants acquire first "sensorimotor body knowledge". These experiments inform our work in which we construct embodied computational models on humanoid robots that address the mechanisms behind learning, adaptation, and operation of multimodal body representations. At the same time, we assess which of the features of the "body in the brain" should be transferred to robots to give rise to more adaptive and resilient, self-calibrating machines. We extend traditional robot kinematic calibration focusing on self-contained approaches where no external metrology is needed: self-contact and self-observation. Problem formulation allowing to combine several ways of closing the kinematic chain simultaneously is presented, along with a calibration toolbox and experimental validation on several robot platforms. Finally, next to models of the body itself, we study peripersonal space - the space immediately surrounding the body. Again, embodied computational models are developed and subsequently, the possibility of turning these biologically inspired representations into safe human-robot collaboration is studied.
38.9NCApr 30Code
Simulating Infant First-Person Sensorimotor Experience via Motion Retargeting from Babies to HumanoidsFrancisco M. López, Hoshinori Kanazawa, Ondrej Fiala et al.
Motion retargeting from humans to human-like artificial agents is becoming increasingly important as humanoid robots grow more capable. However, most existing approaches focus only on reproducing kinematics and ignore the rich sensorimotor experience associated with human movement. In this work, we present a framework for simulating the multimodal sensorimotor experiences of infants using physical and virtual humanoids. From a single video, our method reconstructs the infant's body configuration by extracting its skeletal structure and estimating the full 3D pose from each frame. Then we map the reconstructed motion onto several developmental platforms: the physical iCub robot and the virtual simulators pyCub, EMFANT and MIMo. Replaying the retargeted motions on these embodiments produces simulated multisensory streams including proprioception (joints and muscles), touch, and vision. For the best-matching embodiment, the retargeting achieves sub-centimeter accuracy and enables a rich multimodal analysis of infant development as well as enhanced automated annotation of behaviors. This framework provides a unique window into the infant's sensorimotor experience, offering new tools for robotics, developmental science, and early detection of neurodevelopmental disorders. The code is available at https://github.com/ctu-vras/motion-retargeting/.
CVDec 17, 2025Code
BLANKET: Anonymizing Faces in Infant Video RecordingsDitmar Hadera, Jan Cech, Miroslav Purkrabek et al.
Ensuring the ethical use of video data involving human subjects, particularly infants, requires robust anonymization methods. We propose BLANKET (Baby-face Landmark-preserving ANonymization with Keypoint dEtection consisTency), a novel approach designed to anonymize infant faces in video recordings while preserving essential facial attributes. Our method comprises two stages. First, a new random face, compatible with the original identity, is generated via inpainting using a diffusion model. Second, the new identity is seamlessly incorporated into each video frame through temporally consistent face swapping with authentic expression transfer. The method is evaluated on a dataset of short video recordings of babies and is compared to the popular anonymization method, DeepPrivacy2. Key metrics assessed include the level of de-identification, preservation of facial attributes, impact on human pose estimation (as an example of a downstream task), and presence of artifacts. Both methods alter the identity, and our method outperforms DeepPrivacy2 in all other respects. The code is available as an easy-to-use anonymization demo at https://github.com/ctu-vras/blanket-infant-face-anonym.
16.4CVMay 7
Neuromorphic visual attention for Sign-language recognition on SpiNNakerSarka Liskova, Olha Vedmedenko, Mazdak Fatahi et al.
Sign-language recognition has achieved substantial gains in classification accuracy in recent years; however, the latency and power requirements of most existing methods limit their suitability for real-time deployment. Neuromorphic sensing and processing offer an alternative paradigm based on sparse, event-driven computation that supports low-latency and energy-efficient perception. In this work, we introduce an end-to-end neuromorphic architecture for American Sign Language (ASL) fingerspelling recognition that integrates a spiking visual attention mechanism for online region-of-interest extraction with a compact spiking neural network deployed on the SpiNNaker neuromorphic platform. We benchmark the proposed system against two datasets: a synthetically generated event-based version of the Sign Language MNIST dataset and a natively recorded ASL-DVS dataset, whilst providing a comprehensive overview of Sign-language recognition and related work. This work yields competitive performance in simulation (92.27%) and comparable performance on neuromorphic hardware deployment (83.1%), while achieving the most energy-efficient architecture (0.565 mW) and low latency (3 ms) across all benchmarked approaches. Despite its compact design, the system demonstrates the suitability of task-dependent visual attention applications for edge deployment.
17.7ROMay 4
ShapeGrasp: Simultaneous Visuo-Haptic Shape Completion and Grasping for Improved Robot ManipulationLukas Rustler, Matej Hoffmann
Humans grasp unfamiliar objects by combining an initial visual estimate with tactile and proprioceptive feedback during interaction. We present ShapeGrasp, a robotic implementation of this approach. The proposed method is an iterative grasp-and-complete pipeline that couples implicit surface visuo-haptic shape completion (creation of full 3D shape from partial information) with physics-based grasp planning. From a single RGB-D view, ShapeGrasp infers a complete shape (point cloud or triangular mesh), generates candidate grasps via rigid-body simulation, and executes the best feasible grasp. Each grasp attempt yields additional geometric constraints -- tactile surface contacts and space occupied by the gripper body -- which are fused to update the object shape. Failures trigger pose re-estimation and regrasping using the refined shape. We evaluate ShapeGrasp in the real world using two different robots and grippers. To the best of our knowledge, this is the first approach that updates shape representations following a real-world grasp. We achieved superior results over baselines for both grippers (grasp success rate of 84% with a three-finger gripper and 91% with a two-finger gripper), while improving the 3D shape reconstruction quality in all evaluation metrics used.
RODec 28, 2025
The body is not there to compute: Comment on "Informational embodiment: Computational role of information structure in codes and robots" by Pitti et alMatej Hoffmann
Applying the lens of computation and information has been instrumental in driving the technological progress of our civilization as well as in empowering our understanding of the world around us. The digital computer was and for many still is the leading metaphor for how our mind operates. Information theory (IT) has also been important in our understanding of how nervous systems encode and process information. The target article deploys information and computation to bodies: to understand why they have evolved in particular ways (animal bodies) and to design optimal bodies (robots). In this commentary, I argue that the main role of bodies is not to compute.
ROApr 23, 2024
Closed Loop Interactive Embodied Reasoning for Robot ManipulationMichal Nazarczuk, Jan Kristof Behrens, Karla Stepanova et al.
Embodied reasoning systems integrate robotic hardware and cognitive processes to perform complex tasks, typically in response to a natural language query about a specific physical environment. This usually involves changing the belief about the scene or physically interacting and changing the scene (e.g. sort the objects from lightest to heaviest). In order to facilitate the development of such systems we introduce a new modular Closed Loop Interactive Embodied Reasoning (CLIER) approach that takes into account the measurements of non-visual object properties, changes in the scene caused by external disturbances as well as uncertain outcomes of robotic actions. CLIER performs multi-modal reasoning and action planning and generates a sequence of primitive actions that can be executed by a robot manipulator. Our method operates in a closed loop, responding to changes in the environment. Our approach is developed with the use of MuBle simulation environment and tested in 10 interactive benchmark scenarios. We extensively evaluate our reasoning approach in simulation and in real-world manipulation tasks with a success rate above 76% and 64%, respectively.
ROApr 10, 2024
Interactive Learning of Physical Object Properties Through Robot Manipulation and Database of Object MeasurementsAndrej Kruzliak, Jiri Hartvich, Shubhan P. Patni et al.
This work presents a framework for automatically extracting physical object properties, such as material composition, mass, volume, and stiffness, through robot manipulation and a database of object measurements. The framework involves exploratory action selection to maximize learning about objects on a table. A Bayesian network models conditional dependencies between object properties, incorporating prior probability distributions and uncertainty associated with measurement actions. The algorithm selects optimal exploratory actions based on expected information gain and updates object properties through Bayesian inference. Experimental evaluation demonstrates effective action selection compared to a baseline and correct termination of the experiments if there is nothing more to be learned. The algorithm proved to behave intelligently when presented with trick objects with material properties in conflict with their appearance. The robot pipeline integrates with a logging module and an online database of objects, containing over 24,000 measurements of 63 objects with different grippers. All code and data are publicly available, facilitating automatic digitization of objects and their physical properties through exploratory manipulations.
AIMay 15, 2025
Embodied AI in Machine Learning -- is it Really Embodied?Matej Hoffmann, Shubhan Parag Patni
Embodied Artificial Intelligence (Embodied AI) is gaining momentum in the machine learning communities with the goal of leveraging current progress in AI (deep learning, transformers, large language and visual-language models) to empower robots. In this chapter we put this work in the context of "Good Old-Fashioned Artificial Intelligence" (GOFAI) (Haugeland, 1989) and the behavior-based or embodied alternatives (R. A. Brooks 1991; Pfeifer and Scheier 2001). We claim that the AI-powered robots are only weakly embodied and inherit some of the problems of GOFAI. Moreover, we review and critically discuss the possibility of cross-embodiment learning (Padalkar et al. 2024). We identify fundamental roadblocks and propose directions on how to make progress.
CVFeb 10, 2025
Wandering around: A bioinspired approach to visual attention through object motion sensitivityGiulia D Angelo, Victoria Clerico, Chiara Bartolozzi et al.
Active vision enables dynamic visual perception, offering an alternative to static feedforward architectures in computer vision, which rely on large datasets and high computational resources. Biological selective attention mechanisms allow agents to focus on salient Regions of Interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, enhance this capability by capturing asynchronous scene changes enabling efficient low-latency processing. To distinguish moving objects while the event-based camera is in motion the agent requires an object motion segmentation mechanism to accurately detect targets and center them in the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using Spiking Neural Networks to parallelize computation and adapt to dynamic environments. This work presents a Spiking Convolutional Neural Network bioinspired attention system for selective attention through object motion sensitivity. The system generates events via fixational eye movements using a Dynamic Vision Sensor integrated into the Speck neuromorphic hardware, mounted on a Pan-Tilt unit, to identify the ROI and saccade toward it. The system, characterized using ideal gratings and benchmarked against the Event Camera Motion Segmentation Dataset, reaches a mean IoU of 82.2% and a mean SSIM of 96% in multi-object motion segmentation. The detection of salient objects reaches 88.8% accuracy in office scenarios and 89.8% in low-light conditions on the Event-Assisted Low-Light Video Object Segmentation Dataset. A real-time demonstrator shows the system's 0.12 s response to dynamic scenes. Its learning-free design ensures robustness across perceptual scenes, making it a reliable foundation for real-time robotic applications serving as a basis for more complex architectures.
NCApr 24, 2025
A computational model of infant sensorimotor exploration in the mobile paradigmJosua Spisak, Sergiu Tcaci Popescu, Stefan Wermter et al.
We present a computational model of the mechanisms that may determine infants' behavior in the "mobile paradigm". This paradigm has been used in developmental psychology to explore how infants learn the sensory effects of their actions. In this paradigm, a mobile (an articulated and movable object hanging above an infant's crib) is connected to one of the infant's limbs, prompting the infant to preferentially move that "connected" limb. This ability to detect a "sensorimotor contingency" is considered to be a foundational cognitive ability in development. To understand how infants learn sensorimotor contingencies, we built a model that attempts to replicate infant behavior. Our model incorporates a neural network, action-outcome prediction, exploration, motor noise, preferred activity level, and biologically-inspired motor control. We find that simulations with our model replicate the classic findings in the literature showing preferential movement of the connected limb. An interesting observation is that the model sometimes exhibits a burst of movement after the mobile is disconnected, casting light on a similar occasional finding in infants. In addition to these general findings, the simulations also replicate data from two recent more detailed studies using a connection with the mobile that was either gradual or all-or-none. A series of ablation studies further shows that the inclusion of mechanisms of action-outcome prediction, exploration, motor noise, and biologically-inspired motor control was essential for the model to correctly replicate infant behavior. This suggests that these components are also involved in infants' sensorimotor learning.
CVJun 25, 2024
Automatic infant 2D pose estimation from videos: comparing seven deep neural network methodsFilipe Gama, Matej Misar, Lukas Navara et al.
Automatic markerless estimation of infant posture and motion from ordinary videos carries great potential for movement studies "in the wild", facilitating understanding of motor development and massively increasing the chances of early diagnosis of disorders. There is rapid development of human pose estimation methods in computer vision thanks to advances in deep learning and machine learning. However, these methods are trained on datasets that feature adults in different contexts. This work tests and compares seven popular methods (AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose, OpenPose, and ViTPose) on videos of infants in supine position and in more complex settings. Surprisingly, all methods except DeepLabCut and MediaPipe have competitive performance without additional finetuning, with ViTPose performing best. Next to standard performance metrics (average precision and recall), we introduce errors expressed in the neck-mid-hip (torso length) ratio and additionally study missed and redundant detections, and the reliability of the internal confidence ratings of the different methods, which are relevant for downstream tasks. Among the networks with competitive performance, only AlphaPose could run close to real time (27 fps) on our machine. We provide documented Docker containers or instructions for all the methods we used, our analysis scripts, and the processed data at https://hub.docker.com/u/humanoidsctu and https://osf.io/x465b/.
ROJan 20, 2022
Body Models in Humans and RobotsMatej Hoffmann, Matthew R. Longo
Neurocognitive models of higher-level somatosensory processing have emphasised the role of stored body representations in interpreting real-time sensory signals coming from the body (Longo, Azanon and Haggard, 2010; Tame, Azanon and Longo, 2019). The need for such stored representations arises from the fact that immediate sensory signals coming from the body do not specify metric details about body size and shape. Several aspects of somatoperception, therefore, require that immediate sensory signals be combined with stored body representations. This basic problem is equally true for humanoid robots and, intriguingly, neurocognitive models developed to explain human perception are strikingly similar to those developed independently for localizing touch on humanoid robots, such as the iCub, equipped with artificial electronic skin on the majority of its body surface (Roncone et al., 2014; Hoffmann, 2021). In this chapter, we will review the key features of these models, discuss their similarities and differences to each other, and to other models in the literature. Using robots as embodied computational models is an example of synthetic methodology or 'understanding by building' (e.g., Hoffmann and Pfeifer, 2018), computational embodied neuroscience (Caligiore et al., 2010) or 'synthetic psychology of the self' (Prescott and Camilleri, 2019). Such models have the advantage that they need to be worked out into every detail, making any theory explicit and complete. There is also an additional way of (pre)validating such a theory other than comparing to the biological or psychological phenomenon studied by simply verifying that a particular implementation really performs the task: can the robot localize where it is being touched (see https://youtu.be/pfse424t5mQ)?
RODec 14, 2020
Automatic self-contained calibration of an industrial dual-arm robot with cameras using self-contact, planar constraints, and self-observationKarla Stepanova, Jakub Rozlivek, Frantisek Puciow et al.
We present a robot kinematic calibration method that combines complementary calibration approaches: self-contact, planar constraints, and self-observation. We analyze the estimation of the end effector parameters, joint offsets of the manipulators, and calibration of the complete kinematic chain (DH parameters). The results are compared with ground truth measurements provided by a laser tracker. Our main findings are: (1) When applying the complementary calibration approaches in isolation, the self-contact approach yields the best and most stable results. (2) All combinations of more than one approach were always superior to using any single approach in terms of calibration errors and the observability of the estimated parameters. Combining more approaches delivers robot parameters that better generalize to the workspace parts not used for the calibration. (3) Sequential calibration, i.e. calibrating cameras first and then robot kinematics, is more effective than simultaneous calibration of all parameters. In real experiments, we employ two industrial manipulators mounted on a common base. The manipulators are equipped with force/torque sensors at their wrists, with two cameras attached to the robot base, and with special end effectors with fiducial markers. We collect a new comprehensive dataset for robot kinematic calibration and make it publicly available. The dataset and its analysis provide quantitative and qualitative insights that go beyond the specific manipulators used in this work and apply to self-contained robot kinematic calibration in general.
RONov 9, 2020
Robot in the mirror: toward an embodied computational model of mirror self-recognitionMatej Hoffmann, Shengzhi Wang, Vojtech Outrata et al.
Self-recognition or self-awareness is a capacity attributed typically only to humans and few other species. The definitions of these concepts vary and little is known about the mechanisms behind them. However, there is a Turing test-like benchmark: the mirror self-recognition, which consists in covertly putting a mark on the face of the tested subject, placing her in front of a mirror, and observing the reactions. In this work, first, we provide a mechanistic decomposition, or process model, of what components are required to pass this test. Based on these, we provide suggestions for empirical research. In particular, in our view, the way the infants or animals reach for the mark should be studied in detail. Second, we develop a model to enable the humanoid robot Nao to pass the test. The core of our technical contribution is learning the appearance representation and visual novelty detection by means of learning the generative model of the face with deep auto-encoders and exploiting the prediction error. The mark is identified as a salient region on the face and reaching action is triggered, relying on a previously learned mapping to arm joint angles. The architecture is tested on two robots with a completely different face.
NCOct 19, 2020
Body models in humans, animals, and robots: mechanisms and plasticityMatej Hoffmann
Humans and animals excel in combining information from multiple sensory modalities, controlling their complex bodies, adapting to growth, failures, or using tools. These capabilities are also highly desirable in robots. They are displayed by machines to some extent - yet, as is so often the case, the artificial creatures are lagging behind. The key foundation is an internal representation of the body that the agent - human, animal, or robot - has developed. In the biological realm, evidence has been accumulated by diverse disciplines giving rise to the concepts of body image, body schema, and others. In robotics, a model of the robot is an indispensable component that enables to control the machine. In this article I compare the character of body representations in biology with their robotic counterparts and relate that to the differences in performance that we observe. I put forth a number of axes regarding the nature of such body models: fixed vs. plastic, amodal vs. modal, explicit vs. implicit, serial vs. parallel, modular vs. holistic, and centralized vs. distributed. An interesting trend emerges: on many of the axes, there is a sequence from robot body models, over body image, body schema, to the body representation in lower animals like the octopus. In some sense, robots have a lot in common with Ian Waterman - "the man who lost his body" - in that they rely on an explicit, veridical body model (body image taken to the extreme) and lack any implicit, multimodal representation (like the body schema) of their bodies. I will then detail how robots can inform the biological sciences dealing with body representations and finally, I will study which of the features of the "body in the brain" should be transferred to robots, giving rise to more adaptive and resilient, self-calibrating machines.
HCSep 3, 2020
Should a small robot have a small personal space? Investigating personal spatial zones and proxemic behavior in human-robot interactionHagen Lehmann, Adam Rojik, Matej Hoffmann
This paper presents the first study in a series of proxemics experiments concerned with the role of personal spatial zones in human-robot interaction. In the study 40 participants approached a NAO robot positioned approximately at participants' eye level and entered different social zones around the robot (personal and intimate space). When the robot perceived the approaching person entering its personal space, it started gazing at the participant, and upon the intrusion of its intimate space it leaned back. Our research questions were: (1) given the small size of the robot (58 cm tall), will people expect its social zones to shrink by its size? (2) Will the robot behaviors be interpreted as appropriate social behaviors? We found that the average approach distance of the participants was 48 cm, which represents the inner limit of the human-size personal zone (45-120 cm), but is outside of the personal zone scaled to robot size (16-42 cm). This suggests that most participants did not (fully) scale down the extent of these zones to the robot size. We also found that the leaning back behavior of the robot was correctly interpreted by most participants as the robot's reaction to the intrusion of its personal space; however, our implementation of the behavior was often perceived as "unfriendly". We will discuss this and other limitations of the study in detail. Additionally we found positive correlations between participants' personality traits, Godspeed Questionnaire subscales, and the average approach distance. The technical contribution of this work is the real-time perception of 25 keypoints on the human body using a single compact RGB-D camera and the use of these points for accurate interpersonal distance estimation and as gazing targets for the robot.
ROSep 2, 2020
3D Collision-Force-Map for Safe Human-Robot CollaborationPetr Svarny, Jakub Rozlivek, Lukas Rustler et al.
The need to guarantee safety of collaborative robots limits their performance, in particular, their speed and hence cycle time. The standard ISO/TS 15066 defines the Power and Force Limiting operation mode and prescribes force thresholds that a moving robot is allowed to exert on human body parts during impact, along with a simple formula to obtain maximum allowed speed of the robot in the whole workspace. In this work, we measure the forces exerted by two collaborative manipulators (UR10e and KUKA LBR iiwa) moving downward against an impact measuring device. First, we empirically show that the impact forces can vary by more than 100 percent within the robot workspace. The forces are negatively correlated with the distance from the robot base and the height in the workspace. Second, we present a data-driven model, 3D Collision-Force-Map, predicting impact forces from distance, height, and velocity and demonstrate that it can be trained on a limited number of data points. Third, we analyze the force evolution upon impact and find that clamping never occurs for the UR10e. We show that formulas relating robot mass, velocity, and impact forces from ISO/TS 15066 are insufficient -- leading both to significant underestimation and overestimation and thus to unnecessarily long cycle times or even dangerous applications. We propose an empirical method that can be deployed to quickly determine the optimal speed and position where a task can be safely performed with maximum efficiency.
ROAug 31, 2020
Active exploration for body model learning through self-touch on a humanoid robot with artificial skinFilipe Gama, Maksym Shcherban, Matthias Rolf et al.
The mechanisms of infant development are far from understood. Learning about one's own body is likely a foundation for subsequent development. Here we look specifically at the problem of how spontaneous touches to the body in early infancy may give rise to first body models and bootstrap further development such as reaching competence. Unlike visually elicited reaching, reaching to own body requires connections of the tactile and motor space only, bypassing vision. Still, the problems of high dimensionality and redundancy of the motor system persist. In this work, we present an embodied computational model on a simulated humanoid robot with artificial sensitive skin on large areas of its body. The robot should autonomously develop the capacity to reach for every tactile sensor on its body. To do this efficiently, we employ the computational framework of intrinsic motivations and variants of goal babbling, as opposed to motor babbling, that prove to make the exploration process faster and alleviate the ill-posedness of learning inverse kinematics. Based on our results, we discuss the next steps in relation to infant studies: what information will be necessary to further ground this computational model in behavioral data.
CVApr 30, 2020
PreCNet: Next-Frame Video Prediction Based on Predictive CodingZdenek Straka, Tomas Svoboda, Matej Hoffmann
Predictive coding, currently a highly influential theory in neuroscience, has not been widely adopted in machine learning yet. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, SSIM) was further improved when a larger training set (2M images from BDD100k), pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully based in a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.
ROSep 5, 2019
The homunculus for proprioception: Toward learning the representation of a humanoid robot's joint space using self-organizing mapsFilipe Gama, Matej Hoffmann
In primate brains, tactile and proprioceptive inputs are relayed to the somatosensory cortex which is known for somatotopic representations, or, "homunculi". Our research centers on understanding the mechanisms of the formation of these and more higher-level body representations (body schema) by using humanoid robots and neural networks to construct models. We specifically focus on how spatial representation of the body may be learned from somatosensory information in self-touch configurations. In this work, we target the representation of proprioceptive inputs, which we take to be joint angles in the robot. The inputs collected in different body postures serve as inputs to a Self-Organizing Map (SOM) with a 2D lattice on the output. With unrestricted, all-to-all connections, the map is not capable of representing the input space while preserving the topological relationships, because the intrinsic dimensionality of the body posture space is too large. Hence, we use a method we developed previously for tactile inputs (Hoffmann, Straka et al. 2018) called MRF-SOM, where the Maximum Receptive Field of output neurons is restricted so they only learn to represent specific parts of the input space. This is in line with the receptive fields of neurons in somatosensory areas representing proprioception that often respond to combination of few joints (e.g. wrist and elbow).
ROAug 8, 2019
Safe physical HRI: Toward a unified treatment of speed and separation monitoring together with power and force limitingPetr Svarny, Michael Tesar, Jan Kristof Behrens et al.
So-called collaborative robots are a current trend in industrial robotics. However, they still face many problems in practical application such as reduced speed to ascertain their collaborativeness. The standards prescribe two regimes: (i) speed and separation monitoring and (ii) power and force limiting, where the former requires reliable estimation of distances between the robot and human body parts and the latter imposes constraints on the energy absorbed during collisions prior to robot stopping. Following the standards, we deploy the two collaborative regimes in a single application and study the performance in a mock collaborative task under the individual regimes, including transitions between them. Additionally, we compare the performance under "safety zone monitoring" with keypoint pair-wise separation distance assessment relying on an RGB-D sensor and skeleton extraction algorithm to track human body parts in the workspace. Best performance has been achieved in the following setting: robot operates at full speed until a distance threshold between any robot and human body part is crossed; then, reduced robot speed per power and force limiting is triggered. Robot is halted only when the operator's head crosses a predefined distance from selected robot parts. We demonstrate our methodology on a setup combining a KUKA LBR iiwa robot, Intel RealSense RGB-D sensor and OpenPose for human pose estimation.
ROMay 18, 2018
Robot self-calibration using multiple kinematic chains -- a simulation study on the iCub humanoid robotKarla Stepanova, Tomas Pajdla, Matej Hoffmann
Mechanism calibration is an important and non-trivial task in robotics. Advances in sensor technology make affordable but increasingly accurate devices such as cameras and tactile sensors available, making it possible to perform automated self-contained calibration relying on redundant information in these sensory streams. In this work, we use a simulated iCub humanoid robot with a stereo camera system and end-effector contact emulation to quantitatively compare the performance of kinematic calibration by employing different combinations of intersecting kinematic chains -- either through self-observation or self-touch. The parameters varied were: (i) type and number of intersecting kinematic chains used for calibration, (ii) parameters and chains subject to optimization, (iii) amount of initial perturbation of kinematic parameters, (iv) number of poses/configurations used for optimization, (v) amount of measurement noise in end-effector positions / cameras. The main findings are: (1) calibrating parameters of a single chain (e.g. one arm) by employing multiple kinematic chains ("self-observation" and "self-touch") is superior in terms of optimization results as well as observability; (2) when using multi-chain calibration, fewer poses suffice to get similar performance compared to when for example only observation from a single camera is used; (3) parameters of all chains (here 86 DH parameters) can be subject to calibration simultaneously and with 50 (100) poses, end-effector error of around 2 (1) mm can be achieved; (4) adding noise to a sensory modality degrades performance of all calibrations employing the chains relying on this information.
AIJan 26, 2018
Symbol Emergence in Cognitive Developmental Systems: a SurveyTadahiro Taniguchi, Emre Ugur, Matej Hoffmann et al.
Humans use signs, e.g., sentences in a spoken language, for communication and thought. Hence, symbol systems like language are crucial for our communication with other agents and adaptation to our real-world environment. The symbol systems we use in our human society adaptively and dynamically change over time. In the context of artificial intelligence (AI) and cognitive systems, the symbol grounding problem has been regarded as one of the central problems related to {\it symbols}. However, the symbol grounding problem was originally posed to connect symbolic AI and sensorimotor information and did not consider many interdisciplinary phenomena in human communication and dynamic symbol systems in our society, which semiotics considered. In this paper, we focus on the symbol emergence problem, addressing not only cognitive dynamics but also the dynamics of symbol systems in society, rather than the symbol grounding problem. We first introduce the notion of a symbol in semiotics from the humanities, to leave the very narrow idea of symbols in symbolic AI. Furthermore, over the years, it became more and more clear that symbol emergence has to be regarded as a multifaceted problem. Therefore, secondly, we review the history of the symbol emergence problem in different fields, including both biological and artificial systems, showing their mutual relations. We summarize the discussion and provide an integrative viewpoint and comprehensive overview of symbol emergence in cognitive systems. Additionally, we describe the challenges facing the creation of cognitive systems that can be part of symbol emergence systems.
ROJan 17, 2018
Compact Real-time avoidance on a Humanoid Robot for Human-robot InteractionDong Hai Phuong Nguyen, Matej Hoffmann, Alessandro Roncone et al.
With robots leaving factories and entering less controlled domains, possibly sharing the space with humans, safety is paramount and multimodal awareness of the body surface and the surrounding environment is fundamental. Taking inspiration from peripersonal space representations in humans, we present a framework on a humanoid robot that dynamically maintains such a protective safety zone, composed of the following main components: (i) a human 2D keypoints estimation pipeline employing a deep learning based algorithm, extended here into 3D using disparity; (ii) a distributed peripersonal space representation around the robot's body parts; (iii) a reaching controller that incorporates all obstacles entering the robot's safety zone on the fly into the task. Pilot experiments demonstrate that an effective safety margin between the robot's and the human's body parts is kept. The proposed solution is flexible and versatile since the safety zone around individual robot and human body parts can be selectively modulated---here we demonstrate stronger avoidance of the human head compared to rest of the body. Our system works in real time and is self-contained, with no external sensory equipment and use of onboard cameras only.
AIJan 15, 2018
Robots as Powerful Allies for the Study of Embodied Cognition from the Bottom UpMatej Hoffmann, Rolf Pfeifer
A large body of compelling evidence has been accumulated demonstrating that embodiment - the agent's physical setup, including its shape, materials, sensors and actuators - is constitutive for any form of cognition and as a consequence, models of cognition need to be embodied. In contrast to methods from empirical sciences to study cognition, robots can be freely manipulated and virtually all key variables of their embodiment and control programs can be systematically varied. As such, they provide an extremely powerful tool of investigation. We present a robotic bottom-up or developmental approach, focusing on three stages: (a) low-level behaviors like walking and reflexes, (b) learning regularities in sensorimotor spaces, and (c) human-like cognition. We also show that robotic based research is not only a productive path to deepening our understanding of cognition, but that robots can strongly benefit from human-like cognition in order to become more autonomous, robust, resilient, and safe.
AIJun 12, 2017
DAC-h3: A Proactive Robot Cognitive Architecture to Acquire and Express Knowledge About the World and the SelfClément Moulin-Frier, Tobias Fischer, Maxime Petit et al.
This paper introduces a cognitive architecture for a humanoid robot to engage in a proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both the human and the robot. The framework, based on a biologically-grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.
NEJun 8, 2017
Where is my forearm? Clustering of body parts from simultaneous tactile and linguistic input using sequential mappingKarla Stepanova, Matej Hoffmann, Zdenek Straka et al.
Humans and animals are constantly exposed to a continuous stream of sensory information from different modalities. At the same time, they form more compressed representations like concepts or symbols. In species that use language, this process is further structured by this interaction, where a mapping between the sensorimotor concepts and linguistic elements needs to be established. There is evidence that children might be learning language by simply disambiguating potential meanings based on multiple exposures to utterances in different contexts (cross-situational learning). In existing models, the mapping between modalities is usually found in a single step by directly using frequencies of referent and meaning co-occurrences. In this paper, we present an extension of this one-step mapping and introduce a newly proposed sequential mapping algorithm together with a publicly available Matlab implementation. For demonstration, we have chosen a less typical scenario: instead of learning to associate objects with their names, we focus on body representations. A humanoid robot is receiving tactile stimulations on its body, while at the same time listening to utterances of the body part names (e.g., hand, forearm and torso). With the goal at arriving at the correct "body categories", we demonstrate how a sequential mapping algorithm outperforms one-step mapping. In addition, the effect of data set size and noise in the linguistic input are studied.
NEJul 20, 2016
The encoding of proprioceptive inputs in the brain: knowns and unknowns from a robotic perspectiveMatej Hoffmann, Nada Bednarova
Somatosensory inputs can be grossly divided into tactile (or cutaneous) and proprioceptive -- the former conveying information about skin stimulation, the latter about limb position and movement. The principal proprioceptors are constituted by muscle spindles, which deliver information about muscle length and speed. In primates, this information is relayed to the primary somatosensory cortex and eventually the posterior parietal cortex, where integrated information about body posture (postural schema) is presumably available. However, coming from robotics and seeking a biologically motivated model that could be used in a humanoid robot, we faced a number of difficulties. First, it is not clear what neurons in the ascending pathway and primary somatosensory cortex code. To an engineer, joint angles would seem the most useful variables. However, the lengths of individual muscles have nonlinear relationships with the angles at joints. Kim et al. (Neuron, 2015) found different types of proprioceptive neurons in the primary somatosensory cortex -- sensitive to movement of single or multiple joints or to static postures. Second, there are indications that the somatotopic arrangement ("the homunculus") of these brain areas is to a significant extent learned. However, the mechanisms behind this developmental process are unclear. We will report first results from modeling of this process using data obtained from body babbling in the iCub humanoid robot and feeding them into a Self-Organizing Map (SOM). Our results reveal that the SOM algorithm is only suited to develop receptive fields of the posture-selective type. Furthermore, the SOM algorithm has intrinsic difficulties when combined with population code on its input and in particular with nonlinear tuning curves (sigmoids or Gaussians).
RONov 9, 2014
Trade-Offs in Exploiting Body Morphology for Control: from Simple Bodies and Model-Based Control to Complex Bodies with Model-Free Distributed Control SchemesMatej Hoffmann, Vincent C. Müller
Tailoring the design of robot bodies for control purposes is implicitly performed by engineers, however, a methodology or set of tools is largely absent and optimization of morphology (shape, material properties of robot bodies, etc.) is lagging behind the development of controllers. This has become even more prominent with the advent of compliant, deformable or "soft" bodies. These carry substantial potential regarding their exploitation for control---sometimes referred to as "morphological computation" in the sense of offloading computation needed for control to the body. Here, we will argue in favor of a dynamical systems rather than computational perspective on the problem. Then, we will look at the pros and cons of simple vs. complex bodies, critically reviewing the attractive notion of "soft" bodies automatically taking over control tasks. We will address another key dimension of the design space---whether model-based control should be used and to what extent it is feasible to develop faithful models for different morphologies.
AIFeb 2, 2012
The implications of embodiment for behavior and cognition: animal and robotic case studiesMatej Hoffmann, Rolf Pfeifer
In this paper, we will argue that if we want to understand the function of the brain (or the control in the case of robots), we must understand how the brain is embedded into the physical system, and how the organism interacts with the real world. While embodiment has often been used in its trivial meaning, i.e. 'intelligence requires a body', the concept has deeper and more important implications, concerned with the relation between physical and information (neural, control) processes. A number of case studies are presented to illustrate the concept. These involve animals and robots and are concentrated around locomotion, grasping, and visual perception. A theoretical scheme that can be used to embed the diverse case studies will be presented. Finally, we will establish a link between the low-level sensory-motor processes and cognition. We will present an embodied view on categorization, and propose the concepts of 'body schema' and 'forward models' as a natural extension of the embodied approach toward first representations.