Filipe Gama

RO
3 papers
17 citations
Novelty 28%
AI Score 20

3 Papers

CV · Jun 25, 2024
Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods

Filipe Gama, Matej Misar, Lukas Navara et al.

Automatic markerless estimation of infant posture and motion from ordinary videos carries great potential for movement studies "in the wild", facilitating understanding of motor development and massively increasing the chances of early diagnosis of disorders. Human pose estimation methods in computer vision are developing rapidly thanks to advances in deep learning and machine learning. However, these methods are trained on datasets that feature adults in different contexts. This work tests and compares seven popular methods (AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose, OpenPose, and ViTPose) on videos of infants in the supine position and in more complex settings. Surprisingly, all methods except DeepLabCut and MediaPipe achieve competitive performance without additional finetuning, with ViTPose performing best. In addition to standard performance metrics (average precision and recall), we introduce errors expressed relative to the neck-to-mid-hip distance (torso length), and we study missed and redundant detections as well as the reliability of each method's internal confidence ratings, which are relevant for downstream tasks. Among the networks with competitive performance, only AlphaPose could run close to real time (27 fps) on our machine. We provide documented Docker containers or instructions for all the methods we used, our analysis scripts, and the processed data at https://hub.docker.com/u/humanoidsctu and https://osf.io/x465b/.
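As a rough illustration of expressing errors in torso-length units, the sketch below divides each keypoint's Euclidean pixel error by the ground-truth distance from the neck to the mid-hip point (taken here as the midpoint of the two hips). The function name, the keypoint indices, and the choice of ground-truth (rather than predicted) torso length are assumptions for illustration, not the paper's exact definition; for a keypoint layout without a neck point, the neck would also have to be derived somehow.

```python
import numpy as np

def torso_normalized_errors(pred, gt, neck_idx, left_hip_idx, right_hip_idx):
    """Per-keypoint Euclidean error divided by the ground-truth torso length.

    pred, gt: array-likes of shape (K, 2) holding (x, y) pixel coordinates for
    the same K keypoints. The index arguments depend on the keypoint layout and
    are hypothetical here.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Mid-hip taken as the midpoint of the two ground-truth hip keypoints.
    mid_hip = (gt[left_hip_idx] + gt[right_hip_idx]) / 2.0
    torso_length = np.linalg.norm(gt[neck_idx] - mid_hip)
    if torso_length == 0.0:
        raise ValueError("degenerate torso: neck and mid-hip coincide")
    # Pixel error of each keypoint, rescaled into torso-length units.
    errors = np.linalg.norm(pred - gt, axis=1)
    return errors / torso_length

# Toy example: keypoint 0 = neck, 1 = left hip, 2 = right hip.
gt = [[0.0, 0.0], [-2.0, 10.0], [2.0, 10.0]]
pred = [[3.0, 4.0], [-2.0, 10.0], [2.0, 15.0]]
print(torso_normalized_errors(pred, gt, neck_idx=0, left_hip_idx=1, right_hip_idx=2))
```

In the toy example the torso is 10 px long, so the two 5 px errors become 0.5 torso lengths. Normalizing this way keeps errors comparable across videos in which the infant appears at different scales.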

RO · Aug 31, 2020
Active exploration for body model learning through self-touch on a humanoid robot with artificial skin

Filipe Gama, Maksym Shcherban, Matthias Rolf et al.

The mechanisms of infant development are far from understood. Learning about one's own body is likely a foundation for subsequent development. Here we look specifically at how spontaneous touches to the body in early infancy may give rise to first body models and bootstrap further development such as reaching competence. Unlike visually elicited reaching, reaching to one's own body requires connecting only the tactile and motor spaces, bypassing vision. Still, the problems of high dimensionality and redundancy of the motor system persist. In this work, we present an embodied computational model on a simulated humanoid robot with artificial sensitive skin covering large areas of its body. The robot should autonomously develop the capacity to reach for every tactile sensor on its body. To do this efficiently, we employ the computational framework of intrinsic motivations and variants of goal babbling (as opposed to motor babbling), which prove to make the exploration process faster and alleviate the ill-posedness of learning inverse kinematics. Based on our results, we discuss the next steps in relation to infant studies: what information will be necessary to further ground this computational model in behavioral data.
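To make the contrast between motor babbling and goal babbling concrete, here is a toy sketch. Motor babbling samples random joint angles; goal babbling instead samples goals in task space and tries to reach each one with the current inverse model (here simply a nearest-neighbour lookup over past outcomes plus exploratory noise), storing whatever outcome results. The 2-link planar arm, its link lengths, the noise level, and all function names are hypothetical stand-ins; the paper's humanoid with artificial skin, its intrinsic-motivation variants, and its actual learners are not reproduced here.

```python
import numpy as np

L1, L2 = 1.0, 1.0  # link lengths of a hypothetical 2-link planar arm

def forward(q):
    """Forward kinematics: joint angles (2,) -> end-effector position (2,)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def motor_babbling(n_steps, rng):
    """Baseline: random joint angles, no use of the current inverse model."""
    qs = rng.uniform(-np.pi, np.pi, size=(n_steps, 2))
    return qs, np.array([forward(q) for q in qs])

def goal_babbling(n_steps, rng, noise=0.3):
    """Toy goal babbling with a nearest-neighbour inverse model.

    Each step: sample a task-space goal, pick the stored posture whose outcome
    lies closest to it, perturb that posture, execute it, and store the result.
    """
    q0 = np.array([0.5, 0.5])
    memory_q, memory_x = [q0], [forward(q0)]
    for _ in range(n_steps):
        goal = rng.uniform(-2.0, 2.0, size=2)
        dists = np.linalg.norm(np.array(memory_x) - goal, axis=1)
        q = memory_q[int(np.argmin(dists))] + rng.normal(0.0, noise, size=2)
        memory_q.append(q)
        memory_x.append(forward(q))
    return np.array(memory_q), np.array(memory_x)

def reach(goal, memory_q, memory_x):
    """Query the learned inverse model: stored posture whose outcome is nearest."""
    i = int(np.argmin(np.linalg.norm(memory_x - goal, axis=1)))
    return memory_q[i]

rng = np.random.default_rng(0)
qs, xs = goal_babbling(300, rng)
q = reach(np.array([1.0, 1.0]), qs, xs)
print(q, forward(q))
```

Because every attempt is driven by a goal, exploration concentrates on postures that change the outcome in task space rather than spreading over the redundant joint space; a real implementation would replace the lookup with a trained inverse model.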

RO · Sep 5, 2019
The homunculus for proprioception: Toward learning the representation of a humanoid robot's joint space using self-organizing maps

Filipe Gama, Matej Hoffmann

In primate brains, tactile and proprioceptive inputs are relayed to the somatosensory cortex, which is known for somatotopic representations, or "homunculi". Our research centers on understanding how these and higher-level body representations (body schema) form, using humanoid robots and neural networks to construct models. We specifically focus on how a spatial representation of the body may be learned from somatosensory information in self-touch configurations. In this work, we target the representation of proprioceptive inputs, which we take to be the robot's joint angles. The inputs collected in different body postures are fed to a Self-Organizing Map (SOM) with a 2D output lattice. With unrestricted, all-to-all connections, the map cannot represent the input space while preserving its topological relationships, because the intrinsic dimensionality of the body posture space is too large. Hence, we use MRF-SOM, a method we previously developed for tactile inputs (Hoffmann, Straka et al. 2018), in which the Maximum Receptive Field of output neurons is restricted so that each neuron learns to represent only a specific part of the input space. This is in line with the receptive fields of neurons in somatosensory areas representing proprioception, which often respond to combinations of a few joints (e.g., wrist and elbow).
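The restricted-receptive-field idea can be sketched as an ordinary SOM in which each output neuron both compares and updates only the input dimensions inside its receptive field, given as a boolean mask. The masks, the distance normalized by receptive-field size, the linear decay schedules, and the function name are assumptions made for illustration; how MRF-SOM (Hoffmann, Straka et al. 2018) actually assigns receptive fields is not reproduced here.

```python
import numpy as np

def train_mrf_som(data, masks, grid_shape, n_epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """SOM whose output neurons each see only the input dimensions in their mask.

    data: (N, D) inputs such as joint angles. masks: (M, D) boolean, one row per
    output neuron, with M = rows * cols. Returns weights of shape (M, D).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    m, d = masks.shape
    assert m == rows * cols and data.shape[1] == d
    assert np.all(masks.sum(axis=1) > 0), "every neuron needs a non-empty field"
    # Position of each output neuron on the 2D lattice.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.uniform(data.min(axis=0), data.max(axis=0), size=(m, d))
    n_steps = n_epochs * len(data)
    t = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood width
            # Differences outside each neuron's receptive field are zeroed, so
            # those dimensions neither affect matching nor get updated.
            diff = (x - weights) * masks
            dist = np.sum(diff ** 2, axis=1) / masks.sum(axis=1)
            bmu = int(np.argmin(dist))  # best-matching unit
            grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * diff
            t += 1
    return weights

# Hypothetical setup: 4 joint angles; the top half of a 4x4 lattice sees
# joints 0 and 1 (say, wrist and elbow), the bottom half joints 2 and 3.
rng = np.random.default_rng(1)
data = rng.uniform(-1.0, 1.0, size=(200, 4))
masks = np.zeros((16, 4), dtype=bool)
masks[:8, :2] = True
masks[8:, 2:] = True
weights = train_mrf_som(data, masks, grid_shape=(4, 4), n_epochs=5)
```

With all-to-all connections every mask row would be all `True`, which reduces the sketch to a standard SOM.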