Vadim Tikhanoff

RO
7papers
98citations
Novelty34%
AI Score20

7 Papers

ROJul 16, 2021
Weakly-Supervised Object Detection Learning through Human-Robot Interaction

Elisa Maiettini, Vadim Tikhanoff, Lorenzo Natale

Reliable perception and efficient adaptation to novel conditions are priority skills for humanoids that function in dynamic environments. The vast advancements in latest computer vision research, brought by deep learning methods, are appealing for the robotics community. However, their adoption in applied domains is not straightforward since adapting them to new tasks is strongly demanding in terms of annotated data and optimization time. Nevertheless, robotic platforms, and especially humanoids, present opportunities (such as additional sensors and the chance to explore the environment) that can be exploited to overcome these issues. In this paper, we present a pipeline for efficiently training an object detection system on a humanoid robot. The proposed system allows to iteratively adapt an object detection model to novel scenarios, by exploiting: (i) a teacher-learner pipeline, (ii) weakly supervised learning techniques to reduce the human labeling effort and (iii) an on-line learning approach for fast model re-training. We use the R1 humanoid robot for both testing the proposed pipeline in a real-time application and acquire sequences of images to benchmark the method. We made the code of the application publicly available.

CVDec 28, 2020
From Handheld to Unconstrained Object Detection: a Weakly-supervised On-line Learning Approach

Elisa Maiettini, Andrea Maracani, Raffaello Camoriano et al.

Deep Learning (DL) based methods for object detection achieve remarkable performance at the cost of computationally expensive training and extensive data labeling. Robots embodiment can be exploited to mitigate this burden by acquiring automatically annotated training data via a natural interaction with a human showing the object of interest, handheld. However, learning solely from this data may introduce biases (the so-called domain shift), and prevents adaptation to novel tasks. While Weakly-supervised Learning (WSL) offers a well-established set of techniques to cope with these problems in general-purpose Computer Vision, its adoption in challenging robotic domains is still at a preliminary stage. In this work, we target the scenario of a robot trained in a teacher-learner setting to detect handheld objects. The aim is to improve detection performance in different settings by letting the robot explore the environment with a limited human labeling budget. We compare several techniques for WSL in detection pipelines to reduce model re-training costs without compromising accuracy, proposing solutions which target the considered robotic scenario. We show that the robot can improve adaptation to novel domains, either by interacting with a human teacher (Active Learning) or with an autonomous supervision (Semi-supervised Learning). We integrate our strategies into an on-line detection method, achieving efficient model update capabilities with few labels. We experimentally benchmark our method on challenging robotic object detection tasks under domain shift.

ROJul 9, 2019
Sequence-to-Sequence Natural Language to Humanoid Robot Sign Language

Jennifer J. Gago, Valentina Vasco, Bartek Łukawski et al.

This paper presents a study on natural language to sign language translation with human-robot interaction application purposes. By means of the presented methodology, the humanoid robot TEO is expected to represent Spanish sign language automatically by converting text into movements, thanks to the performance of neural networks. Natural language to sign language translation presents several challenges to developers, such as the discordance between the length of input and output data and the use of non-manual markers. Therefore, neural networks and, consequently, sequence-to-sequence models, are selected as a data-driven system to avoid traditional expert system approaches or temporal dependencies limitations that lead to limited or too complex translation systems. To achieve these objectives, it is necessary to find a way to perform human skeleton acquisition in order to collect the signing input data. OpenPose and skeletonRetriever are proposed for this purpose and a 3D sensor specification study is developed to select the best acquisition hardware.

CLNov 6, 2018
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments

Giovanni Morrone, Luca Pasa, Vadim Tikhanoff et al.

In this paper, we address the problem of enhancing the speech of a speaker of interest in a cocktail party scenario when visual information of the speaker of interest is available. Contrary to most previous studies, we do not learn visual features on the typically small audio-visual datasets, but use an already available face landmark detector (trained on a separate image dataset). The landmarks are used by LSTM-based models to generate time-frequency masks which are applied to the acoustic mixed-speech spectrogram. Results show that: (i) landmark motion features are very effective features for this task, (ii) similarly to previous work, reconstruction of the target speaker's spectrogram mediated by masking is significantly more accurate than direct spectrogram reconstruction, and (iii) the best masks depend on both motion landmark features and the input mixed-speech spectrogram. To the best of our knowledge, our proposed models are the first models trained and evaluated on the limited size GRID and TCD-TIMIT datasets, that achieve speaker-independent speech enhancement in a multi-talker setting.

ROOct 12, 2017
Markerless visual servoing on unknown objects for humanoid robot platforms

Claudio Fantacci, Giulia Vezzani, Ugo Pattacini et al.

To precisely reach for an object with a humanoid robot, it is of central importance to have good knowledge of both end-effector, object pose and shape. In this work we propose a framework for markerless visual servoing on unknown objects, which is divided in four main parts: I) a least-squares minimization problem is formulated to find the volume of the object graspable by the robot's hand using its stereo vision; II) a recursive Bayesian filtering technique, based on Sequential Monte Carlo (SMC) filtering, estimates the 6D pose (position and orientation) of the robot's end-effector without the use of markers; III) a nonlinear constrained optimization problem is formulated to compute the desired graspable pose about the object; IV) an image-based visual servo control commands the robot's end-effector toward the desired pose. We demonstrate effectiveness and robustness of our approach with extensive experiments on the iCub humanoid robot platform, achieving real-time computation, smooth trajectories and sub-pixel precisions.

ROMar 14, 2017
Visual end-effector tracking using a 3D model-aided particle filter for humanoid robot platforms

Claudio Fantacci, Ugo Pattacini, Vadim Tikhanoff et al.

This paper addresses recursive markerless estimation of a robot's end-effector using visual observations from its cameras. The problem is formulated into the Bayesian framework and addressed using Sequential Monte Carlo (SMC) filtering. We use a 3D rendering engine and Computer Aided Design (CAD) schematics of the robot to virtually create images from the robot's camera viewpoints. These images are then used to extract information and estimate the pose of the end-effector. To this aim, we developed a particle filter for estimating the position and orientation of the robot's end-effector using the Histogram of Oriented Gradient (HOG) descriptors to capture robust characteristic features of shapes in both cameras and rendered images. We implemented the algorithm on the iCub humanoid robot and employed it in a closed-loop reaching scenario. We demonstrate that the tracking is robust to clutter, allows compensating for errors in the robot kinematics and servoing the arm in closed loop using vision.

RONov 4, 2014
Enhancing software module reusability using port plug-ins: an experiment with the iCub robot

Ali Paikan, Vadim Tikhanoff, Giorgio Metta et al.

Systematically developing high--quality reusable software components is a difficult task and requires careful design to find a proper balance between potential reuse, functionalities and ease of implementation. Extendibility is an important property for software which helps to reduce cost of development and significantly boosts its reusability. This work introduces an approach to enhance components reusability by extending their functionalities using plug-ins at the level of the connection points (ports). Application--dependent functionalities such as data monitoring and arbitration can be implemented using a conventional scripting language and plugged into the ports of components. The main advantage of our approach is that it avoids to introduce application--dependent modifications to existing components, thus reducing development time and fostering the development of simpler and therefore more reusable components. Another advantage of our approach is that it reduces communication and deployment overheads as extra functionalities can be added without introducing additional modules.