Emanuele Menegatti

RO
h-index: 23
Papers: 20
Citations: 359
Novelty: 46%
AI Score: 51

20 Papers

CV | Jun 6, 2022 | Code
People Tracking in Panoramic Video for Guiding Robots

Alberto Bacchin, Filippo Berno, Emanuele Menegatti et al.

A guiding robot aims to effectively bring people to and from specific places within environments that are possibly unknown to them. During this operation, the robot should be able to detect and track the accompanied person, trying never to lose sight of her/him. One way to minimize the risk of losing the person is to use an omnidirectional camera: its 360° Field of View (FoV) guarantees that a framed object cannot leave the FoV unless it is occluded or very far from the sensor. However, the acquired panoramic videos introduce new challenges in perception tasks such as people detection and tracking, including the large size of the images to be processed, the distortion effects introduced by the cylindrical projection, and the periodic nature of panoramic images. In this paper, we propose a set of targeted methods that effectively adapt a standard people detection and tracking pipeline, originally designed for perspective cameras, to panoramic videos. Our methods have been implemented and tested inside a deep learning-based people detection and tracking framework with a commercial 360° camera. Experiments performed on datasets specifically acquired for guiding robot applications and on a real service robot show the effectiveness of the proposed approach over other state-of-the-art systems. We release with this paper the acquired and annotated datasets and the open-source implementation of our method.
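The "periodic nature" mentioned in the abstract means that the left and right image borders are adjacent in the real scene. The following minimal sketch illustrates that one point only; it is not the authors' method, and the function name and the 1920-pixel width are assumptions made for illustration.

```python
def wrapped_dx(x1, x2, width):
    """Shortest horizontal pixel distance between two columns of a
    panoramic image whose left and right borders are adjacent.

    `width` is the panorama width in pixels (hypothetical value below).
    """
    d = abs(x1 - x2) % width
    # Going "around the back" of the panorama may be shorter.
    return min(d, width - d)


# A target at column 10 and one at column 1910 are only 20 px apart
# on a 1920-px-wide panorama, although they look far apart in the image.
print(wrapped_dx(10, 1910, 1920))
```

A tracker that associates detections by image distance would need a wrap-aware distance of this kind to follow a person who crosses the image border.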

RO | Sep 25, 2024 | Code
WasteGAN: Data Augmentation for Robotic Waste Sorting through Generative Adversarial Networks

Alberto Bacchin, Leonardo Barcellona, Matteo Terreran et al.

Robotic waste sorting poses significant challenges in both perception and manipulation, given the extreme variability of objects that should be recognized on a cluttered conveyor belt. While deep learning has proven effective in solving complex tasks, the necessity for extensive data collection and labeling limits its applicability in real-world scenarios like waste sorting. To tackle this issue, we introduce a data augmentation method based on a novel GAN architecture called wasteGAN. The proposed method increases the performance of semantic segmentation models starting from a very limited set of labeled examples, as few as 100. The key innovations of wasteGAN include a novel loss function, a novel activation function, and a larger generator block. Overall, these innovations help the network learn from a limited number of examples and synthesize data that better mirrors real-world distributions. We then leverage the higher-quality segmentation masks predicted by models trained on the wasteGAN synthetic data to compute semantic-aware grasp poses, enabling a robotic arm to effectively recognize contaminants and separate waste in a real-world scenario. Through comprehensive evaluation encompassing dataset-based assessments and real-world experiments, our methodology demonstrated promising potential for robotic waste sorting, yielding performance gains of up to 5.8% in picking contaminants. The project page is available at https://github.com/bach05/wasteGAN.git

CV | Sep 2, 2024 | Code
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation

Alberto Bacchin, Davide Allegro, Stefano Ghidoni et al.

Out-of-Distribution (OOD) detection in computer vision is a crucial research area, with related benchmarks playing a vital role in assessing the generalizability of models and their applicability in real-world scenarios. However, existing OOD benchmarks in the literature suffer from two main limitations: (1) they often overlook semantic shift as a potential challenge, and (2) their scale is limited compared to the large datasets used to train modern models. To address these gaps, we introduce SOOD-ImageNet, a novel dataset comprising around 1.6M images across 56 classes, designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions, with a particular focus on the issue of semantic shift. We ensured the necessary scalability and quality by developing an innovative data engine that leverages the capabilities of modern vision-language models, complemented by accurate human checks. Through extensive training and evaluation of various models on SOOD-ImageNet, we showcase its potential to significantly advance OOD research in computer vision. The project page is available at https://github.com/bach05/SOODImageNet.git.

RO | Jun 7, 2022
Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU

Daniel Fusaro, Emilio Olivastri, Daniele Evangelista et al.

Self-driving vehicles and autonomous ground robots require a reliable and accurate method to analyze the traversability of the surrounding environment for safe navigation. This paper proposes and evaluates a real-time machine learning-based Traversability Analysis method that combines geometric features with appearance-based features in a hybrid approach based on an SVM classifier. In particular, we show that integrating a new set of geometric and visual features and focusing on important implementation details enables a noticeable boost in performance and reliability. The proposed approach has been compared with state-of-the-art Deep Learning approaches on a public dataset of outdoor driving scenarios. It reaches an accuracy of 89.2% in scenarios of varying complexity, demonstrating its effectiveness and robustness. The method runs entirely on CPU and achieves results comparable to those of the other methods while operating faster and requiring fewer hardware resources.

CV | Oct 14, 2024 | Code
Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation

Daniel Fusaro, Simone Mosco, Emanuele Menegatti et al.

Semantic segmentation of point clouds is an essential task for understanding the environment in autonomous driving and robotics. Recent range-based works achieve real-time efficiency, while point- and voxel-based methods produce better results but are affected by high computational complexity. Moreover, highly complex deep learning models are often not suited to efficiently learn from small datasets. Their generalization capabilities can easily be driven by the abundance of data rather than the architecture design. In this paper, we harness the information from the three-dimensional representation to proficiently capture local features, while introducing the range image representation to incorporate additional information and facilitate fast computation. A GPU-based KDTree allows for rapid building, querying, and enhancing projection with straightforward operations. Extensive experiments on SemanticKITTI and nuScenes datasets demonstrate the benefits of our modification in a "small data" setup, in which only one sequence of the dataset is used to train the models, but also in the conventional setup, where all sequences except one are used for training. We show that a reduced version of our model not only demonstrates strong competitiveness against full-scale state-of-the-art models but also operates in real-time, making it a viable choice for real-world applications. The code of our method is available at https://github.com/Bender97/WaffleAndRange.

CV | Sep 13, 2025 | Code
Point-Plane Projections for Accurate LiDAR Semantic Segmentation in Small Data Scenarios

Simone Mosco, Daniel Fusaro, Wanmeng Li et al.

LiDAR point cloud semantic segmentation is essential for interpreting 3D environments in applications such as autonomous driving and robotics. Recent methods achieve strong performance by exploiting different point cloud representations or incorporating data from other sensors, such as cameras or external datasets. However, these approaches often suffer from high computational complexity and require large amounts of training data, limiting their generalization in data-scarce scenarios. In this paper, we improve the performance of point-based methods by effectively learning features from 2D representations through point-plane projections, enabling the extraction of complementary information while relying solely on LiDAR data. Additionally, we introduce a geometry-aware technique for data augmentation that aligns with LiDAR sensor properties and mitigates class imbalance. We implemented and evaluated our method, which applies point-plane projections onto multiple informative 2D representations of the point cloud. Experiments demonstrate that this approach leads to significant improvements in limited-data scenarios, while also achieving competitive results on two publicly available standard datasets, namely SemanticKITTI and PandaSet. The code of our method is available at https://github.com/SiMoM0/3PNet

RO | Sep 7, 2020 | Code
Receding Horizon Task and Motion Planning in Changing Environments

Nicola Castaman, Enrico Pagello, Emanuele Menegatti et al.

Complex manipulation tasks require careful integration of symbolic reasoning and motion planning. This problem, commonly referred to as Task and Motion Planning (TAMP), is even more challenging if the workspace is non-static, e.g. due to human interventions, and perceived with noisy, non-ideal sensors. This work proposes an online approximated TAMP method that combines a geometric reasoning module and a motion planner with a standard task planner in a receding horizon fashion. Our approach iteratively solves a reduced planning problem over a receding window of a limited number of future actions while the actions are being executed. Thus, only the first action of the horizon is actually scheduled at each iteration; then the window is moved forward and the problem is solved again. This procedure naturally takes into account potential changes in the scene while ensuring good runtime performance. We validate our approach through extensive experiments in a simulated environment. We show that our approach is able to deal with unexpected changes in the environment while ensuring performance comparable to that of other recent TAMP approaches in solving traditional static benchmarks. We release with this paper the open-source implementation of our method.
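The receding-horizon loop described in the abstract (plan over a short window, execute only the first action, shift the window, replan) can be sketched on a toy problem. Everything below is an illustrative assumption: the one-dimensional world, the `plan` and `receding_horizon` functions, and the `goal_at` callback that models a changing scene. It is not the released implementation and contains no task planner, geometric reasoning, or motion planner.

```python
def plan(state, goal, horizon):
    """Toy planner: at most `horizon` unit steps from `state` toward `goal`."""
    steps, pos = [], state
    for _ in range(horizon):
        if pos == goal:
            break
        step = 1 if goal > pos else -1
        steps.append(step)
        pos += step
    return steps


def receding_horizon(start, goal_at, horizon=3, max_iters=50):
    """Replan at every iteration and execute only the first planned action.

    `goal_at(t)` returns the goal observed at iteration t, so the goal
    may change while the plan is being executed.
    """
    state, trace = start, [start]
    for t in range(max_iters):
        window = plan(state, goal_at(t), horizon)
        if not window:          # nothing left to do: goal reached
            break
        state += window[0]      # schedule only the first action of the window
        trace.append(state)
    return trace


# The goal moves from 3 to 0 at iteration 2; the loop adapts on the next replan.
print(receding_horizon(0, lambda t: 3 if t < 2 else 0))
```

Because the remainder of each window is discarded, a change in the scene costs at most one already-executed action before the plan reflects it.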

CV | Oct 17, 2017 | Code
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks

Marco Carraro, Matteo Munaro, Jeff Burke et al.

This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means of the sensor depth. The proposed system is marker-less, multi-person, independent of background and does not make any assumption on people appearance and initial pose. The system provides real-time outcomes, thus being perfectly suited for applications requiring user interaction. Experimental results show the effectiveness of this work with respect to a baseline multi-view approach in different scenarios. To foster research and applications based on this work, we released the source code in OpenPTrack, an open source project for RGB-D people tracking.
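Extending a 2D skeleton joint to 3D "by means of the sensor depth" is commonly done by back-projecting the pixel through a pinhole camera model. The sketch below shows that standard formula; the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) and their values are hypothetical, and the abstract does not state which camera model the system actually uses.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera coordinates
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics for a 640x480 depth image.
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A joint detected 100 px right of the principal point, 2 m away.
print(backproject(420, 240, 2.0, **K))
```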

23.0 | RO | May 4
Adaptive Gait Generation for Multi-Terrain Exoskeletons via Constrained Kernelized Movement Primitives

Edoardo Trombin, Miroljub Mihailovic, Matheus Henrique Ferreira Moura et al.

Lower limb exoskeletons (LLEs) have the potential to make motor-impaired individuals walk again. Their application in real-world environments is still limited by the lack of effective adaptive gait planning. Indeed, current exoskeletons are meant to walk only on flat and even terrain. Generating environment-aware, physiologically consistent gait trajectories in real-time is an open challenge. To overcome this, we propose a novel Kernelized Movement Primitives (KMP)-based framework for adaptive gait generation (AGG) across multiple indoor terrains. The proposed approach learns a probabilistic representation of human gait in both the joint and task spaces from a limited number of human demonstrations, representing natural gait characteristics and ensuring kinematic feasibility. In addition, the learned trajectories are adapted using environmental information extracted from an onboard RGB-D camera by treating the AGG as a linearly constrained optimization problem with via-points. The proposed method has been thoroughly validated first in simulations for gait generation in different scenarios, such as flat-ground walking, slopes, stairs, and obstacle crossing. Finally, the effectiveness and robustness of the method have been demonstrated with experiments on a commercial LLE in real-world scenarios. The results obtained demonstrate the feasibility of an environment-aware gait planning system for a new generation of intelligent lower limb exoskeletons for assisting people with disabilities in their everyday life.

23.9 | RO | Apr 28
GEGLU-Transformer for IMU-to-EMG Estimation with Few-Shot Adaptation

Miroljub Mihailovic, Luca Tonin, Stefano Tortora et al.

Reliable estimation of neuromuscular activation is a key enabler for adaptive and personalized control in wearable robotics. However, surface electromyography (EMG) remains difficult to deploy robustly outside laboratory settings due to electrode sensitivity, signal non-stationarity, and strong subject dependence. In this work, we propose an adaptive IMU-to-EMG learning framework that reconstructs continuous muscle activation envelopes from wearable inertial measurements across heterogeneous movement conditions. The approach combines a Transformer encoder with Gaussian Error Gated Linear Units (GEGLU-Transformer) to enhance cross-subject generalization and enable rapid subject-specific personalization. Under a strict leave-one-subject-out (LOSO) protocol on a multi-condition lower-limb biomechanics dataset, the proposed architecture achieves r = 0.706 ± 0.139 and R² = 0.474 ± 0.208 without subject-specific adaptation. With only 0.5% adaptation data, performance increases to r = 0.761 ± 0.030 and R² = 0.559 ± 0.047, demonstrating rapid adaptation and early performance saturation. These results support attention-based architectures combined with lightweight adaptation as a practical and scalable alternative to direct EMG sensing for real-world wearable robotic applications.
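For readers unfamiliar with GEGLU, the gated unit is usually defined as GEGLU(x) = GELU(xW) ⊙ (xV), where W and V are two learned projections and ⊙ is the element-wise product. The sketch below implements that common definition without bias terms in plain Python. The weight matrices are made-up values, and the abstract does not specify where or how the unit is placed inside the authors' Transformer encoder.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def geglu(x, W, V):
    """GEGLU(x) = GELU(W x) * (V x), element-wise (bias terms omitted)."""
    gate = [gelu(g) for g in matvec(W, x)]
    value = matvec(V, x)
    return [g * v for g, v in zip(gate, value)]


# Hypothetical 2-input, 2-output projections.
W = [[1.0, 0.0], [0.0, 0.0]]   # second gate pre-activation is always 0
V = [[1.0, 1.0], [1.0, 1.0]]
print(geglu([1.0, 2.0], W, V))  # second output is gated to zero
```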

RO | Apr 19, 2024
Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models

Leonardo Barcellona, Alberto Bacchin, Matteo Terreran et al.

The ability of a robot to pick an object, known as robot grasping, is crucial for several applications, such as assembly or sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct configuration of the gripper. A common solution to this problem relies on semantic segmentation models, which often show poor generalization to unseen objects and require considerable time and massive data to be trained. To reduce the need for large datasets, some grasping pipelines exploit few-shot semantic segmentation models, which are capable of recognizing new classes given a few examples. However, this often comes at the cost of limited performance and fine-tuning is required to be effective in robot grasping scenarios. In this work, we propose to overcome all these limitations by combining the impressive generalization capability reached by foundation models with a high-performing few-shot classifier, working as a score function to select the segmentation that is closer to the support set. The proposed model is designed to be embedded in a grasp synthesis pipeline. The extensive experiments using one or five examples show that our novel approach overcomes existing performance limitations, improving the state of the art both in few-shot semantic segmentation on the Graspnet-1B (+10.5% mIoU) and Ocid-grasp (+1.6% AP) datasets, and real-world few-shot grasp synthesis (+21.7% grasp accuracy). The project page is available at: https://leobarcellona.github.io/showandgrasp.github.io/

CV | Feb 2, 2021
Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks

Alessandro Saviolo, Matteo Bonotto, Daniele Evangelista et al.

This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data. The proposed approach achieves cutting-edge results without the need of training the models with real annotated data of human body parts. Our contributions include a data generation pipeline, that exploits a game engine for the creation of the synthetic data used for training the network, and a novel pre-processing module, that combines edge response maps and adaptive histogram equalization to guide the network to learn the shape of the human body parts ensuring robustness to changes in the illumination conditions. For selecting the best candidate architecture, we perform exhaustive tests on manually annotated images of real human body limbs. We further compare our method against several high-end commercial segmentation tools on the body parts segmentation task. The results show that our method outperforms the other models by a significant margin. Finally, we present an ablation study to validate our pre-processing module. With this paper, we release an implementation of the proposed approach along with the acquired datasets.

LG | Dec 27, 2019
Quaternion Equivariant Capsule Networks for 3D Point Clouds

Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen et al.

We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points. The operator receives a sparse set of local reference frames, computed from an input point cloud, and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iterative re-weighted least squares (IRLS) problems with provable convergence properties. It is shown that such group dynamic routing can be interpreted as robust IRLS rotation averaging on capsule votes, where information is routed based on the final inlier scores. Based on our operator, we build a capsule network that disentangles geometry from pose, paving the way for more informative descriptors and a structured latent space. Our architecture allows joint object classification and orientation estimation without explicit supervision of rotations. We validate our algorithm empirically on common benchmark datasets.
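As background for the Weiszfeld connection, the classical Weiszfeld algorithm computes the geometric median of Euclidean points by repeatedly taking a weighted mean with weights inversely proportional to the current distances, which is an IRLS scheme. The sketch below shows that classical Euclidean version only; it is not the quaternion routing procedure of the paper, and the `eps` clamp is an assumption added to avoid division by zero.

```python
import math


def weiszfeld(points, iters=100, eps=1e-9):
    """Geometric median of `points` (tuples of equal length) via Weiszfeld
    iterations: a weighted mean re-weighted by inverse distance each step.
    """
    dim = len(points[0])
    # Start from the centroid (the ordinary least-squares solution).
    y = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iters):
        # IRLS weights: far-away points (outliers) get small weights.
        weights = [1.0 / max(math.dist(p, y), eps) for p in points]
        total = sum(weights)
        y = [sum(w * p[i] for w, p in zip(weights, points)) / total
             for i in range(dim)]
    return y


# The outlier at 100 drags the mean to about 33.7 but barely moves the median.
print(weiszfeld([(0.0,), (1.0,), (100.0,)]))
```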

CV | Jul 28, 2019
Real-time Tracking-by-Detection of Human Motion in RGB-D Camera Networks

Alessandro Malaguti, Marco Carraro, Mattia Guidolin et al.

This paper presents a novel real-time tracking system capable of improving body pose estimation algorithms in distributed camera networks. The first stage of our approach introduces a linear Kalman filter operating at the body joints level, used to fuse single-view body poses coming from different detection nodes of the network and to ensure temporal consistency between them. The second stage, instead, refines the Kalman filter estimates by fitting a hierarchical model of the human body having constrained link sizes in order to ensure the physical consistency of the tracking. The effectiveness of the proposed approach is demonstrated through a broad experimental validation, performed on a set of sequences whose ground truth references are generated by a commercial marker-based motion capture system. The obtained results show how the proposed system outperforms the considered state-of-the-art approaches, granting accurate and reliable estimates. Moreover, the developed methodology constrains neither the number of persons to track, nor the number, position, synchronization, frame-rate, and manufacturer of the RGB-D cameras used. Finally, the real-time performances of the system are of paramount importance for a large number of real-world applications.
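The first stage relies on a linear Kalman filter at the joint level. As a rough illustration of the predict and update steps, the sketch below filters a single joint coordinate under a random-walk motion model. The state model, the noise values `q` and `r`, and the function name are assumptions; the abstract does not give the filter's actual state vector or parameters.

```python
def kalman_step(x, p, z, q=1e-3, r=1e-2):
    """One predict + update step of a scalar linear Kalman filter.

    x, p : prior state estimate and its variance
    z    : new measurement (e.g. one joint coordinate from one camera)
    q, r : process and measurement noise variances (hypothetical values)
    """
    # Predict: random-walk model, the state is kept and uncertainty grows.
    p = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p


# Fuse measurements of the same joint coordinate arriving from different views.
x, p = 0.0, 1.0
for z in [1.02, 0.97, 1.01]:
    x, p = kalman_step(x, p, z)
print(x, p)
```

Because each measurement is processed as it arrives, a filter of this kind does not require the cameras to be synchronized or to share a frame rate.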

RO | May 28, 2019
Fast human motion prediction for human-robot collaboration with wearable interfaces

Stefano Tortora, Stefano Michieletto, Francesca Stival et al.

In this paper, we aim at improving human motion prediction during human-robot collaboration in industrial facilities by exploiting contributions from both physical and physiological signals. Improved human-machine collaboration could prove useful in several areas, while it is crucial for interacting robots to understand human movement as soon as possible to avoid accidents and injuries. In this perspective, we propose a novel human-robot interface capable of anticipating the user's intention while performing reaching movements on a working bench in order to plan the action of a collaborative robot. The proposed interface can find many applications in the Industry 4.0 framework, where autonomous and collaborative robots will be an essential part of innovative facilities. A motion intention prediction level and a motion direction prediction level have been developed to improve detection speed and accuracy. A Gaussian Mixture Model (GMM) has been trained with IMU and EMG data following an evidence accumulation approach to predict reaching direction. Novel dynamic stopping criteria have been proposed to flexibly adjust the trade-off between early anticipation and accuracy according to the application. The outputs of the two predictors have been used as external inputs to a Finite State Machine (FSM) to control the behaviour of a physical robot according to the user's action or inaction. Results show that our system outperforms previous methods, achieving a real-time classification accuracy of 94.3 ± 2.9% at 160.0 ± 80.0 ms after movement onset.

RO | Dec 18, 2018
Proceedings of the Workshop on Social Robots in Therapy: Focusing on Autonomy and Ethical Challenges

Pablo G. Esteban, Daniel Hernández García, Hee Rin Lee et al.

Robot-Assisted Therapy (RAT) has successfully been used in HRI research by including social robots in health-care interventions by virtue of their ability to engage human users in both social and emotional dimensions. Research projects on this topic exist all over the globe in the USA, Europe, and Asia. All of these projects have the overall ambitious goal to increase the well-being of a vulnerable population. Typical work in RAT is performed using remote-controlled robots, a technique called Wizard-of-Oz (WoZ). The robot is usually controlled, unbeknownst to the patient, by a human operator. However, WoZ has been demonstrated not to be a sustainable technique in the long term. Providing the robots with autonomy (while remaining under the supervision of the therapist) has the potential to lighten the therapist's burden, not only in the therapeutic session itself but also in longer-term diagnostic tasks. Therefore, there is a need for exploring several degrees of autonomy in social robots used in therapy. Increasing the autonomy of robots might also bring about a new set of challenges. In particular, there will be a need to answer new ethical questions regarding the use of robots with a vulnerable population, as well as a need to ensure ethically-compliant robot behaviours. Therefore, in this workshop we want to gather findings and explore which degree of autonomy might help to improve health-care interventions and how we can overcome the ethical challenges inherent to it.

RO | Dec 5, 2017
Brain-Computer Interface meets ROS: A robotic approach to mentally drive telepresence robots

Gloria Beraldo, Morris Antonello, Andrea Cimolato et al.

This paper shows and evaluates a novel approach to integrate a non-invasive Brain-Computer Interface (BCI) with the Robot Operating System (ROS) to mentally drive a telepresence robot. Controlling a mobile device by using human brain signals might improve the quality of life of people suffering from severe physical disabilities or elderly people who cannot move anymore. Thus, the BCI user is able to actively interact with relatives and friends located in different rooms thanks to a video streaming connection to the robot. To facilitate the control of the robot via BCI, we explore new ROS-based algorithms for navigation and obstacle avoidance, making the system safer and more reliable. In this regard, the robot can exploit two maps of the environment, one for localization and one for navigation, and both can be used also by the BCI user to watch the position of the robot while it is moving. As demonstrated by the experimental results, the user's cognitive workload is reduced, decreasing the number of commands necessary to complete the task and helping him/her to keep attention for longer periods of time.

RO | Nov 23, 2017
RUR53: an Unmanned Ground Vehicle for Navigation, Recognition and Manipulation

Nicola Castaman, Elisa Tosello, Morris Antonello et al.

This paper proposes RUR53: an Unmanned Ground Vehicle able to autonomously navigate through, identify, and reach areas of interest; and there recognize, localize, and manipulate work tools to perform complex manipulation tasks. The proposed contribution includes a modular software architecture where each module solves specific sub-tasks and that can be easily extended to satisfy new requirements. The included indoor and outdoor tests demonstrate the capability of the proposed system to autonomously detect a target object (a panel) and precisely dock in front of it while avoiding obstacles. They show it can autonomously recognize and manipulate target work tools (i.e., wrenches and valve stems) to accomplish complex tasks (i.e., use a wrench to rotate a valve stem). A specific case study is described where the proposed modular architecture allows an easy switch to a semi-teleoperated mode. The paper exhaustively describes both the hardware and software setup of RUR53, its performance when tested at the 2017 Mohamed Bin Zayed International Robotics Challenge, and the lessons we learned when participating in this competition, where we ranked third in the Grand Challenge in collaboration with the Czech Technical University in Prague, the University of Pennsylvania, and the University of Lincoln (UK).

RO | Mar 9, 2017
Fast and Robust Detection of Fallen People from a Mobile Robot

Morris Antonello, Marco Carraro, Marco Pierobon et al.

This paper deals with the problem of detecting fallen people lying on the floor by means of a mobile robot equipped with a 3D depth sensor. In the proposed algorithm, inspired by semantic segmentation techniques, the 3D scene is over-segmented into small patches. Fallen people are then detected by means of two SVM classifiers: the first one labels each patch, while the second one captures the spatial relations between them. This novel approach proved to be robust and fast. Indeed, thanks to the use of small patches, fallen people in real cluttered scenes with objects side by side are correctly detected. Moreover, the algorithm can be executed on a mobile robot fitted with a standard laptop, making it possible to exploit the 2D environmental map built by the robot and the multiple points of view obtained during the robot navigation. Additionally, this algorithm is robust to illumination changes since it does not rely on RGB data but on depth data. All the methods have been thoroughly validated on the IASLAB-RGBD Fallen Person Dataset, which is published online as a further contribution. It consists of several static and dynamic sequences with 15 different people and 2 different environments.

CV | Jan 20, 2017
Robust Intrinsic and Extrinsic Calibration of RGB-D Cameras

Filippo Basso, Emanuele Menegatti, Alberto Pretto

Color-depth cameras (RGB-D cameras) have become the primary sensors in most robotics systems, from service robotics to industrial robotics applications. Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and extrinsic calibration that generally does not meet the accuracy requirements of many robotics applications (e.g., highly accurate 3D environment reconstruction and mapping, high-precision object recognition and localization, ...). In this paper, we propose a human-friendly, reliable and accurate calibration framework that makes it easy to estimate both the intrinsic and extrinsic parameters of a general color-depth sensor couple. Our approach is based on a novel two-component error model. This model unifies the error sources of RGB-D pairs based on different technologies, such as structured-light 3D cameras and time-of-flight cameras. Our method provides some important advantages compared to other state-of-the-art systems: it is general (i.e., well suited for different types of sensors), is based on an easy and stable calibration protocol, provides greater calibration accuracy, and has been implemented within the ROS robotics framework. We report detailed experimental validations and performance comparisons to support our statements.