RO · Jun 3, 2022
One-shot Learning for Autonomous Aerial Manipulation
Claudio Zito, Eliseo Ferrante
This paper is concerned with learning transferable contact models for aerial manipulation tasks. We investigate a contact-based approach that enables unmanned aerial vehicles with cable-suspended passive grippers to compute the attach points on novel payloads for aerial transportation. This is the first time that the problem of autonomously generating contact points for such tasks has been investigated. Our approach builds on the underpinning idea that we can learn a probability density of contacts over objects' surfaces from a single demonstration. We extend this formulation to encode aerial transportation tasks while maintaining the one-shot learning paradigm, without handcrafting task-dependent features or employing ad-hoc heuristics; the only prior is extrapolated directly from a single demonstration. Our models rely only on the geometric properties of the payloads computed from a point cloud, and they are robust to partial views. The effectiveness of our approach is evaluated in simulation, in which one or three quadrotors are required to transport previously unseen payloads along a desired trajectory. For each test, our approach computes the contact points and the quadrotors' configurations on the fly, and these are compared with a baseline method, a modified grasp learning algorithm from the literature. Empirical experiments show that the contacts generated by our approach yield better controllability of the payload for a transportation task. We conclude this paper with a discussion on the strengths and limitations of the presented idea, and our suggested future research directions.
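The core idea, a probability density of contacts learned from a single demonstration, can be illustrated with a kernel density estimate. This is a minimal sketch under stated assumptions, not the paper's model: each contact is summarised by a hypothetical local geometric feature vector (here a surface-normal component and a curvature value), and candidate points on a novel payload are ranked by how closely they resemble the demonstrated contacts.

```python
import math

def gaussian_kernel(u, v, bandwidth):
    """Isotropic Gaussian kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-0.5 * sq_dist / bandwidth ** 2)

def contact_density(candidate, demo_features, bandwidth=0.1):
    """Unnormalised KDE of the demonstrated contact features, evaluated at a candidate."""
    return sum(gaussian_kernel(candidate, f, bandwidth) for f in demo_features) / len(demo_features)

def rank_candidates(candidates, demo_features, bandwidth=0.1):
    """Sort candidate surface features from most to least demonstration-like."""
    return sorted(candidates, key=lambda c: contact_density(c, demo_features, bandwidth), reverse=True)

# Hypothetical features (normal_z, curvature) at the two attach points of one demonstration.
demo = [(1.0, 0.0), (0.9, 0.05)]
# Hypothetical features of candidate points on a novel payload.
candidates = [(0.0, 0.5), (0.95, 0.02), (0.5, 0.2)]
best = rank_candidates(candidates, demo)[0]
print(best)  # -> (0.95, 0.02)
```

Because the only prior is the demonstration itself, the bandwidth is the one free parameter here; a smaller value makes the model stricter about which surface patches count as similar.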
RO · Mar 24
LiZIP: An Auto-Regressive Compression Framework for LiDAR Point Clouds
Aditya Shibu, Kayvan Karim, Claudio Zito
The massive volume of data generated by LiDAR sensors in autonomous vehicles creates a bottleneck for real-time processing and vehicle-to-everything (V2X) transmission. Existing lossless compression methods often force a trade-off: industry-standard algorithms (e.g., LASzip) lack adaptability, while deep learning approaches suffer from prohibitive computational costs. This paper proposes LiZIP, a lightweight, near-lossless, zero-drift compression framework based on neural predictive coding. By using a compact Multi-Layer Perceptron (MLP) to predict point coordinates from local context, LiZIP efficiently encodes only the sparse residuals. We evaluate LiZIP on the NuScenes and Argoverse datasets, benchmarking against GZip, LASzip, and Google Draco (configured with 24-bit quantization to serve as a high-precision geometric baseline). Results demonstrate that LiZIP consistently achieves superior compression ratios across varying environments: it reduces file size by 7.5%-14.8% compared with the industry-standard LASzip and outperforms Google Draco by 8.8%-11.3% across diverse datasets. Furthermore, the system generalizes to the unseen Argoverse dataset without retraining. Against the general-purpose GZip algorithm, LiZIP achieves a reduction of 38%-48%. This efficiency offers a distinct advantage for bandwidth-constrained V2X applications and large-scale cloud archival.
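Predictive coding stores only the difference between each actual and predicted coordinate. The sketch below illustrates the principle under loud assumptions: the compact MLP is replaced by a trivial previous-point predictor, coordinates are quantised to a hypothetical 1 mm step, and `zlib` stands in for the entropy coder. Because the encoder predicts from the same quantised values the decoder reconstructs, errors do not accumulate along the scan (no drift); the only loss is the quantisation step.

```python
import struct
import zlib

STEP = 0.001  # hypothetical quantisation step (1 mm)

def quantise(points):
    """Map float (x, y, z) coordinates to integer grid indices."""
    return [tuple(round(c / STEP) for c in p) for p in points]

def predict(prev):
    """Stand-in predictor: the next point equals the previous one.
    LiZIP uses a compact MLP over local context instead."""
    return prev

def encode(points):
    q = quantise(points)
    prev = (0, 0, 0)
    residuals = []
    for p in q:
        pred = predict(prev)
        residuals.extend(c - pc for c, pc in zip(p, pred))
        prev = p  # predict from quantised values the decoder also has: no drift
    raw = struct.pack(f"<{len(residuals)}i", *residuals)
    return zlib.compress(raw, 9), len(q)

def decode(blob, n):
    residuals = struct.unpack(f"<{3 * n}i", zlib.decompress(blob))
    prev = (0, 0, 0)
    out = []
    for i in range(n):
        pred = predict(prev)
        p = tuple(pc + r for pc, r in zip(pred, residuals[3 * i:3 * i + 3]))
        out.append(tuple(c * STEP for c in p))
        prev = p
    return out

# A smooth synthetic scan line: the residuals are small and repetitive.
points = [(10.0 + 0.01 * i, -5.0 + 0.02 * i, 1.5) for i in range(200)]
blob, n = encode(points)
restored = decode(blob, n)
```

The reconstruction error per coordinate is bounded by `STEP / 2`, and the compressed blob is far smaller than the 4,800 bytes needed for the same points as raw 64-bit floats. A better predictor shrinks the residuals further, which is where a learned model earns its cost.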
RO · Mar 14, 2023
Robot Grasping and Manipulation: A Prospective
Claudio Zito
"A simple handshake would give them away." This is how Anthony Hopkins' fictional character, Dr Robert Ford, summarises a particular flaw of the hosts in the 2016 science-fiction series Westworld. In the storyline, Westworld is a futuristic theme park and the hosts are autonomous robots engineered to be indistinguishable from the human guests, except for their hands, which have not yet been perfected. In another classic science-fiction saga, scientists unlock the secrets of full synthetic intelligence, Skynet, by reverse-engineering a futuristic hand. In both storylines, reality inspires fiction on one crucial point: designing hands and reproducing robust and reliable manipulation actions is one of the biggest challenges in robotics. Solving this problem would lead us to a new, improved era of autonomy. Decades ago, the third industrial revolution brought robots onto assembly lines, changing our way of working forever. The next revolution has already started, bringing us artificial intelligence (AI) assistants that enhance our quality of life at work and in everyday life, and even help combat worldwide pandemics.
LG · May 8
Reflective Prompted Policy Optimization: Trajectory-Grounded Revision and Salience Bias
Rahaf Abu Hara, Vaibbhav Murarri, Claudio Zito
Existing LLM-based policy optimizers see only scalar rewards: that a policy scored 0.45, but not whether the agent got stuck in a loop, fell into a hole on the third step, or performed well on 19 out of 20 rollouts and failed catastrophically on one. We propose Reflective Prompted Policy Optimization (R2PO), a two-stage LLM framework for policy search over compact policy classes that augments scalar reward feedback with trajectory-level behavioral evidence. A Search-LLM proposes candidate policy parameters; the environment executes them; a Critic-LLM inspects the resulting rollouts and proposes targeted revisions grounded in observed states, actions, and rewards. Across ten environments, ablations show that R2PO's gains require separating global search from behavior-grounded revision and using selection to filter high-variance edits. We further identify a dominant failure mode, salience bias: when presented with multiple rollouts, the Critic-LLM fixates on improving a single failure even when most trajectories succeed. In a three-trajectory variant where the Critic-LLM sees the best, worst, and median rollout, this behavior explains 76.6% of regressions on CartPole. R2PO mitigates this through aggregate rollout statistics, median-trajectory selection, and a revision rule. Using a 20B open-weight model, R2PO achieves the highest mean best reward across all ten environments, reaches near-optimal performance substantially earlier (e.g., near-maximum CartPole reward within ~500 episodes), and trains far more stably than both deep RL and prior LLM-based methods. These results show that treating trajectories as first-class in-context evidence, rather than artifacts reduced to scalar returns, changes how even comparatively small LLMs search over policy spaces, enabling them to learn faster, diagnose more precisely, and reliably improve external controllers.
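The median-trajectory selection and aggregate statistics that R2PO uses against salience bias can be sketched as follows. The `Rollout` structure, the summary fields, and the success threshold are assumptions for illustration; the paper's prompts and revision rule are not reproduced. The point is that the critic receives a typical rollout plus population-level numbers, instead of the single most salient failure.

```python
import statistics
from dataclasses import dataclass, field

@dataclass
class Rollout:
    states: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

    @property
    def ret(self):
        """Undiscounted episode return."""
        return sum(self.rewards)

def median_rollout(rollouts):
    """Return the rollout with the (lower) median return, not the worst one."""
    ranked = sorted(rollouts, key=lambda r: r.ret)
    return ranked[(len(ranked) - 1) // 2]

def aggregate_stats(rollouts, success_threshold):
    """Population-level summary passed to the critic alongside the median trajectory."""
    returns = [r.ret for r in rollouts]
    return {
        "mean": statistics.mean(returns),
        "stdev": statistics.pstdev(returns),
        "min": min(returns),
        "max": max(returns),
        "success_rate": sum(x >= success_threshold for x in returns) / len(returns),
    }

# 19 successful rollouts and one catastrophic failure, as in the example above.
rollouts = [Rollout(rewards=[1.0] * 500) for _ in range(19)] + [Rollout(rewards=[1.0] * 12)]
critic_view = median_rollout(rollouts)                           # a typical success
summary = aggregate_stats(rollouts, success_threshold=475.0)     # hypothetical threshold
```

With a 95% success rate visible in the summary, a critic has less reason to rewrite a policy around the lone failure.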
NE · Jan 13, 2022
Direct Mutation and Crossover in Genetic Algorithms Applied to Reinforcement Learning Tasks
Tarek Faycal, Claudio Zito
Neuroevolution has recently been shown to be quite competitive in reinforcement learning (RL) settings, and it can alleviate some of the drawbacks of gradient-based approaches. This paper focuses on applying neuroevolution, using a simple genetic algorithm (GA), to find the weights of a neural network that produce optimally behaving agents. In addition, we present two novel modifications that improve data efficiency and speed of convergence compared with the initial implementation. The modifications are evaluated on the FrozenLake environment provided by OpenAI Gym and perform significantly better than the baseline approach.
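A minimal sketch of the baseline loop, neuroevolution with a simple GA over a flat vector of network weights. The paper's two modifications are not reproduced here. Truncation selection, elitism, uniform crossover, and Gaussian mutation are common choices assumed for illustration, and the fitness function is a hypothetical stand-in for the episode return an agent would obtain in FrozenLake.

```python
import random

def evolve(fitness, n_weights, pop_size=50, elite=10, generations=100, sigma=0.1, seed=0):
    """Simple GA over flat weight vectors: truncation selection, elitism,
    uniform crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]                  # truncation selection
        children = [parents[0][:]]             # elitism: the best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [w + rng.gauss(0.0, sigma) for w in child]              # Gaussian mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Hypothetical stand-in for episode return: closeness to a target weight vector.
TARGET = [0.5, -1.0, 2.0]

def fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

best = evolve(fitness, n_weights=len(TARGET))
```

In an RL setting, `fitness` would load the weights into the policy network and average returns over a few episodes; that evaluation cost is what data-efficiency improvements to the GA aim to reduce.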
LG · Jan 12, 2022
Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees
Tarek Faycal, Claudio Zito
In this work, we present a preliminary investigation of a novel algorithm called Dyna-T. In reinforcement learning (RL), a planning agent has its own representation of the environment as a model. To discover an optimal policy for interacting with the environment, the agent collects experience by trial and error. Experience can be used either to learn a better model or to improve the value function and policy directly. While these two uses are typically kept separate, Dyna-Q is a hybrid approach that, at each iteration, exploits real experience to update both the model and the value function, while planning its actions using simulated data from its model. However, the planning process is computationally expensive and strongly depends on the dimensionality of the state-action space. We propose to build an Upper Confidence Tree (UCT) on the simulated experience and search for the best action to be selected during the online learning process. We demonstrate the effectiveness of our proposed method in a set of preliminary tests on three testbed environments from OpenAI. In contrast to Dyna-Q, Dyna-T outperforms state-of-the-art RL agents in the stochastic environments by choosing a more robust action selection strategy.
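The Dyna-Q loop described in this abstract (a direct update from real experience, model learning, and planning with simulated experience drawn from the model) can be sketched in tabular form. This is a minimal sketch on a hypothetical deterministic chain environment with ε-greedy action selection. Dyna-T's UCT search is not reproduced; only the standard UCB1 score that UCT uses to choose among child nodes is included, as a separate helper.

```python
import math
import random
from collections import defaultdict

GOAL = 5  # hypothetical chain: states 0..5, reward 1 on reaching state 5

def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def dyna_q(episodes=50, n_planning=20, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    model = {}  # (s, a) -> (r, s2, done), learned from real experience
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:
                a = rng.randrange(2)
            else:  # greedy with random tie-breaking
                a = max((0, 1), key=lambda x: (Q[(s, x)], rng.random()))
            s2, r, done = step(s, a)
            # (1) direct RL update from real experience
            target = r + (0.0 if done else gamma * max(Q[(s2, 0)], Q[(s2, 1)]))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # (2) model learning
            model[(s, a)] = (r, s2, done)
            # (3) planning: replay simulated transitions from the model
            for _ in range(n_planning):
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * max(Q[(ps2, 0)], Q[(ps2, 1)]))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q

def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score used by UCT to select a child node; unvisited children first."""
    if visits == 0:
        return math.inf
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

Q = dyna_q()
```

The planning loop in step (3) is the part whose cost grows with the state-action space; replacing uniform replay with a tree search guided by `ucb1` is, in spirit, the change the abstract proposes.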
RO · Jan 11, 2022
Multi-Hypothesis Scan Matching through Clustering
Giorgio Iavicoli, Claudio Zito
Graph-SLAM is a well-established algorithm for constructing a topological map of the environment while simultaneously localising the robot. It relies on scan matching algorithms that align noisy observations along the robot's movements to estimate the robot's current location. We propose a fundamentally different approach to scan matching to improve the estimation of roto-translation displacements and, therefore, the performance of the full SLAM algorithm. A Monte-Carlo approach is used to generate weighted hypotheses of the geometric displacement between two scans, and we then cluster these hypotheses to compute the displacement that results in the best alignment. To cope with clustering on roto-translations, we propose a novel clustering approach that robustly extends Gaussian Mean-Shift to orientations by factorising the kernel density over the roto-translation components. We demonstrate the effectiveness of our method in an extensive set of experiments using both synthetic data and the Intel Research Lab's benchmarking datasets. The results confirm that our approach outperforms state-of-the-art iterative point-based scan matching algorithms in both matching accuracy and runtime.
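The idea of extending Mean-Shift to orientations with a factorised kernel can be sketched for planar displacements (x, y, θ). This is an illustrative construction, not the paper's implementation: it assumes a Gaussian kernel on translation multiplied by a von Mises kernel on rotation. With that product kernel, the fixed-point update is a weighted mean for translation and a weighted circular mean for the angle, so hypotheses on either side of ±π are clustered together correctly.

```python
import math

def kernel(h, x, sigma_t, kappa):
    """Factorised kernel: Gaussian on translation times von Mises on rotation."""
    dx, dy = h[0] - x[0], h[1] - x[1]
    k_t = math.exp(-0.5 * (dx * dx + dy * dy) / sigma_t ** 2)
    k_r = math.exp(kappa * (math.cos(h[2] - x[2]) - 1.0))
    return k_t * k_r

def mean_shift_mode(x, hyps, weights, sigma_t=0.1, kappa=20.0, iters=100, tol=1e-9):
    """Climb from x to a local mode of the weighted kernel density."""
    for _ in range(iters):
        k = [w * kernel(h, x, sigma_t, kappa) for h, w in zip(hyps, weights)]
        s = sum(k)
        nx = sum(ki * h[0] for ki, h in zip(k, hyps)) / s
        ny = sum(ki * h[1] for ki, h in zip(k, hyps)) / s
        # circular mean handles wrap-around at +/- pi
        nth = math.atan2(sum(ki * math.sin(h[2]) for ki, h in zip(k, hyps)),
                         sum(ki * math.cos(h[2]) for ki, h in zip(k, hyps)))
        dth = math.atan2(math.sin(nth - x[2]), math.cos(nth - x[2]))
        shift = abs(nx - x[0]) + abs(ny - x[1]) + abs(dth)
        x = (nx, ny, nth)
        if shift < tol:
            break
    return x

def best_displacement(hyps, weights, sigma_t=0.1, kappa=20.0):
    """Run Mean-Shift from every hypothesis and keep the densest mode."""
    def density(x):
        return sum(w * kernel(h, x, sigma_t, kappa) for h, w in zip(hyps, weights))
    modes = [mean_shift_mode(h, hyps, weights, sigma_t, kappa) for h in hyps]
    return max(modes, key=density)

# Hypothetical weighted hypotheses: a cluster straddling theta = pi, plus two outliers.
hyps = [(1.0, 0.5, math.pi - 0.05), (1.02, 0.48, -math.pi + 0.05), (1.01, 0.49, math.pi),
        (0.0, 0.0, 0.0), (2.0, 2.0, 1.0)]
weights = [1.0] * len(hyps)
mode = best_displacement(hyps, weights)
```

A naive Euclidean mean of the three clustered angles would land near π/3 rather than π; treating θ as a circular variable is what keeps the cluster intact.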
CV · Jan 4, 2022
Underwater Object Classification and Detection: first results and open challenges
Andre Jesus, Claudio Zito, Claudio Tortorici et al.
This work reviews the problem of object detection in underwater environments. We analyse and quantify the shortcomings of conventional state-of-the-art (SOTA) computer vision algorithms when applied to this challenging environment, and provide insights and general guidelines for future research efforts. First, we assess whether pretraining on the conventional ImageNet dataset is beneficial when the object detector is applied to environments that may be characterised by a different feature distribution. We then investigate whether two-stage detectors yield better performance than single-stage detectors in terms of accuracy, intersection over union (IoU), floating-point operations per second (FLOPS), and inference time. Finally, we assess the generalisation capability of each model on a lower-quality dataset to simulate performance in a real scenario, in which harsher conditions are to be expected. Our experimental results provide evidence that underwater object detection requires searching for "ad-hoc" architectures rather than merely training SOTA architectures on new data, and that pretraining is not beneficial.
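Intersection over union, one of the metrics listed above, is the area where a predicted box and a ground-truth box overlap divided by the area they jointly cover. A minimal sketch, assuming axis-aligned boxes given as corner coordinates `(x_min, y_min, x_max, y_max)`:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 0.142857...
```

Detection benchmarks usually count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5, so IoU feeds directly into accuracy-style scores.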
RO · Jul 29, 2020
Learning Transferable Push Manipulation Skills in Novel Contexts
Rhys Howard, Claudio Zito
This paper is concerned with learning transferable forward models for push manipulation that can be applied to novel contexts, and with how to improve the quality of prediction when critical information is available. We propose to learn a parametric internal model for push interactions that, similarly to humans, enables a robot to predict the outcome of a physical interaction even in novel contexts. Given a desired push action, humans are capable of identifying where to place their finger on a new object so as to produce a predictable motion of the object. We achieve the same behaviour by factorising the learning into two parts. First, we learn a set of local contact models to represent the geometric relations between the robot pusher, the object, and the environment. Then we learn a set of parametric local motion models to predict how these contacts change throughout a push. Together, the contact and motion models form our internal model. By adjusting the shapes of the distributions over the physical parameters, we modify the internal model's response. Uniform distributions yield coarse estimates when no information is available about the novel context (i.e. an unbiased predictor). A more accurate predictor, i.e. a biased predictor, can be learned for a specific environment/object pair (e.g. low friction/high mass). The effectiveness of our approach is shown in a simulated environment in which a Pioneer 3-DX robot needs to predict a push outcome for a novel object, and we provide a proof of concept on a real robot. We train on 2 objects (a cube and a cylinder) for a total of 24,000 pushes in various conditions, and test on 6 objects encompassing a variety of shapes, sizes, and physical parameters for a total of 14,400 predicted push outcomes. Our results show that both biased and unbiased predictors can reliably produce predictions in line with the outcomes of a carefully tuned physics simulator.
AS · Mar 5, 2020
Statistical Context-Dependent Units Boundary Correction for Corpus-based Unit-Selection Text-to-Speech
Claudio Zito, Fabio Tesser, Mauro Nicolao et al.
In this study, we present an innovative technique for speaker adaptation that improves the accuracy of segmentation, with application to unit-selection Text-To-Speech (TTS) systems. Unlike conventional speaker-adaptation techniques, which attempt to improve segmentation accuracy using acoustic models that are more robust to the speaker's characteristics, we aim to use only context-dependent characteristics extrapolated with linguistic analysis techniques. In simple terms, we build on the intuitive idea that context-dependent information is tightly correlated with the related acoustic waveform. We propose a statistical model that predicts correction values to reduce the systematic error produced by a state-of-the-art Hidden Markov Model (HMM) based speech segmentation. Our approach consists of two phases: (1) identifying context-dependent phonetic unit classes (for instance, the class which identifies vowels as being the nucleus of monosyllabic words); and (2) building a regression model that associates with each class the mean error made by the ASR when segmenting a single-speaker corpus. The success of the approach is evaluated by comparing both the corrected unit boundaries and the state-of-the-art HMM segmentation against a reference alignment, which is assumed to be the optimal solution. In conclusion, our work supplies a first analysis of a model sensitive to speaker-dependent characteristics, robust to defective and noisy information, and with a very simple implementation that could be used as an alternative either to more expensive speaker-adaptation systems or to numerous manual correction sessions.
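The two phases above can be illustrated with the simplest possible regression model: a per-class mean of the signed boundary error. This sketch assumes that simplification, and the class labels and millisecond timings are hypothetical. Phase (1) is represented by the list of class labels; phase (2) learns one offset per class from a corpus with reference alignments and subtracts it from new HMM boundaries.

```python
from collections import defaultdict
from statistics import mean

def fit_class_offsets(hmm_bounds, ref_bounds, classes):
    """Mean signed boundary error (HMM minus reference) for each context class."""
    errors = defaultdict(list)
    for h, r, c in zip(hmm_bounds, ref_bounds, classes):
        errors[c].append(h - r)
    return {c: mean(e) for c, e in errors.items()}

def correct(hmm_bounds, classes, offsets):
    """Subtract the learned systematic error; unseen classes are left unchanged."""
    return [h - offsets.get(c, 0.0) for h, c in zip(hmm_bounds, classes)]

# Hypothetical boundaries in ms and context classes for four units.
hmm = [105.0, 212.0, 330.0, 418.0]
ref = [100.0, 200.0, 325.0, 410.0]
classes = ["V_mono_nucleus", "C_onset", "V_mono_nucleus", "C_onset"]
offsets = fit_class_offsets(hmm, ref, classes)   # {'V_mono_nucleus': 5.0, 'C_onset': 10.0}
corrected = correct(hmm, classes, offsets)       # [100.0, 202.0, 325.0, 408.0]
```

On this toy data the mean absolute boundary error falls from 7.5 ms to 1.0 ms. Only the systematic, class-dependent part of the error is removed; the random scatter within a class remains.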
RO · Mar 3, 2020
Aging Touch: Systematic and Unbiased Presentation of Tactile Stimuli
Claudio Zito
This report presents the experimental methodology and a step-by-step guide for gathering data on how aging influences tactile surface perception in decision and action. The experiments consist of a set of trials in which the ability to distinguish tactile stimuli is investigated. A robot arm is used to provide a systematic and unbiased presentation of the stimuli.
RO · Feb 9, 2020
Grasping and Manipulation with a Multi-Fingered Hand
Claudio Zito
This thesis is concerned with deriving planning algorithms for robot manipulators. Manipulation has two effects: the robot has a physical effect on the object, and it also acquires information about the object. This thesis presents algorithms that treat both problems. First, I present an extension of the well-known piano mover's problem where a robot pushing an object must plan its movements as well as those of the object. This requires simultaneous planning in the joint space of the robot and the configuration space of the object, in contrast to the original problem, which only requires planning in the latter space. The effects of a robot action on the object configuration are determined by the non-invertible rigid-body mechanics. Second, I consider planning under uncertainty and, in particular, planning for information effects. I consider the case where a robot has to reach and grasp an object under pose uncertainty caused by shape incompleteness. The approach presented in this report is to study, and possibly extend, a new approach to artificial intelligence (A.I.) that has emerged in recent years in response to the need to build intelligent controllers for agents operating in unstructured stochastic environments. Such agents must learn an optimal action-selection behaviour by interacting with their environment. The main issue is that real-world problems are usually dynamic and unpredictable. Thus, the agent needs to constantly update its current picture of the world using its sensors, which provide only a noisy description of the surrounding environment. Although there are different schools of thought, each with its own set of techniques, a new direction that unifies much A.I. research is to formalise such agent/environment interactions as embedded systems with stochastic dynamics.
RO · Jul 18, 2019
Robust and fast generation of top and side grasps for unknown objects
Brice Denoun, Beatriz Leon, Claudio Zito et al.
In this work, we present a geometry-based grasping algorithm that can efficiently generate both top and side grasps for unknown objects using a single-view RGB-D camera, and select the most promising one. We demonstrate the effectiveness of our approach in a picking scenario on a real robot platform. Our approach proved more reliable than another recent geometry-based method used as a baseline [7] in terms of grasp stability, increasing the number of successful grasp attempts by a factor of six.
RO · Jun 27, 2019
Automatic Detection of Myocontrol Failures Based upon Situational Context Information
Karoline Heiwolt, Claudio Zito, Markus Nowak et al.
Myoelectric control systems for assistive devices are still unreliable. The user's input signals can become unstable over time due to, e.g., fatigue, electrode displacement, or sweat. Hence, such controllers need to be constantly updated and rely heavily on user feedback. In this paper, we present an automatic failure detection method that learns when plausible predictions become unreliable and model updates are necessary. Our key insight is to enhance the control system with a set of generative models that learn sensible behaviour for a desired task from human demonstration. We illustrate our approach on a grasping scenario in Virtual Reality, in which the user is asked to grasp a bottle on a table. From demonstration, our model learns the reach-to-grasp motion from a resting position to two grasps (power grasp and tridigital grasp) and how to predict the most adequate grasp from local context, e.g. a tridigital grasp on the bottle cap or around the bottleneck. By measuring the error between new grasp attempts and the model prediction, the system can effectively detect which input commands do not reflect the user's intention. We evaluated our model in two cases: i) with both position and rotation information of the wrist pose, and ii) with only rotational information. Our results show that our approach detects statistically highly significant differences in error distributions (p < 0.001) between successful and failed grasp attempts in both cases.
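Detecting failures by "measuring the error between new grasp attempts and the model prediction" can be sketched as a thresholded pose error. Everything specific here is an assumption for illustration: the error combines Euclidean position distance with a weighted quaternion angle, the threshold is an empirical quantile of errors on successful attempts, and the `use_position` flag mirrors the paper's two evaluation cases (position and rotation versus rotation only).

```python
import math

def pose_error(pred, actual, w_rot=0.1, use_position=True):
    """Hypothetical wrist-pose error: Euclidean position error (m) plus a
    weighted rotation angle (rad) between unit quaternions (w, x, y, z)."""
    (p_pred, q_pred), (p_act, q_act) = pred, actual
    dot = min(1.0, abs(sum(a * b for a, b in zip(q_pred, q_act))))
    rot = 2.0 * math.acos(dot)  # angle of the relative rotation
    pos = math.dist(p_pred, p_act) if use_position else 0.0
    return pos + w_rot * rot

def fit_threshold(success_errors, quantile=0.95):
    """Empirical quantile of errors observed on successful attempts."""
    ranked = sorted(success_errors)
    return ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]

def is_failure(pred, actual, threshold, **kwargs):
    """Flag a command whose pose deviates more than successful attempts did."""
    return pose_error(pred, actual, **kwargs) > threshold

identity = (1.0, 0.0, 0.0, 0.0)
threshold = fit_threshold([0.01, 0.02, 0.03, 0.04])  # hypothetical success errors
flagged = is_failure(((0, 0, 0), identity), ((0.3, 0, 0), identity), threshold)
```

The paper reports a statistical test on the error distributions rather than a fixed threshold; the quantile rule is just one simple way to turn such distributions into an online decision.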
RO · Jun 27, 2019
Generative grasp synthesis from demonstration using parametric mixtures
Ermano Arruda, Claudio Zito, Mohan Sridharan et al.
We present a parametric formulation for learning generative models for grasp synthesis from a demonstration. We cast new light on this family of approaches: the proposed formulation is computationally faster than related work and indicates a better grasp success rate in simulated experiments, with a gain of at least 10% in success rate (p < 0.05) in all the tested conditions. The proposed implementation can also incorporate arbitrary constraints for grasp ranking, including task-specific constraints. We report results, followed by a brief discussion of the merits of the proposed method noted so far.
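Sampling grasp candidates from a parametric mixture and ranking them under an extra constraint can be sketched as below. This assumes the mixture has already been fitted to demonstration data (fitting, e.g. by expectation-maximisation, is omitted), uses axis-aligned Gaussian components over a hypothetical two-dimensional grasp parameterisation, and treats the task constraint as a hypothetical filter function.

```python
import math
import random

def sample_gmm(components, n, rng):
    """Draw n samples from a mixture of axis-aligned Gaussians.
    components: list of (weight, mean, std), with mean/std as equal-length tuples."""
    weights = [c[0] for c in components]
    samples = []
    for _ in range(n):
        _, mu, sd = rng.choices(components, weights=weights)[0]
        samples.append(tuple(rng.gauss(m, s) for m, s in zip(mu, sd)))
    return samples

def log_density(x, components):
    """Log of the mixture density at x."""
    total = 0.0
    for w, mu, sd in components:
        logp = sum(-0.5 * ((xi - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
                   for xi, m, s in zip(x, mu, sd))
        total += w * math.exp(logp)
    return math.log(total) if total > 0 else -math.inf

def rank_grasps(components, constraint, n=200, seed=0):
    """Sample candidates, keep those satisfying the constraint, rank by likelihood."""
    rng = random.Random(seed)
    feasible = [g for g in sample_gmm(components, n, rng) if constraint(g)]
    return sorted(feasible, key=lambda g: log_density(g, components), reverse=True)

# Hypothetical fitted mixture over (lateral offset, height) of the grasp, in metres.
components = [(0.7, (0.0, 0.1), (0.01, 0.01)), (0.3, (0.05, 0.2), (0.01, 0.01))]

def above_height(g):
    """Hypothetical task constraint: grasp the upper part of the object."""
    return g[1] > 0.15

ranked = rank_grasps(components, above_height)
```

Because the model is generative, the constraint acts only at ranking time: the same fitted mixture can serve different tasks by swapping the filter.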
RO · Jun 19, 2019
Metrics and Benchmarks for Remote Shared Controllers in Industrial Applications
Claudio Zito, Maxime Adjigble, Brice D. Denoun et al.
Remote manipulation is emerging as one of the key robotics tasks needed in extreme environments. Several researchers have investigated how to add AI components to shared controllers to improve their reliability. Nonetheless, novel research approaches can be very slow to take hold in real-world applications. We propose a set of benchmarks and metrics to evaluate how the AI components of remote shared control algorithms can improve the effectiveness of such frameworks for real industrial applications. We also present an empirical evaluation of a simple intelligent shared controller against a manually operated manipulator in a tele-operated grasping scenario.
RO · Jun 19, 2019
2D Linear Time-Variant Controller for Human's Intention Detection for Reach-to-Grasp Trajectories in Novel Scenes
Claudio Zito, Tomasz Deregowski, Rustam Stolkin
Designing robotic assistance devices for manipulation tasks is challenging. This work is concerned with improving the accuracy and usability of semi-autonomous robots, such as human-operated manipulators or exoskeletons. The key insight is to develop a system that takes context- and user-awareness into account to make better decisions about how to assist the user. Context-awareness is implemented by enabling the system to automatically generate a set of candidate grasps and reach-to-grasp trajectories in novel, cluttered scenes. User-awareness is implemented as a linear time-variant feedback controller that facilitates the motion towards the most promising grasp. Our approach is demonstrated in a simple 2D example in which participants are asked to grasp a specific object in a cluttered scene. Our approach also reduces the number of controllable dimensions for the user by providing control only over the x- and y-axes, while the orientation of the end-effector and the pose of its fingers are inferred by the system. The experimental results show the benefits of our approach in terms of accuracy and execution time with respect to pure manual control.
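A linear time-variant assistive controller in 2D can be sketched as user input plus a feedback term whose gain changes over time, $u_t = v_t + K_t (r_t - x_t)$. The single-integrator end-effector model, the linear gain ramp from `k0` to `k1`, and the reference trajectory are assumptions for illustration; the paper's gain design and grasp selection are not reproduced.

```python
import math

def ltv_assist(x0, reference, user_inputs, k0=0.5, k1=4.0, dt=0.05):
    """Simulate a 2D single-integrator end-effector under user input plus
    LTV feedback u_t = v_t + K_t (r_t - x_t), with K_t ramping from k0 to k1."""
    T = len(reference)
    x = list(x0)
    path = [tuple(x)]
    for t in range(T):
        k_t = k0 + (k1 - k0) * t / max(1, T - 1)  # time-varying gain
        r, v = reference[t], user_inputs[t]
        u = [v[i] + k_t * (r[i] - x[i]) for i in range(2)]
        x = [x[i] + dt * u[i] for i in range(2)]
        path.append(tuple(x))
    return path

# Hypothetical straight reach towards the selected grasp, then hold at the goal.
start, goal = (0.0, 0.0), (1.0, 0.5)
reference = [tuple(s + (g - s) * min(1.0, t / 59) for s, g in zip(start, goal)) for t in range(100)]
no_input = [(0.0, 0.0)] * 100  # user idle: assistance alone drives the motion
assisted = ltv_assist(start, reference, no_input)
final_error = math.dist(assisted[-1], goal)
```

Ramping the gain up lets the user dominate early in the reach, while the controller increasingly pulls the end-effector onto the trajectory towards the predicted grasp as it nears the object.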
RO · May 13, 2019
Let's Push Things Forward: A Survey on Robot Pushing
Jochen Stüber, Claudio Zito, Rustam Stolkin
As robots make their way out of factories into human environments, outer space, and beyond, they require the skill to manipulate their environment in multifarious, unforeseeable circumstances. In this regard, pushing is an essential motion primitive that dramatically extends a robot's manipulation repertoire. In this work, we review the robotic pushing literature. While focusing on work concerned with predicting the motion of pushed objects, we also cover relevant applications of pushing for planning and control. We begin with analytical approaches, under which we also subsume physics engines, and then proceed to discuss work on learning models from data. In doing so, we dedicate a separate section to deep learning approaches, which have seen a recent upsurge in the literature. Concluding remarks and further research perspectives are given at the end of the paper.
RO · May 9, 2019
Feature-Based Transfer Learning for Robotic Push Manipulation
Jochen Stüber, Marek Kopicki, Claudio Zito
This paper presents a data-efficient approach to learning transferable forward models for robotic push manipulation. Our approach extends our previous work on contact-based predictors by leveraging information on the pushed object's local surface features. We test the hypothesis that, by conditioning predictions on local surface features, we can achieve generalisation across objects of different shapes. In doing so, we do not require a CAD model of the object but rather rely on a point cloud object model (PCOM). Our approach involves learning motion models that are specific to contact models. Contact models encode the contacts seen during training time and allow generating similar contacts at prediction time. Predicting on familiar ground reduces the motion models' sample complexity while using local contact information for prediction increases their transferability. In extensive experiments in simulation, our approach is capable of transfer learning for various test objects, outperforming a baseline predictor. We support those results with a proof of concept on a real robot.
RO · Mar 13, 2019
Hypothesis-based Belief Planning for Dexterous Grasping
Claudio Zito, Valerio Ortenzi, Maxime Adjigble et al.
Belief space planning is a viable framework for formalising partially observable control problems and, in recent years, its application to robot manipulation problems has grown. However, this planning approach had been applied successfully only to simplified control problems. In this paper, we apply belief space planning to the problem of planning dexterous reach-to-grasp trajectories under object pose uncertainty. In our framework, the robot perceives the object to be grasped on the fly as a point cloud and computes a full 6D, non-Gaussian distribution over the object's pose (our belief space). The system places no limitations on the geometry of the object, i.e., non-convex objects can be represented, nor does it assume that the point cloud is a complete representation of the object. A plan in the belief space is then created to reach and grasp the object, such that the information value of expected contacts along the trajectory is maximised to compensate for the pose uncertainty. If an unexpected contact occurs while performing the action, this information is used to refine the pose distribution and to trigger re-planning. Experimental results show that our planner (IR3ne) improves grasp reliability and compensates for the pose uncertainty such that it doubles the proportion of grasps that succeed on the first attempt.
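The refinement step after an unexpected contact can be illustrated with a particle representation of a non-Gaussian pose belief. This is a generic Bayes update with multinomial resampling, not the IR3ne implementation. The object is assumed to be a disc of known radius, the contact sensor model is an assumed Gaussian on the distance from the object boundary, and poses are reduced to 2D positions for brevity.

```python
import math
import random

def update_belief(particles, weights, likelihood):
    """Bayes update: w_i <- w_i * p(observation | pose_i), then normalise."""
    new_w = [w * likelihood(p) for p, w in zip(particles, weights)]
    total = sum(new_w)
    if total == 0.0:
        return particles, weights  # observation inconsistent with every hypothesis
    return particles, [w / total for w in new_w]

def effective_sample_size(weights):
    """Low values mean the belief has collapsed onto few hypotheses."""
    return 1.0 / sum(w * w for w in weights)

def resample(particles, weights, rng):
    """Multinomial resampling; weights are uniform afterwards."""
    chosen = rng.choices(particles, weights=weights, k=len(particles))
    return chosen, [1.0 / len(chosen)] * len(chosen)

def contact_likelihood(contact, radius=0.05, sigma=0.005):
    """Hypothetical sensor model: a disc of the given radius centred at the pose;
    an observed contact should lie on its boundary."""
    def lik(pose):
        d = math.dist(contact, pose[:2]) - radius
        return math.exp(-0.5 * (d / sigma) ** 2)
    return lik

# Uniform prior over five hypothesised object positions along x.
particles = [(x, 0.0) for x in (-0.1, -0.05, 0.0, 0.05, 0.1)]
weights = [0.2] * len(particles)
# Unexpected contact sensed at (0.1, 0.0): only the pose at x = 0.05 explains it.
_, posterior = update_belief(particles, weights, contact_likelihood((0.1, 0.0)))
resampled, new_weights = resample(particles, posterior, random.Random(0))
```

A sharp drop in `effective_sample_size` after such an update is one simple signal that the belief has changed enough to warrant re-planning.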