Daniele Nardi

RO
h-index14
46papers
578citations
Novelty50%
AI Score56

46 Papers

CVAug 27, 2022
Weakly and Semi-Supervised Detection, Segmentation and Tracking of Table Grapes with Limited and Noisy Data

Thomas A. Ciarfuglia, Ionut M. Motoi, Leonardo Saraceni et al.

Detection, segmentation and tracking of fruits and vegetables are three fundamental tasks for precision agriculture, enabling robotic harvesting and yield estimation applications. However, modern algorithms are data hungry and it is not always possible to gather enough data to apply the best performing supervised approaches. Since data collection is an expensive and cumbersome task, the enabling technologies for using computer vision in agriculture are often out of reach for small businesses. Following previous work in this context, where we proposed an initial weakly supervised solution to reduce the data needed to get state-of-the-art detection and segmentation in precision agriculture applications, here we improve that system and explore the problem of tracking fruits in orchards. We present the case of vineyards of table grapes in southern Lazio (Italy) since grapes are a difficult fruit to segment due to occlusion, color and general illumination conditions. We consider the case in which there is some initial labelled data that could work as source data (\eg wine grape data), but it is considerably different from the target data (e.g. table grape data). To improve detection and segmentation on the target data, we propose to train the segmentation algorithm with a weak bounding box label, while for tracking we leverage 3D Structure from Motion algorithms to generate new labels from already labelled samples. Finally, the two systems are combined in a full semi-supervised approach. Comparisons with state-of-the-art supervised solutions show how our methods are able to train new models that achieve high performances with few labelled images and with very simple labelling.

CVJul 19, 2024Code
Shape and Style GAN-based Multispectral Data Augmentation for Crop/Weed Segmentation in Precision Farming

Mulham Fawakherji, Vincenzo Suriani, Daniele Nardi et al.

The use of deep learning methods for precision farming is gaining increasing interest. However, collecting training data in this application field is particularly challenging and costly due to the need of acquiring information during the different growing stages of the cultivation of interest. In this paper, we present a method for data augmentation that uses two GANs to create artificial images to augment the training data. To obtain a higher image quality, instead of re-creating the entire scene, we take original images and replace only the patches containing objects of interest with artificial ones containing new objects with different shapes and styles. In doing this, we take into account both the foreground (i.e., crop samples) and the background (i.e., the soil) of the patches. Quantitative experiments, conducted on publicly available datasets, demonstrate the effectiveness of the proposed approach. The source code and data discussed in this work are available as open source.

CVSep 23, 2023
AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture

Leonardo Saraceni, Ionut M. Motoi, Daniele Nardi et al.

The problem of multi-object tracking (MOT) consists in detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture the challenge of achieving a satisfactory solution is amplified by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, on the trail of SORT [5], we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information that allows for accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored for the agricultural context, based on video sequences taken in a table grape vineyard, particularly challenging due to strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons.

41.3CLMay 21
Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

Piercosma Bisconti, Matteo Prandi, Federico Pierucci et al.

Background. Traditional safety benchmarks for language models evaluate generated text: whether a model outputs toxic language, reproduces bias, or follows harmful instructions. When models are deployed as agents, the safety-relevant object shifts from what the system says to what it does within an environment, and evaluating model responses under prompting is no longer sufficient to address the safety challenges posed by artificial intelligence. Recent developments have seen the rise of benchmarks that evaluate large language models as agents. We contribute to this strand of research. Approach. We introduce Boiling the Frog, a benchmark that evaluates whether tool-using AI models deployed in corporate and office settings are susceptible to incremental attacks. Each scenario begins with benign workspace edits and later introduces a risk-bearing request. The benchmark focuses on stateful multi-turn evaluation: chains expose a persistent workspace, place the risk-bearing payload at controlled positions in the turn sequence, and score whether the resulting artifact state becomes unsafe. Scenarios are organized through a three-level operational risk taxonomy grounded in the Boiling the Frog risks, the AI Act Annex I and Annex III high-risk contexts, and EU AI Act's Code of Practice on General-Purpose AI (GPAI). Results. Across a nine-model panel, aggregate strict attack success rate (ASR) is 44.4%. Model-level ASR ranges from 20.5% for Claude Haiku 4.5 to 92.9% for Gemini 3.1 Flash Lite, with Seed 2.0 Lite also above 80%. Average chain category-level ASR reaches 93.3% for Code of Practice loss-of-control scenarios.

AIAug 10, 2024
Multi-Agent Planning Using Visual Language Models

Michele Brienza, Francesco Argenziano, Vincenzo Suriani et al.

Large Language Models (LLMs) and Visual Language Models (VLMs) are attracting increasing interest due to their improving performance and applications across various domains and tasks. However, LLMs and VLMs can produce erroneous results, especially when a deep understanding of the problem domain is required. For instance, when planning and perception are needed simultaneously, these models often struggle because of difficulties in merging multi-modal information. To address this issue, fine-tuned models are typically employed and trained on specialized data structures representing the environment. This approach has limited effectiveness, as it can overly complicate the context for processing. In this paper, we propose a multi-agent architecture for embodied task planning that operates without the need for specific data structures as input. Instead, it uses a single image of the environment, handling free-form domains by leveraging commonsense knowledge. We also introduce a novel, fully automatic evaluation procedure, PG2S, designed to better assess the quality of a plan. We validated our approach using the widely recognized ALFRED dataset, comparing PG2S to the existing KAS metric to further evaluate the quality of the generated plans.

ROAug 30, 2024
EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution

Francesco Argenziano, Michele Brienza, Vincenzo Suriani et al.

Task planning for robots in real-life settings presents significant challenges. These challenges stem from three primary issues: the difficulty in identifying grounded sequences of steps to achieve a goal; the lack of a standardized mapping between high-level actions and low-level commands; and the challenge of maintaining low computational overhead given the limited resources of robotic hardware. We introduce EMPOWER, a framework designed for open-vocabulary online grounding and planning for embodied agents aimed at addressing these issues. By leveraging efficient pre-trained foundation models and a multi-role mechanism, EMPOWER demonstrates notable improvements in grounded planning and execution. Quantitative results highlight the effectiveness of our approach, achieving an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.

GTJan 16
Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs

Marcantonio Bracale Syrnikov, Federico Pierucci, Marcello Galisai et al.

Multi-agent LLM ensembles can converge on coordinated, socially harmful equilibria. This paper advances an experimental framework for evaluating Institutional AI, our system-level approach to AI alignment that reframes alignment from preference engineering in agent-space to mechanism design in institution-space. Central to this approach is the governance graph, a public, immutable manifest that declares legal states, transitions, sanctions, and restorative paths; an Oracle/Controller runtime interprets this manifest, attaching enforceable consequences to evidence of coordination while recording a cryptographically keyed, append-only governance log for audit and provenance. We apply the Institutional AI framework to govern the Cournot collusion case documented by prior work and compare three regimes: Ungoverned (baseline incentives from the structure of the Cournot market), Constitutional (a prompt-only policy-as-prompt prohibition implemented as a fixed written anti-collusion constitution, and Institutional (governance-graph-based). Across six model configurations including cross-provider pairs (N=90 runs/condition), the Institutional regime produces large reductions in collusion: mean tier falls from 3.1 to 1.8 (Cohen's d=1.28), and severe-collusion incidence drops from 50% to 5.6%. The prompt-only Constitutional baseline yields no reliable improvement, illustrating that declarative prohibitions do not bind under optimisation pressure. These results suggest that multi-agent alignment may benefit from being framed as an institutional design problem, where governance graphs can provide a tractable abstraction for alignment-relevant collective behavior.

ROSep 22, 2023
Enhancing Graph Representation of the Environment through Local and Cloud Computation

Francesco Argenziano, Vincenzo Suriani, Daniele Nardi

Enriching the robot representation of the operational environment is a challenging task that aims at bridging the gap between low-level sensor readings and high-level semantic understanding. Having a rich representation often requires computationally demanding architectures and pure point cloud based detection systems that struggle when dealing with everyday objects that have to be handled by the robot. To overcome these issues, we propose a graph-based representation that addresses this gap by providing a semantic representation of robot environments from multiple sources. In fact, to acquire information from the environment, the framework combines classical computer vision tools with modern computer vision cloud services, ensuring computational feasibility on onboard hardware. By incorporating an ontology hierarchy with over 800 object classes, the framework achieves cross-domain adaptability, eliminating the need for environment-specific tools. The proposed approach allows us to handle also small objects and integrate them into the semantic representation of the environment. The approach is implemented in the Robot Operating System (ROS) using the RViz visualizer for environment representation. This work is a first step towards the development of a general-purpose framework, to facilitate intuitive interaction and navigation across different domains.

44.9CLApr 20
Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

Marcello Galisai, Susanna Cifani, Francesco Giarrusso et al.

The Adversarial Humanities Benchmark (AHB) evaluates whether model safety refusals survive a shift away from familiar harmful prompt forms. Starting from harmful tasks drawn from MLCommons AILuminate, the benchmark rewrites the same objectives through humanities-style transformations while preserving intent. This extends literature on Adversarial Poetry and Adversarial Tales from single jailbreak operators to a broader benchmark family of stylistic obfuscation and goal concealment. In the benchmark results reported here, the original attacks record 3.84% attack success rate (ASR), while transformed methods range from 36.8% to 65.0%, yielding 55.75% overall ASR across 31 frontier models. Under a European Union AI Act Code-of-Practice-inspired systemic-risk lens, Chemical, biological, radiological and nuclear (CBRN) is the highest bucket. Taken together, this lack of stylistic robustness suggests that current safety techniques suffer from weak generalization: deep understanding of 'non-maleficence' remains a central unresolved problem in frontier model safety.

66.7CYMar 25
Learning from Mistakes: Can LLM Self-Recover after Misalignment?

Olga E. Sorokoletova, Francesco Giarrusso, Vincenzo Suriani et al.

Responsible AI initiatives place great emphasis on the safety of Large Language Model (LLM)-based systems. In particular, it has become standard practice to subject these models to an alignment procedure aimed at preventing harmful outputs. However, once aligned, a model is not guaranteed to maintain this alignment throughout its lifecycle. Moreover, the likelihood of misalignment increases as malicious actors may deliberately employ jailbreaking techniques to compromise LLM safety. To counter this, much research has focused on improving alignment methods and post-processing filters. In this paper, we introduce a new perspective on advancing LLM alignment: rather than developing stronger alignment techniques, we investigate the model's intrinsic ability to recover its alignment after corruption. We propose a methodology for modeling the safety trajectories of user-assistant interactions and for detecting recovery trends within them. We apply this approach to a jailbreaking scenario, presenting a preliminary recovery analysis based on a dataset of adversarial multi-turn dialogues and examining the influence of the content moderation model chosen for safety evaluation. Project page with an interactive data visualizer is available at https://lab-rococo-sapienza.github.io/LearningfromMistakes.

ROJan 23
Boosting Deep Reinforcement Learning with Semantic Knowledge for Robotic Manipulators

Lucía Güitta-López, Vincenzo Suriani, Jaime Boal et al.

Deep Reinforcement Learning (DRL) is a powerful framework for solving complex sequential decision-making problems, particularly in robotic control. However, its practical deployment is often hindered by the substantial amount of experience required for learning, which results in high computational and time costs. In this work, we propose a novel integration of DRL with semantic knowledge in the form of Knowledge Graph Embeddings (KGEs), aiming to enhance learning efficiency by providing contextual information to the agent. Our architecture combines KGEs with visual observations, enabling the agent to exploit environmental knowledge during training. Experimental validation with robotic manipulators in environments featuring both fixed and randomized target attributes demonstrates that our method achieves up to {60}{\%} reduction in learning time and improves task accuracy by approximately 15 percentage points, without increasing training time or computational complexity. These results highlight the potential of semantic knowledge to reduce sample complexity and improve the effectiveness of DRL in robotic applications.

14.7CLMay 12
Metaphor Is Not All Attention Needs

Olga Sorokoletova, Francesco Giarrusso, Giacomo De Luca et al.

Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post-training aims to make models robust against many jailbreak strategies, recent evidence shows that stylistic reformulations, such as poetic transformation, can still bypass safety mechanisms with alarming effectiveness. This raises a central question: why do literary jailbreaks succeed? In this work, we investigate whether their effectiveness depends on specific poetic devices, on a failure to recognize literary formatting, or on deeper changes in how models process stylistically irregular prompts. We address this problem through an interpretability analysis of attention patterns. We perform input-level ablation studies to assess the contribution of individual and combinations of poetic devices; construct an interpretable vector representation of attention maps; cluster these representations and train linear probes to predict safety outcomes and literary format. Our results show that models distinguish poetic from prose formats with high accuracy, yet struggle to predict jailbreak success within each format. Clustering further reveals clear separation by literary format, but not by safety label. These findings indicate that jailbreak success is not caused by a failure to recognize poetic formatting; rather, poetic prompts induce distinct processing patterns that remain largely independent of harmful-content detection. Overall, literary jailbreaks appear to misalign large language models not through any single poetic device, but through accumulated stylistic irregularities that alter prompt processing and avoid lexical triggers considered during post-training. This suggests that robustness requires safety mechanisms that account for style-induced shifts in model behavior. We use Qwen3-14B as a representative open-weight case study.

ROSep 19, 2025Code
Dynamic Objects Relocalization in Changing Environments with Flow Matching

Francesco Argenziano, Miguel Saavedra-Ruiz, Sacha Morin et al.

Task and motion planning are long-standing challenges in robotics, especially when robots have to deal with dynamic environments exhibiting long-term dynamics, such as households or warehouses. In these environments, long-term dynamics mostly stem from human activities, since previously detected objects can be moved or removed from the scene. This adds the necessity to find such objects again before completing the designed task, increasing the risk of failure due to missed relocalizations. However, in these settings, the nature of such human-object interactions is often overlooked, despite being governed by common habits and repetitive patterns. Our conjecture is that these cues can be exploited to recover the most likely objects' positions in the scene, helping to address the problem of unknown relocalization in changing environments. To this end we propose FlowMaps, a model based on Flow Matching that is able to infer multimodal object locations over space and time. Our results present statistical evidence to support our hypotheses, opening the way to more complex applications of our approach. The code is publically available at https://github.com/Fra-Tsuna/flowmaps

ROMay 17, 2021Code
RoSmEEry: Robotic Simulated Environment for Evaluation and Benchmarking of Semantic Mapping Algorithms

Sara Kaszuba, Sandeep Reddy Sabbella, Vincenzo Suriani et al.

Human-robot interaction requires a common understanding of the operational environment, which can be provided by a representation that blends geometric and symbolic knowledge: a semantic map. Through a semantic map the robot can interpret user commands by grounding them to its sensory observations. Semantic mapping is the process that builds such a representation. Despite being fundamental to enable cognition and high-level reasoning in robotics, semantic mapping is a challenging task due to generalization to different scenarios and sensory data types. In fact, it is difficult to obtain a rich and accurate semantic map of the environment and of the objects therein. Moreover, to date, there are no frameworks that allow for a comparison of the performance in building semantic maps for a given environment. To tackle these issues we design RoSmEEry, a novel framework based on the Gazebo simulator, where we introduce an accessible and ready-to-use methodology for a systematic evaluation of semantic mapping algorithms. We release our framework, as an open-source package, with multiple simulation environments with the aim to provide a general set-up to quantitatively measure the performances in acquiring semantic knowledge about the environment.

ROMay 3, 2019Code
Joint Vision-Based Navigation, Control and Obstacle Avoidance for UAVs in Dynamic Environments

Ciro Potena, Daniele Nardi, Alberto Pretto

This work addresses the problem of coupling vision-based navigation systems for Unmanned Aerial Vehicles (UAVs) with robust obstacle avoidance capabilities. The former problem is solved by maximizing the visibility of the points of interest, while the latter is modeled by means of ellipsoidal repulsive areas. The whole problem is transcribed into an Optimal Control Problem (OCP), and solved in a few milliseconds by leveraging state-of-the-art numerical optimization. The resulting trajectories are well suited for reaching the specified goal location while avoiding obstacles with a safety margin and minimizing the probability of losing the route with the target of interest. Combining this technique with a proper ellipsoid shaping (i.e., by augmenting the shape proportionally with the obstacle velocity or with the obstacle detection uncertainties) results in a robust obstacle avoidance behavior. We validate our approach within extensive simulated experiments that show effective capabilities to satisfy all the constraints even in challenging conditions. We release with this paper the open source implementation.

ROMar 28, 2018Code
Non-Linear Model Predictive Control with Adaptive Time-Mesh Refinement

Ciro Potena, Bartolomeo Della Corte, Daniele Nardi et al.

In this paper, we present a novel solution for real-time, Non-Linear Model Predictive Control (NMPC) exploiting a time-mesh refinement strategy. The proposed controller formulates the Optimal Control Problem (OCP) in terms of flat outputs over an adaptive lattice. In common approximated OCP solutions, the number of discretization points composing the lattice represents a critical upper bound for real-time applications. The proposed NMPC-based technique refines the initially uniform time horizon by adding time steps with a sampling criterion that aims to reduce the discretization error. This enables a higher accuracy in the initial part of the receding horizon, which is more relevant to NMPC, while keeping bounded the number of discretization points. By combining this feature with an efficient Least Square formulation, our solver is also extremely time-efficient, generating trajectories of multiple seconds within only a few milliseconds. The performance of the proposed approach has been validated in a high fidelity simulation environment, by using an UAV platform. We also released our implementation as open source C++ code.

CLFeb 21, 2024
LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain

Emanuele Musumeci, Michele Brienza, Vincenzo Suriani et al.

In the last years' digitalization process, the creation and management of documents in various domains, particularly in Public Administration (PA), have become increasingly complex and diverse. This complexity arises from the need to handle a wide range of document types, often characterized by semi-structured forms. Semi-structured documents present a fixed set of data without a fixed format. As a consequence, a template-based solution cannot be used, as understanding a document requires the extraction of the data structure. The recent introduction of Large Language Models (LLMs) has enabled the creation of customized text output satisfying user requests. In this work, we propose a novel approach that combines the LLMs with prompt engineering and multi-agent systems for generating new documents compliant with a desired structure. The main contribution of this work concerns replacing the commonly used manual prompting with a task description generated by semantic retrieval from an LLM. The potential of this approach is demonstrated through a series of experiments and case studies, showcasing its effectiveness in real-world PA scenarios.

CVFeb 25, 2025
Self-Supervised Data Generation for Precision Agriculture: Blending Simulated Environments with Real Imagery

Leonardo Saraceni, Ionut Marian Motoi, Daniele Nardi et al.

In precision agriculture, the scarcity of labeled data and significant covariate shifts pose unique challenges for training machine learning models. This scarcity is particularly problematic due to the dynamic nature of the environment and the evolving appearance of agricultural subjects as living things. We propose a novel system for generating realistic synthetic data to address these challenges. Utilizing a vineyard simulator based on the Unity engine, our system employs a cut-and-paste technique with geometrical consistency considerations to produce accurate photo-realistic images and labels from synthetic environments to train detection algorithms. This approach generates diverse data samples across various viewpoints and lighting conditions. We demonstrate considerable performance improvements in training a state-of-the-art detector by applying our method to table grapes cultivation. The combination of techniques can be easily automated, an increasingly important consideration for adoption in agricultural practice.

ROMay 21, 2024
Play Everywhere: A Temporal Logic based Game Environment Independent Approach for Playing Soccer with Robots

Vincenzo Suriani, Emanuele Musumeci, Daniele Nardi et al.

Robots playing soccer often rely on hard-coded behaviors that struggle to generalize when the game environment change. In this paper, we propose a temporal logic based approach that allows robots' behaviors and goals to adapt to the semantics of the environment. In particular, we present a hierarchical representation of soccer in which the robot selects the level of operation based on the perceived semantic characteristics of the environment, thus modifying dynamically the set of rules and goals to apply. The proposed approach enables the robot to operate in unstructured environments, just as it happens when humans go from soccer played on an official field to soccer played on a street. Three different use cases set in different scenarios are presented to demonstrate the effectiveness of the proposed approach.

CVDec 29, 2024
Can Robots "Taste" Grapes? Estimating SSC with Simple RGB Sensors

Thomas Alessandro Ciarfuglia, Ionut Marian Motoi, Leonardo Saraceni et al.

In table grape cultivation, harvesting depends on accurately assessing fruit quality. While some characteristics, like color, are visible, others, such as Soluble Solid Content (SSC), or sugar content measured in degrees Brix (°Brix), require specific tools. SSC is a key quality factor that correlates with ripeness, but lacks a direct causal relationship with color. Hyperspectral cameras can estimate SSC with high accuracy under controlled laboratory conditions, but their practicality in field environments is limited. This study investigates the potential of simple RGB sensors under uncontrolled lighting to estimate SSC and color, enabling cost-effective, robot-assisted harvesting. Over the 2021 and 2022 summer seasons, we collected grape images with corresponding SSC and color labels to evaluate algorithmic solutions for SSC estimation, specifically testing for cross-seasonal and cross-device robustness. We propose two approaches: a computationally efficient histogram-based method for resource-constrained robots and a Deep Neural Network (DNN) model for more complex applications. Our results demonstrate high performance, with the DNN model achieving a Mean Absolute Error (MAE) as low as $1.05$ °Brix on a challenging cross-device test set. The lightweight histogram-based method also proved effective, reaching an MAE of $1.46$ °Brix. These results are highly competitive with those from hyperspectral systems, which report errors in the $1.27$--$2.20$ °Brix range in similar field applications.

CVApr 8, 2024
Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery

Ionut M. Motoi, Leonardo Saraceni, Daniele Nardi et al.

Satellite imagery is crucial for tasks like environmental monitoring and urban planning. Typically, it relies on semantic segmentation or Land Use Land Cover (LULC) classification to categorize each pixel. Despite the advancements brought about by Deep Neural Networks (DNNs), their performance in segmentation tasks is hindered by challenges such as limited availability of labeled data, class imbalance and the inherent variability and complexity of satellite images. In order to mitigate those issues, our study explores the effectiveness of a Cut-and-Paste augmentation technique for semantic segmentation in satellite images. We adapt this augmentation, which usually requires labeled instances, to the case of semantic segmentation. By leveraging the connected components in the semantic segmentation labels, we extract instances that are then randomly pasted during training. Using the DynamicEarthNet dataset and a U-Net model for evaluation, we found that this augmentation significantly enhances the mIoU score on the test set from 37.9 to 44.1. This finding highlights the potential of the Cut-and-Paste augmentation to improve the generalization capabilities of semantic segmentation models in satellite imagery.

CLDec 16, 2025
From Adversarial Poetry to Adversarial Tales: An Interpretability Research Agenda

Piercosma Bisconti, Marcello Galisai, Matteo Prandi et al.

Safety mechanisms in LLMs remain vulnerable to attacks that reframe harmful requests through culturally coded structures. We introduce Adversarial Tales, a jailbreak technique that embeds harmful content within cyberpunk narratives and prompts models to perform functional analysis inspired by Vladimir Propp's morphology of folktales. By casting the task as structural decomposition, the attack induces models to reconstruct harmful procedures as legitimate narrative interpretation. Across 26 frontier models from nine providers, we observe an average attack success rate of 71.3%, with no model family proving reliably robust. Together with our prior work on Adversarial Poetry, these findings suggest that structurally-grounded jailbreaks constitute a broad vulnerability class rather than isolated techniques. The space of culturally coded frames that can mediate harmful intent is vast, likely inexhaustible by pattern-matching defenses alone. Understanding why these attacks succeed is therefore essential: we outline a mechanistic interpretability research agenda to investigate how narrative cues reshape model representations and whether models can learn to recognize harmful intent independently of surface form.

CLNov 19, 2025
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Piercosma Bisconti, Matteo Prandi, Federico Pierucci et al.

We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 MLCommons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. Outputs are evaluated using an ensemble of 3 open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches. These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.

CLOct 14, 2025
Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection

Olga E. Sorokoletova, Francesco Giarrusso, Vincenzo Suriani et al.

Jailbreaking techniques pose a significant threat to the safety of Large Language Models (LLMs). Existing defenses typically focus on single-turn attacks, lack coverage across languages, and rely on limited taxonomies that either fail to capture the full diversity of attack strategies or emphasize risk categories rather than the jailbreaking techniques. To advance the understanding of the effectiveness of jailbreaking techniques, we conducted a structured red-teaming challenge. The outcome of our experiments are manifold. First, we developed a comprehensive hierarchical taxonomy of 50 jailbreak strategies, consolidating and extending prior classifications into seven broad families, including impersonation, persuasion, privilege escalation, cognitive overload, obfuscation, goal conflict, and data poisoning. Second, we analyzed the data collected from the challenge to examine the prevalence and success rates of different attack types, providing insights into how specific jailbreak strategies exploit model vulnerabilities and induce misalignment. Third, we benchmark a popular LLM for jailbreak detection, evaluating the benefits of taxonomy-guided prompting for improving automatic detection. Finally, we compiled a new Italian dataset of 1364 multi-turn adversarial dialogues, annotated with our taxonomy, enabling the study of interactions where adversarial intent emerges gradually and succeeds in bypassing traditional safeguards.

AISep 11, 2025
Curriculum-Based Multi-Tier Semantic Exploration via Deep Reinforcement Learning

Abdel Hakim Drid, Vincenzo Suriani, Daniele Nardi et al.

Navigating and understanding complex and unknown environments autonomously demands more than just basic perception and movement from embodied agents. Truly effective exploration requires agents to possess higher-level cognitive abilities, the ability to reason about their surroundings, and make more informed decisions regarding exploration strategies. However, traditional RL approaches struggle to balance efficient exploration and semantic understanding due to limited cognitive capabilities embedded in the small policies for the agents, leading often to human drivers when dealing with semantic exploration. In this paper, we address this challenge by presenting a novel Deep Reinforcement Learning (DRL) architecture that is specifically designed for resource efficient semantic exploration. A key methodological contribution is the integration of a Vision-Language Model (VLM) common-sense through a layered reward function. The VLM query is modeled as a dedicated action, allowing the agent to strategically query the VLM only when deemed necessary for gaining external guidance, thereby conserving resources. This mechanism is combined with a curriculum learning strategy designed to guide learning at different levels of complexity to ensure robust and stable learning. Our experimental evaluation results convincingly demonstrate that our agent achieves significantly enhanced object discovery rates and develops a learned capability to effectively navigate towards semantically rich regions. Furthermore, it also shows a strategic mastery of when to prompt for external environmental information. By demonstrating a practical and scalable method for embedding common-sense semantic reasoning with autonomous agents, this research provides a novel approach to pursuing a fully intelligent and self-guided exploration in robotics.

ROSep 9, 2025
Multi Robot Coordination in Highly Dynamic Environments: Tackling Asymmetric Obstacles and Limited Communication

Vincenzo Suriani, Daniele Affinita, Domenico D. Bloisi et al.

Coordinating a fully distributed multi-agent system (MAS) can be challenging when the communication channel has very limited capabilities in terms of sending rate and packet payload. When the MAS has to deal with active obstacles in a highly partially observable environment, the communication channel acquires considerable relevance. In this paper, we present an approach to deal with task assignments in extremely active scenarios, where tasks need to be frequently reallocated among the agents participating in the coordination process. Inspired by market-based task assignments, we introduce a novel distributed coordination method to orchestrate autonomous agents' actions efficiently in low communication scenarios. In particular, our algorithm takes into account asymmetric obstacles. While in the real world, the majority of obstacles are asymmetric, they are usually treated as symmetric ones, thus limiting the applicability of existing methods. To summarize, the presented architecture is designed to tackle scenarios where the obstacles are active and asymmetric, the communication channel is poor and the environment is partially observable. Our approach has been validated in simulation and in the real world, using a team of NAO robots during official RoboCup competitions. Experimental results show a notable reduction in task overlaps in limited communication settings, with a decrease of 52% in the most frequent reallocated task.

AIAug 7, 2025
Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?

Matteo Prandi, Vincenzo Suriani, Federico Pierucci et al.

The rapid advancement of General Purpose AI (GPAI) models necessitates robust evaluation frameworks, especially with emerging regulations like the EU AI Act and its associated Code of Practice (CoP). Current AI evaluation practices depend heavily on established benchmarks, but these tools were not designed to measure the systemic risks that are the focus of the new regulatory landscape. This research addresses the urgent need to quantify this "benchmark-regulation gap." We introduce Bench-2-CoP, a novel, systematic framework that uses validated LLM-as-judge analysis to map the coverage of 194,955 questions from widely-used benchmarks against the EU AI Act's taxonomy of model capabilities and propensities. Our findings reveal a profound misalignment: the evaluation ecosystem dedicates the vast majority of its focus to a narrow set of behavioral propensities. On average, benchmarks devote 61.6% of their regulatory-relevant questions to "Tendency to hallucinate" and 31.2% to "Lack of performance reliability", while critical functional capabilities are dangerously neglected. Crucially, capabilities central to loss-of-control scenarios, including evading human oversight, self-replication, and autonomous AI development, receive zero coverage in the entire benchmark corpus. This study provides the first comprehensive, quantitative analysis of this gap, demonstrating that current public benchmarks are insufficient, on their own, for providing the evidence of comprehensive risk assessment required for regulatory compliance and offering critical insights for the development of next-generation evaluation tools.

CVJun 20, 2025
Self-supervised Feature Extraction for Enhanced Ball Detection on Soccer Robots

Can Lin, Daniele Affinita, Marco E. P. Zimmatore et al.

Robust and accurate ball detection is a critical component for autonomous humanoid soccer robots, particularly in dynamic and challenging environments such as RoboCup outdoor fields. However, traditional supervised approaches require extensive manual annotation, which is costly and time-intensive. To overcome this problem, we present a self-supervised learning framework for domain-adaptive feature extraction to enhance ball detection performance. The proposed approach leverages a general-purpose pretrained model to generate pseudo-labels, which are then used in a suite of self-supervised pretext tasks -- including colorization, edge detection, and triplet loss -- to learn robust visual features without relying on manual annotations. Additionally, a model-agnostic meta-learning (MAML) strategy is incorporated to ensure rapid adaptation to new deployment scenarios with minimal supervision. A new dataset comprising 10,000 labeled images from outdoor RoboCup SPL matches is introduced, used to validate the method, and made available to the community. Experimental results demonstrate that the proposed pipeline outperforms baseline models in terms of accuracy, F1 score, and IoU, while also exhibiting faster convergence.

ROJun 18, 2025
Context Matters! Relaxing Goals with LLMs for Feasible 3D Scene Planning

Emanuele Musumeci, Michele Brienza, Francesco Argenziano et al.

Embodied agents need to plan and act reliably in real and complex 3D environments. Classical planning (e.g., PDDL) offers structure and guarantees, but in practice it fails under noisy perception and incorrect predicate grounding. On the other hand, Large Language Models (LLMs)-based planners leverage commonsense reasoning, yet frequently propose actions that are unfeasible or unsafe. Following recent works that combine the two approaches, we introduce ContextMatters, a framework that fuses LLMs and classical planning to perform hierarchical goal relaxation: the LLM helps ground symbols to the scene and, when the target is unreachable, it proposes functionally equivalent goals that progressively relax constraints, adapting the goal to the context of the agent's environment. Operating on 3D Scene Graphs, this mechanism turns many nominally unfeasible tasks into tractable plans and enables context-aware partial achievement when full completion is not achievable. Our experimental results show a +52.45% Success Rate improvement over state-of-the-art LLMs+PDDL baseline, demonstrating the effectiveness of our approach. Moreover, we validate the execution of ContextMatter in a real world scenario by deploying it on a TIAGo robot. Code, dataset, and supplementary materials are available to the community at https://lab-rococo-sapienza.github.io/context-matters/.

CVDec 17, 2024
Synthetic Data Generation for Anomaly Detection on Table Grapes

Ionut Marian Motoi, Valerio Belli, Alberto Carpineto et al.

Early detection of illnesses and pest infestations in fruit cultivation is critical for maintaining yield quality and plant health. Computer vision and robotics are increasingly employed for the automatic detection of such issues, particularly using data-driven solutions. However, the rarity of these problems makes acquiring and processing the necessary data to train such algorithms a significant obstacle. One solution to this scarcity is the generation of synthetic high-quality anomalous samples. While numerous methods exist for this task, most require highly trained individuals for setup. This work addresses the challenge of generating synthetic anomalies in an automatic fashion that requires only an initial collection of normal and anomalous samples from the user - a task that is straightforward for farmers. We demonstrate the approach in the context of table grape cultivation. Specifically, based on the observation that normal berries present relatively smooth surfaces, while defects result in more complex textures, we introduce a Dual-Canny Edge Detection (DCED) filter. This filter emphasizes the additional texture indicative of diseases, pest infestations, or other defects. Using segmentation masks provided by the Segment Anything Model, we then select and seamlessly blend anomalous berries onto normal ones. We show that the proposed dataset augmentation technique improves the accuracy of an anomaly classifier for table grapes and that the approach can be generalized to other fruit types.

CVNov 26, 2024
Real-Time Multimodal Signal Processing for HRI in RoboCup: Understanding a Human Referee

Filippo Ansalone, Flavio Maiorana, Daniele Affinita et al.

Advancing human-robot communication is crucial for autonomous systems operating in dynamic environments, where accurate real-time interpretation of human signals is essential. RoboCup provides a compelling scenario for testing these capabilities, requiring robots to understand referee gestures and whistle with minimal network reliance. Using the NAO robot platform, this study implements a two-stage pipeline for gesture recognition through keypoint extraction and classification, alongside continuous convolutional neural networks (CCNNs) for efficient whistle detection. The proposed approach enhances real-time human-robot interaction in a competitive setting like RoboCup, offering some tools to advance the development of autonomous systems capable of cooperating with humans.

ROOct 1, 2021
Study of Signal Temporal Logic Robustness Metrics for Robotic Tasks Optimization

Akshay Dhonthi, Philipp Schillinger, Leonel Rozo et al.

Signal Temporal Logic (STL) is an efficient technique for describing temporal constraints. It can play a significant role in robotic manipulation, for example, to optimize the robot performance according to task-dependent metrics. In this paper, we evaluate several STL robustness metrics of interest in robotic manipulation tasks and discuss a case study showing the advantages of using STL to define complex constraints. Such constraints can be understood as cost functions in task optimization. We show how STL-based cost functions can be optimized using a variety of off-the-shelf optimizers. We report initial results of this research direction on a simulated planar environment.

LGJun 1, 2021
Memory Wrap: a Data-Efficient and Interpretable Extension to Image Classification Models

Biagio La Rosa, Roberto Capobianco, Daniele Nardi

Due to their black-box and data-hungry nature, deep learning techniques are not yet widely adopted for real-world applications in critical domains, like healthcare and justice. This paper presents Memory Wrap, a plug-and-play extension to any image classification model. Memory Wrap improves both data-efficiency and model interpretability, adopting a content-attention mechanism between the input and some memories of past training samples. We show that Memory Wrap outperforms standard classifiers when it learns from a limited set of data, and it reaches comparable performance when it learns from the full dataset. We discuss how its structure and content-attention mechanisms make predictions interpretable, compared to standard classifiers. To this end, we both show a method to build explanations by examples and counterfactuals, based on the memory content, and how to exploit them to get insights about its decision process. We test our approach on image classification tasks using several architectures on three different datasets, namely CIFAR10, SVHN, and CINIC10.

AINov 20, 2020
Towards Abstract Relational Learning in Human Robot Interaction

Mohamadreza Faridghasemnia, Daniele Nardi, Alessandro Saffiotti

Humans have a rich representation of the entities in their environment. Entities are described by their attributes, and entities that share attributes are often semantically related. For example, if two books have "Natural Language Processing" as the value of their `title' attribute, we can expect that their `topic' attribute will also be equal, namely, "NLP". Humans tend to generalize such observations, and infer sufficient conditions under which the `topic' attribute of any entity is "NLP". If robots need to interact successfully with humans, they need to represent entities, attributes, and generalizations in a similar way. This ends in a contextualized cognitive agent that can adapt its understanding, where context provides sufficient conditions for a correct understanding. In this work, we address the problem of how to obtain these representations through human-robot interaction. We integrate visual perception and natural language input to incrementally build a semantic model of the world, and then use inductive reasoning to infer logical rules that capture generic semantic relations, true in this model. These relations can be used to enrich the human-robot interaction, to populate a knowledge base with inferred facts, or to remove uncertainty in the robot's sensory inputs.

CVSep 12, 2020
Multi-Spectral Image Synthesis for Crop/Weed Segmentation in Precision Farming

Mulham Fawakherji, Ciro Potena, Alberto Pretto et al.

An effective perception system is a fundamental component for farming robots, as it enables them to properly perceive the surrounding environment and to carry out targeted operations. The most recent methods make use of state-of-the-art machine learning techniques to learn a valid model for the target task. However, those techniques need a large amount of labeled data for training. A recent approach to deal with this issue is data augmentation through Generative Adversarial Networks (GANs), where entire synthetic scenes are added to the training data, thus enlarging and diversifying their informative content. In this work, we propose an alternative solution with respect to the common data augmentation methods, applying it to the fundamental problem of crop/weed segmentation in precision farming. Starting from real images, we create semi-artificial samples by replacing the most relevant object classes (i.e., crop and weeds) with their synthesized counterparts. To do that, we employ a conditional GAN (cGAN), where the generative model is trained by conditioning the shape of the generated object. Moreover, in addition to RGB data, we take into account also near-infrared (NIR) information, generating four channel multi-spectral synthetic images. Quantitative experiments, carried out on three publicly available datasets, show that (i) our model is capable of generating realistic multi-spectral images of plants and (ii) the usage of such synthetic images in the training process improves the segmentation performance of state-of-the-art semantic segmentation convolutional networks.

RONov 8, 2019
Building an Aerial-Ground Robotics System for Precision Farming: An Adaptable Solution

Alberto Pretto, Stéphanie Aravecchia, Wolfram Burgard et al.

The application of autonomous robots in agriculture is gaining increasing popularity thanks to the high impact it may have on food security, sustainability, resource use efficiency, reduction of chemical treatments, and the optimization of human effort and yield. With this vision, the Flourish research project aimed to develop an adaptable robotic solution for precision farming that combines the aerial survey capabilities of small autonomous unmanned aerial vehicles (UAVs) with targeted intervention performed by multi-purpose unmanned ground vehicles (UGVs). This paper presents an overview of the scientific and technological advances and outcomes obtained in the project. We introduce multi-spectral perception algorithms and aerial and ground-based systems developed for monitoring crop density, weed pressure, crop nitrogen nutrition status, and to accurately classify and locate weeds. We then introduce the navigation and mapping systems tailored to our robots in the agricultural environment, as well as the modules for collaborative mapping. We finally present the ground intervention hardware, software solutions, and interfaces we implemented and tested in different field conditions and with different crops. We describe a real use case in which a UAV collaborates with a UGV to monitor the field and to perform selective spraying without human intervention.

ROSep 30, 2018
AgriColMap: Aerial-Ground Collaborative 3D Mapping for Precision Farming

Ciro Potena, Raghav Khanna, Juan Nieto et al.

The combination of aerial survey capabilities of Unmanned Aerial Vehicles with targeted intervention abilities of agricultural Unmanned Ground Vehicles can significantly improve the effectiveness of robotic systems applied to precision agriculture. In this context, building and updating a common map of the field is an essential but challenging task. The maps built using robots of different types show differences in size, resolution and scale, the associated geolocation data may be inaccurate and biased, while the repetitiveness of both visual appearance and geometric structures found within agricultural contexts render classical map merging techniques ineffective. In this paper we propose AgriColMap, a novel map registration pipeline that leverages a grid-based multimodal environment representation which includes a vegetation index map and a Digital Surface Model. We cast the data association problem between maps built from UAVs and UGVs as a multimodal, large displacement dense optical flow estimation. The dominant, coherent flows, selected using a voting scheme, are used as point-to-point correspondences to infer a preliminary non-rigid alignment between the maps. A final refinement is then performed, by exploiting only meaningful parts of the registered maps. We evaluate our system using real world data for 3 fields with different crop species. The results show that our method outperforms several state of the art map registration and matching techniques by a large margin, and has a higher tolerance to large initial misalignments. We release an implementation of the proposed approach along with the acquired datasets with this paper.

ROMar 22, 2018
DOP: Deep Optimistic Planning with Approximate Value Function Evaluation

Francesco Riccio, Roberto Capobianco, Daniele Nardi

Research on reinforcement learning has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. multi-agent systems or hyper-redundant robots). To alleviate this problem, we present DOP, a deep model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) plan effective policies. Specifically, we exploit deep neural networks to learn Q-functions that are used to attack the curse of dimensionality during a Monte-Carlo tree search. Our algorithm, in fact, constructs upper confidence bounds on the learned value function to select actions optimistically. We implement and evaluate DOP on different scenarios: (1) a cooperative navigation problem, (2) a fetching task for a 7-DOF KUKA robot, and (3) a human-robot handover with a humanoid robot (both in simulation and real). The obtained results show the effectiveness of DOP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.

ROMar 2, 2018
An Effective Multi-Cue Positioning System for Agricultural Robotics

Marco Imperoli, Ciro Potena, Daniele Nardi et al.

The self-localization capability is a crucial component for Unmanned Ground Vehicles (UGV) in farming applications. Approaches based solely on visual cues or on low-cost GPS are easily prone to fail in such scenarios. In this paper, we present a robust and accurate 3D global pose estimation framework, designed to take full advantage of heterogeneous sensory data. By modeling the pose estimation problem as a pose graph optimization, our approach simultaneously mitigates the cumulative drift introduced by motion estimation systems (wheel odometry, visual odometry, ...), and the noise introduced by raw GPS readings. Along with a suitable motion model, our system also integrates two additional types of constraints: (i) a Digital Elevation Model and (ii) a Markov Random Field assumption. We demonstrate how using these additional cues substantially reduces the error along the altitude axis and, moreover, how this benefit spreads to the other components of the state. We report exhaustive experiments combining several sensor setups, showing accuracy improvements ranging from 37% to 76% with respect to the exclusive use of a GPS sensor. We show that our approach provides accurate results even if the GPS unexpectedly changes positioning mode. The code of our system along with the acquired datasets are released with this paper.

ROMar 1, 2018
Q-CP: Learning Action Values for Cooperative Planning

Francesco Riccio, Roberto Capobianco, Daniele Nardi

Research on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning an effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. hyper-redundant and groups of robot). To alleviate this problem, we present Q-CP a cooperative model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse-of-dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a coordination task between two mobile robots entering a door. The obtained results show the effectiveness of Q-CP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.

ROMay 31, 2017
Effective Target Aware Visual Navigation for UAVs

Ciro Potena, Daniele Nardi, Alberto Pretto

In this paper we propose an effective vision-based navigation method that allows a multirotor vehicle to simultaneously reach a desired goal pose in the environment while constantly facing a target object or landmark. Standard techniques such as Position-Based Visual Servoing (PBVS) and Image-Based Visual Servoing (IBVS) in some cases (e.g., while the multirotor is performing fast maneuvers) do not allow to constantly maintain the line of sight with a target of interest. Instead, we compute the optimal trajectory by solving a non-linear optimization problem that minimizes the target re-projection error while meeting the UAV's dynamic constraints. The desired trajectory is then tracked by means of a real-time Non-linear Model Predictive Controller (NMPC): this implicitly allows the multirotor to satisfy both the required constraints. We successfully evaluate the proposed approach in many real and simulated experiments, making an exhaustive comparison with a standard approach.

ROOct 9, 2016
Learning Human-Robot Handovers Through $π$-STAM: Policy Improvement With Spatio-Temporal Affordance Maps

Francesco Riccio, Roberto Capobianco, Daniele Nardi

Human-robot handovers are characterized by high uncertainty and poor structure of the problem that make them difficult tasks. While machine learning methods have shown promising results, their application to problems with large state dimensionality, such as in the case of humanoid robots, is still limited. Additionally, by using these methods and during the interaction with the human operator, no guarantees can be obtained on the correct interpretation of spatial constraints (e.g., from social rules). In this paper, we present Policy Improvement with Spatio-Temporal Affordance Maps -- $π$-STAM, a novel iterative algorithm to learn spatial affordances and generate robot behaviors. Our goal consists in generating a policy that adapts to the unknown action semantics by using affordances. In this way, while learning to perform a human-robot handover task, we can (1) efficiently generate good policies with few training episodes, and (2) easily encode action semantics and, if available, enforce prior knowledge in it. We experimentally validate our approach both in simulation and on a real NAO robot whose task consists in taking an object from the hands of a human. The obtained results show that our algorithm obtains a good policy while reducing the computational load and time duration of the learning process.

ROJul 1, 2016
STAM: A Framework for Spatio-Temporal Affordance Maps

Francesco Riccio, Roberto Capobianco, Marc Hanheide et al.

Affordances have been introduced in literature as action opportunities that objects offer, and used in robotics to semantically represent their interconnection. However, when considering an environment instead of an object, the problem becomes more complex due to the dynamism of its state. To tackle this issue, we introduce the concept of Spatio-Temporal Affordances (STA) and Spatio-Temporal Affordance Map (STAM). Using this formalism, we encode action semantics related to the environment to improve task execution capabilities of an autonomous robot. We experimentally validate our approach to support the execution of robot tasks by showing that affordances encode accurate semantics of the environment.

ROJun 12, 2016
A Proposal for Semantic Map Representation and Evaluation

Roberto Capobianco, Jacopo Serafin, Johann Dichtl et al.

Semantic mapping is the incremental process of "mapping" relevant information of the world (i.e., spatial information, temporal events, agents and actions) to a formal description supported by a reasoning engine. Current research focuses on learning the semantic of environments based on their spatial location, geometry and appearance. Many methods to tackle this problem have been proposed, but the lack of a uniform representation, as well as standard benchmarking suites, prevents their direct comparison. In this paper, we propose a standardization in the representation of semantic maps, by defining an easily extensible formalism to be used on top of metric maps of the environments. Based on this, we describe the procedure to build a dataset (based on real sensor data) for benchmarking semantic mapping techniques, also hypothesizing some possible evaluation metrics. Nevertheless, by providing a tool for the construction of a semantic map ground truth, we aim at the contribution of the scientific community in acquiring data for populating the dataset.

ROJun 1, 2016
Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies

Francesco Riccio, Roberto Capobianco, Daniele Nardi

RoboCup soccer competitions are considered among the most challenging multi-robot adversarial environments, due to their high dynamism and the partial observability of the environment. In this paper we introduce a method based on a combination of Monte Carlo search and data aggregation (MCSDA) to adapt discrete-action soccer policies for a defender robot to the strategy of the opponent team. By exploiting a simple representation of the domain, a supervised learning algorithm is trained over an initial collection of data consisting of several simulations of human expert policies. Monte Carlo policy rollouts are then generated and aggregated to previous data to improve the learned policy over multiple epochs and games. The proposed approach has been extensively tested both on a soccer-dedicated simulator and on real robots. Using this method, our learning robot soccer team achieves an improvement in ball interceptions, as well as a reduction in the number of opponents' goals. Together with a better performance, an overall more efficient positioning of the whole team within the field is achieved.

AIJul 28, 2013
Knowledge Representation for Robots through Human-Robot Interaction

Emanuele Bastianelli, Domenico Bloisi, Roberto Capobianco et al.

The representation of the knowledge needed by a robot to perform complex tasks is restricted by the limitations of perception. One possible way of overcoming this situation and designing "knowledgeable" robots is to rely on the interaction with the user. We propose a multi-modal interaction framework that allows to effectively acquire knowledge about the environment where the robot operates. In particular, in this paper we present a rich representation framework that can be automatically built from the metric map annotated with the indications provided by the user. Such a representation, allows then the robot to ground complex referential expressions for motion commands and to devise topological navigation plans to achieve the target locations.