RO | Jun 5, 2023 | Code
Knowledge-Driven Robot Program Synthesis from Human VR Demonstrations
Benjamin Alt, Franklin Kenghagho Kenfack, Andrei Haidu et al.
Aging societies, labor shortages and increasing wage costs call for assistance robots capable of autonomously performing a wide array of real-world tasks. Such open-ended robotic manipulation requires not only powerful knowledge representations and reasoning (KR&R) algorithms, but also methods for humans to instruct robots what tasks to perform and how to perform them. In this paper, we present a system for automatically generating executable robot control programs from human task demonstrations in virtual reality (VR). We leverage common-sense knowledge and game engine-based physics to semantically interpret human VR demonstrations, as well as an expressive and general task representation and automatic path planning and code generation, embedded into a state-of-the-art cognitive architecture. We demonstrate our approach in the context of force-sensitive fetch-and-place for a robotic shopping assistant. The source code is available at https://github.com/ease-crc/vr-program-synthesis.
RO | Jul 15, 2022
Heuristic-free Optimization of Force-Controlled Robot Search Strategies in Stochastic Environments
Benjamin Alt, Darko Katic, Rainer Jäkel et al.
In both industrial and service domains, a central benefit of the use of robots is their ability to quickly and reliably execute repetitive tasks. However, even relatively simple peg-in-hole tasks are typically subject to stochastic variations, requiring search motions to find relevant features such as holes. While search improves robustness, it comes at the cost of increased runtime: More exhaustive search will maximize the probability of successfully executing a given task, but will significantly delay any downstream tasks. This trade-off is typically resolved by human experts according to simple heuristics, which are rarely optimal. This paper introduces an automatic, data-driven and heuristic-free approach to optimize robot search strategies. By training a neural model of the search strategy on a large set of simulated stochastic environments, conditioning it on few real-world examples and inverting the model, we can infer search strategies which adapt to the time-variant characteristics of the underlying probability distributions, while requiring very few real-world measurements. We evaluate our approach on two different industrial robots in the context of spiral and probe search for THT electronics assembly.
55.1 | RO | May 28
LLM-Guided Future Hypotheses for Horizon-Aware Exploration in Multi-Step Robot Manipulation
Mohammad Khoshnazar, Andrew Melnik, Michael Beetz
Multi-step robot manipulation requires acting under uncertainty about how the scene will evolve, making exploration and policy adaptation challenging. We study whether short-horizon, task-consistent future videos can provide useful structured priors for control and reinforcement-learning fine-tuning. We formalize this idea through Future-Experience Conditioning (FEC), a simple interface that conditions closed-loop policies on a latent representation of a short future video. In our simulation setup, future clips are generated in three stages: an LLM reasoner operating over a task ontology initialized from the current scene state, a robot-free digital-twin rollout of the intended object motion, and a mask-free video diffusion model that synthesizes a robot-consistent future clip without requiring segmentation at inference. We instantiate this future-conditioning interface primarily with BC and BC+RL, and compare against a future-conditioned Streaming Flow Policy (SFP) baseline on RoboCasa and CALVIN under NoFuture, GTFuture, GenFuture, and WrongFuture. Generated futures improve performance over no-future conditioning, while mismatched futures degrade it, and our BC+RL instantiation achieves the strongest overall results. An average BC+RL learning-curve analysis across 8 CALVIN tasks further shows that GTFuture improves fastest, GenFuture improves earlier and to a higher level than NoFuture, and WrongFuture remains at zero throughout training. These results suggest that short-horizon future videos can serve as useful structured priors for exploration and policy adaptation under imperfect future predictions. https://enact2026.github.io/
LG | Feb 14, 2023
Joint Probability Trees
Daniel Nyga, Mareike Picklum, Tom Schierenbeck et al.
We introduce Joint Probability Trees (JPT), a novel approach that makes learning of and reasoning about joint probability distributions tractable for practical applications. JPTs support both symbolic and subsymbolic variables in a single hybrid model, and they do not rely on prior knowledge about variable dependencies or families of distributions. JPT representations build on tree structures that partition the problem space into relevant subregions that are elicited from the training data instead of postulating a rigid dependency model prior to learning. Learning and reasoning scale linearly in JPTs, and the tree structure allows white-box reasoning about any posterior probability $P(Q|E)$, such that interpretable explanations can be provided for any inference result. Our experiments showcase the practical applicability of JPTs in high-dimensional heterogeneous probability spaces with millions of training samples, making it a promising alternative to classic probabilistic graphical models.
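To make the posterior query $P(Q|E)$ mentioned above concrete, here is a minimal sketch of how such a query could be answered over a partition of the problem space. It assumes, for illustration only, that each leaf stores a prior weight and independent per-variable distributions over discrete values; the `Leaf` class, the function names, and the toy numbers are hypothetical and are not taken from the JPT paper or its code.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """One subregion of the partition (hypothetical structure for illustration)."""
    prior: float  # assumed: share of training samples that fall into this leaf
    dists: dict   # assumed: variable name -> {value: probability}, independent per leaf


def probability(leaves, assignment):
    """P(assignment) as a prior-weighted sum over leaves of per-variable products."""
    total = 0.0
    for leaf in leaves:
        p = leaf.prior
        for var, value in assignment.items():
            p *= leaf.dists[var].get(value, 0.0)
        total += p
    return total


def posterior(leaves, query, evidence):
    """P(query | evidence) = P(query, evidence) / P(evidence)."""
    p_evidence = probability(leaves, evidence)
    if p_evidence == 0.0:
        raise ValueError("evidence has zero probability under the model")
    return probability(leaves, {**query, **evidence}) / p_evidence


# Toy model with two leaves and two symbolic variables (made-up numbers).
leaves = [
    Leaf(0.5, {"object": {"cup": 0.8, "plate": 0.2},
               "location": {"table": 0.5, "shelf": 0.5}}),
    Leaf(0.5, {"object": {"cup": 0.2, "plate": 0.8},
               "location": {"table": 1.0, "shelf": 0.0}}),
]

result = posterior(leaves, {"object": "cup"}, {"location": "shelf"})
```

Because every term in the sum belongs to one leaf, the contribution of each subregion to the result can be inspected directly, which is one way to read the "white-box" claim in the abstract.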
4.0 | RO | Mar 23
Reasoning Systems for Semantic Navigation in Mobile Robots
Jonathan Crespo, Ramón Barber, O. M. Mozos et al.
Semantic navigation is the navigation paradigm in which environmental semantic concepts and their relationships are taken into account to plan the route of a mobile robot. This paradigm facilitates the interaction with humans and the understanding of human environments in terms of navigation goals and tasks. At the high level, a semantic navigation system requires two main components: a semantic representation of the environment, and a reasoner system. This paper focuses on developing a model of the environment using semantic concepts, and presents two solutions for the semantic navigation paradigm. Both systems implement an ontological model. Whilst the first one uses a relational database, the second one is based on KnowRob. Both systems have been integrated in a semantic navigator. We compare both systems at the qualitative and quantitative levels, and present an implementation on a mobile robot as a proof of concept.
46.4 | RO | May 12 | Code
Closing the Motion Execution Gap: From Semantic Motion Task Constraints to Kinematic Control
Simon Stelter, Vanessa Hassouna, Malte Huerkamp et al.
This paper addresses the Motion Execution Gap, the disconnect between high-level symbolic task descriptions using semantic constraints and executable robot motions. Motion Statecharts are introduced as an executable symbolic representation for complex motions. They allow the arbitrary arrangement of motion constraints, monitors or nested statecharts in parallel and sequence. World-centric motion specification and generalization across embodiments are enabled through the use of a unified differentiable kinematic world model of both robots and environments. Motion execution is realized through an lMPC-based implementation of the task-function approach, in which smooth transitions during task switches are ensured using jerk bounds. Cross-platform transferability was demonstrated by deploying the method on eight robot platforms operating in diverse environments. The proposed framework is called Giskard and is available open source: https://github.com/cram2/cognitive_robot_abstract_machine.
RO | Oct 25, 2023
Translating Universal Scene Descriptions into Knowledge Graphs for Robotic Environment
Giang Hoang Nguyen, Daniel Bessler, Simon Stelter et al.
Robots performing human-scale manipulation tasks require an extensive amount of knowledge about their surroundings in order to perform their actions competently and human-like. In this work, we investigate the use of virtual reality technology as an implementation for robot environment modeling, and present a technique for translating scene graphs into knowledge bases. To this end, we take advantage of the Universal Scene Description (USD) format which is an emerging standard for the authoring, visualization and simulation of complex environments. We investigate the conversion of USD-based environment models into Knowledge Graph (KG) representations that facilitate semantic querying and integration with additional knowledge sources.
ML | Oct 6, 2023
Integrating Transformations in Probabilistic Circuits
Tom Schierenbeck, Vladimir Vutov, Thorsten Dickhaus et al.
This study addresses the predictive limitation of probabilistic circuits and introduces transformations as a remedy to overcome it. We demonstrate this limitation in robotic scenarios. We motivate that independent component analysis is a sound tool to preserve the independence properties of probabilistic circuits. Our approach is an extension of joint probability trees, which are model-free deterministic circuits. By doing so, it is demonstrated that the proposed approach is able to achieve higher likelihoods while using fewer parameters compared to the joint probability trees on seven benchmark data sets as well as on real robot data. Furthermore, we discuss how to integrate transformations into tree-based learning routines. Finally, we argue that exact inference with transformed quantile parameterized distributions is not tractable. However, our approach allows for efficient sampling and approximate inference.
RO | Jul 2, 2024
MARLIN: A Cloud Integrated Robotic Solution to Support Intralogistics in Retail
Dennis Mronga, Andreas Bresser, Fabian Maas et al.
In this paper, we present the service robot MARLIN and its integration with the K4R platform, a cloud system for complex AI applications in retail. At its core, this platform contains so-called semantic digital twins, a semantically annotated representation of the retail store. MARLIN continuously exchanges data with the K4R platform, improving the robot's capabilities in perception, autonomous navigation, and task planning. We exploit these capabilities in a retail intralogistics scenario, specifically by assisting store employees in stocking shelves. We demonstrate that MARLIN is able to update the digital representation of the retail store by detecting and classifying obstacles, autonomously planning and executing replenishment missions, adapting to unforeseen changes in the environment, and interacting with store employees. Experiments are conducted in simulation, in a laboratory environment, and in a real store. We also describe and evaluate a novel algorithm for autonomous navigation of articulated tractor-trailer systems. The algorithm outperforms the manufacturer's proprietary navigation approach and improves MARLIN's navigation capabilities in confined spaces.
RO | Sep 13, 2024
Shadow Program Inversion with Differentiable Planning: A Framework for Unified Robot Program Parameter and Trajectory Optimization
Benjamin Alt, Claudius Kienle, Darko Katic et al.
This paper presents SPI-DP, a novel first-order optimizer capable of optimizing robot programs with respect to both high-level task objectives and motion-level constraints. To that end, we introduce DGPMP2-ND, a differentiable collision-free motion planner for serial N-DoF kinematics, and integrate it into an iterative, gradient-based optimization approach for generic, parameterized robot program representations. SPI-DP allows first-order optimization of planned trajectories and program parameters with respect to objectives such as cycle time or smoothness subject to e.g. collision constraints, while enabling humans to understand, modify or even certify the optimized programs. We provide a comprehensive evaluation on two practical household and industrial applications.
RO | Oct 15, 2024 | Code
A Framework for Adapting Human-Robot Interaction to Diverse User Groups
Theresa Pekarek Rosin, Vanessa Hassouna, Xiaowen Sun et al.
To facilitate natural and intuitive interactions with diverse user groups in real-world settings, social robots must be capable of addressing the varying requirements and expectations of these groups while adapting their behavior based on user feedback. While previous research often focuses on specific demographics, we present a novel framework for adaptive Human-Robot Interaction (HRI) that tailors interactions to different user groups and enables individual users to modulate interactions through both minor and major interruptions. Our primary contributions include the development of an adaptive, ROS-based HRI framework with an open-source code base. This framework supports natural interactions through advanced speech recognition and voice activity detection, and leverages a large language model (LLM) as a dialogue bridge. We validate the efficiency of our framework through module tests and system trials, demonstrating its high accuracy in age recognition and its robustness to repeated user inputs and plan changes.
CV | Apr 17, 2025 | Code
Digital Twin Generation from Visual Data: A Survey
Andrew Melnik, Benjamin Alt, Giang Nguyen et al.
This survey explores recent developments in generating digital twins from videos. Such digital twins can be used for robotics applications, media content creation, or design and construction works. We analyze various approaches, including 3D Gaussian Splatting, generative in-painting, semantic segmentation, and foundation models, highlighting their advantages and limitations. Additionally, we discuss challenges such as occlusions, lighting variations, and scalability, as well as potential future research directions. This survey aims to provide a comprehensive overview of state-of-the-art methodologies and their implications for real-world applications. Awesome list: https://github.com/ndrwmlnk/awesome-digital-twins
AI | Jan 21
Implementing Knowledge Representation and Reasoning with Object Oriented Design
Abdelrhman Bassiouny, Tom Schierenbeck, Sorin Arion et al.
This paper introduces KRROOD, a framework designed to bridge the integration gap between modern software engineering and Knowledge Representation & Reasoning (KR&R) systems. While Object-Oriented Programming (OOP) is the standard for developing complex applications, existing KR&R frameworks often rely on external ontologies and specialized languages that are difficult to integrate with imperative code. KRROOD addresses this by treating knowledge as a first-class programming abstraction using native class structures, bridging the gap between the logic programming and OOP paradigms. We evaluate the system on the OWL2Bench benchmark and a human-robot task learning scenario. Experimental results show that KRROOD achieves strong performance while supporting the expressive reasoning required for real-world autonomous systems.
RO | Apr 30, 2024
Human-AI Interaction in Industrial Robotics: Design and Empirical Evaluation of a User Interface for Explainable AI-Based Robot Program Optimization
Benjamin Alt, Johannes Zahn, Claudius Kienle et al.
While recent advances in deep learning have demonstrated its transformative potential, its adoption for real-world manufacturing applications remains limited. We present an Explanation User Interface (XUI) for a state-of-the-art deep learning-based robot program optimizer which provides both naive and expert users with different user experiences depending on their skill level, as well as Explainable AI (XAI) features to facilitate the application of deep learning methods in real-world applications. To evaluate the impact of the XUI on task performance, user satisfaction and cognitive load, we present the results of a preliminary user survey and propose a study design for a large-scale follow-up study.
RO | Apr 21, 2024
BANSAI: Towards Bridging the AI Adoption Gap in Industrial Robotics with Neurosymbolic Programming
Benjamin Alt, Julia Dvorak, Darko Katic et al.
Over the past decade, deep learning has helped solve manipulation problems across all domains of robotics. At the same time, industrial robots continue to be programmed overwhelmingly using traditional program representations and interfaces. This paper undertakes an analysis of this "AI adoption gap" from an industry practitioner's perspective. In response, we propose the BANSAI approach (Bridging the AI Adoption Gap via Neurosymbolic AI). It systematically leverages principles of neurosymbolic AI to establish data-driven, subsymbolic program synthesis and optimization in modern industrial robot programming workflows. BANSAI conceptually unites several lines of prior research and proposes a path toward practical, real-world validation.
RO | Feb 26, 2024
RoboGrind: Intuitive and Interactive Surface Treatment with Industrial Robots
Benjamin Alt, Florian Stöckl, Silvan Müller et al.
Surface treatment tasks such as grinding, sanding or polishing are a vital step of the value chain in many industries, but are notoriously challenging to automate. We present RoboGrind, an integrated system for the intuitive, interactive automation of surface treatment tasks with industrial robots. It combines a sophisticated 3D perception pipeline for surface scanning and automatic defect identification, an interactive voice-controlled wizard system for the AI-assisted bootstrapping and parameterization of robot programs, and an automatic planning and execution pipeline for force-controlled robotic surface treatment. RoboGrind is evaluated both under laboratory and real-world conditions in the context of refabricating fiberglass wind turbine blades.
RO | Jul 26, 2025
A roadmap for AI in robotics
Aude Billard, Alin Albu-Schaeffer, Michael Beetz et al.
AI technologies, including deep learning and large language models, have gone from one breakthrough to the next. As a result, we are witnessing growing excitement in robotics at the prospect of leveraging the potential of AI to tackle some of the outstanding barriers to the full deployment of robots in our daily lives. However, action and sensing in the physical world pose greater and different challenges than analysing data in isolation. As the development and application of AI in robotic products advances, it is important to reflect on which technologies, among the vast array of network architectures and learning models now available in the AI field, are most likely to be successfully applied to robots; how they can be adapted to specific robot designs, tasks, and environments; and which challenges must be overcome. This article offers an assessment of what AI for robotics has achieved since the 1990s and proposes a short- and medium-term research roadmap listing challenges and promises. These range from keeping up-to-date large datasets, representative of the diversity of tasks robots may have to perform and of the environments they may encounter, to designing AI algorithms tailored specifically to robotics problems but generic enough to apply to a wide range of applications and transfer easily to a variety of robotic platforms. For robots to collaborate effectively with humans, they must predict human behavior without relying on bias-based profiling. Explainability and transparency in AI-driven robot control are not optional but essential for building trust, preventing misuse, and attributing responsibility in accidents. We close on what we view as the primary long-term challenges, that is, to design robots capable of lifelong learning, while guaranteeing safe deployment and usage, and sustainable computational costs.
RO | Aug 15, 2025
Open, Reproducible and Trustworthy Robot-Based Experiments with Virtual Labs and Digital-Twin-Based Execution Tracing
Benjamin Alt, Mareike Picklum, Sorin Arion et al.
We envision a future in which autonomous robots conduct scientific experiments in ways that are not only precise and repeatable, but also open, trustworthy, and transparent. To realize this vision, we present two key contributions: a semantic execution tracing framework that logs sensor data together with semantically annotated robot belief states, ensuring that automated experimentation is transparent and replicable; and the AICOR Virtual Research Building (VRB), a cloud-based platform for sharing, replicating, and validating robot task executions at scale. Together, these tools enable reproducible, robot-driven science by integrating deterministic execution, semantic memory, and open knowledge representation, laying the foundation for autonomous systems to participate in scientific discovery.
NC | Feb 28, 2025
How Metacognitive Architectures Remember Their Own Thoughts: A Systematic Review
Robin Nolte, Mihai Pomarlan, Ayden Janssen et al.
Background: Metacognition has gained significant attention for its potential to enhance autonomy and adaptability of artificial agents but remains a fragmented field: diverse theories, terminologies, and design choices have led to disjointed developments and limited comparability across systems. Existing overviews remain at a conceptual level that is undiscerning to the underlying algorithms, representations, and their respective success. Methods: We address this gap by performing an explorative systematic review. Reports were included if they described techniques enabling Computational Metacognitive Architectures (CMAs) to model, store, remember, and process their episodic metacognitive experiences, one of Flavell's (1979a) three foundational components of metacognition. Searches were conducted in 16 databases, consulted between December 2023 and June 2024. Data were extracted using a 20-item framework considering pertinent aspects. Results: A total of 101 reports on 35 distinct CMAs were included. Our findings show that metacognitive experiences may boost system performance and explainability, e.g., via self-repair. However, lack of standardization and limited evaluations may hinder progress: only 17% of CMAs were quantitatively evaluated regarding this review's focus, and significant terminological inconsistency limits cross-architecture synthesis. Systems also varied widely in memory content, data types, and employed algorithms. Discussion: Limitations include the non-iterative nature of the search query, heterogeneous data availability, and an under-representation of emergent, sub-symbolic CMAs. Future research should focus on standardization and evaluation, e.g., via community-driven challenges, and on transferring promising principles to emergent architectures.
CV | Jul 2, 2025
NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation
Max Gandyra, Alessandro Santonicola, Michael Beetz
Instance segmentation of novel object instances in RGB images, given some example images for each object, is a well-known problem in computer vision. Designing a model general enough to be employed for all kinds of novel objects without (re-)training has proven to be a difficult task. To handle this, we present a new training-free framework called Novel Object Cyclic Threshold based Instance Segmentation (NOCTIS). NOCTIS integrates two pre-trained models: Grounded-SAM 2 for object proposals with precise bounding boxes and corresponding segmentation masks; and DINOv2 for robust class and patch embeddings, due to its zero-shot capabilities. Internally, the proposal-object matching is realized by determining an object matching score based on the similarity of the class embeddings and the average maximum similarity of the patch embeddings, with a new cyclic thresholding (CT) mechanism that mitigates unstable matches caused by repetitive textures or visually similar patterns. Beyond CT, NOCTIS introduces: (i) an appearance score that is unaffected by object selection bias; (ii) the usage of the average confidence of the proposals' bounding box and mask as a scoring component; and (iii) an RGB-only pipeline that performs even better than RGB-D ones. We empirically show that NOCTIS, without further training or fine-tuning, outperforms the best RGB and RGB-D methods regarding the mean AP score on the seven core datasets of the BOP 2023 challenge for the "Model-based 2D segmentation of unseen objects" task.
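The matching score described above combines a class-embedding similarity, an average maximum patch-embedding similarity, and a proposal confidence. The sketch below illustrates those three ingredients in plain Python. The function names, the use of cosine similarity, and the unweighted mean used to combine the parts are assumptions made for illustration; the cyclic thresholding (CT) step is not specified in the abstract and is deliberately left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def avg_max_patch_similarity(proposal_patches, template_patches):
    """For each proposal patch take its best-matching template patch, then average."""
    best = [max(cosine(p, t) for t in template_patches) for p in proposal_patches]
    return sum(best) / len(best)


def matching_score(prop_cls, tmpl_cls, prop_patches, tmpl_patches, confidence):
    """Hypothetical combination: unweighted mean of the three components."""
    parts = [
        cosine(prop_cls, tmpl_cls),                            # class-embedding similarity
        avg_max_patch_similarity(prop_patches, tmpl_patches),  # appearance term
        confidence,                                            # mean box/mask confidence
    ]
    return sum(parts) / len(parts)


# Toy 2-D embeddings (made-up values, not real DINOv2 features).
appearance = avg_max_patch_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
score = matching_score([1.0, 0.0], [1.0, 0.0],
                       [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]], 0.9)
```

In the toy example one proposal patch matches the template perfectly and the other not at all, so the appearance term is the average of 1 and 0.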
RO | Jun 19, 2025
Grounding Language Models with Semantic Digital Twins for Robotic Planning
Mehreen Naeem, Andrew Melnik, Michael Beetz
We introduce a novel framework that integrates Semantic Digital Twins (SDTs) with Large Language Models (LLMs) to enable adaptive and goal-driven robotic task execution in dynamic environments. The system decomposes natural language instructions into structured action triplets, which are grounded in contextual environmental data provided by the SDT. This semantic grounding allows the robot to interpret object affordances and interaction rules, enabling action planning and real-time adaptability. In case of execution failures, the LLM utilizes error feedback and SDT insights to generate recovery strategies and iteratively revise the action plan. We evaluate our approach using tasks from the ALFRED benchmark, demonstrating robust performance across various household scenarios. The proposed framework effectively combines high-level reasoning with semantic environment understanding, achieving reliable task completion in the face of uncertainty and failure.
RO | Mar 31, 2025
Towards a cognitive architecture to enable natural language interaction in co-constructive task learning
Manuel Scheibl, Birte Richter, Alissa Müller et al.
This research addresses the question of which characteristics a cognitive architecture must have to leverage the benefits of natural language in Co-Constructive Task Learning (CCTL). To provide context, we first discuss Interactive Task Learning (ITL), the mechanisms of the human memory system, and the significance of natural language and multi-modality. Next, we examine the current state of cognitive architectures, analyzing their capabilities to inform a concept of CCTL grounded in multiple sources. We then integrate insights from various research domains to develop a unified framework. Finally, we conclude by identifying the remaining challenges and requirements necessary to achieve CCTL in Human-Robot Interaction (HRI).
RO | May 24, 2023
From Interactive to Co-Constructive Task Learning
Anna-Lisa Vollmer, Daniel Leidner, Michael Beetz et al.
Humans have developed the capability to teach relevant aspects of new or adapted tasks to a social peer with very few task demonstrations by making use of scaffolding strategies that leverage prior knowledge and, importantly, prior joint experience to yield a joint understanding and a joint execution of the required steps to solve the task. This process has been discovered and analyzed in parent-infant interaction and constitutes a "co-construction", as it allows both the teacher and the learner to jointly contribute to the task. We propose to focus research in robot interactive learning on this co-construction process to enable robots to learn from non-expert users in everyday situations. In the following, we will review current proposals for interactive task learning and discuss their main contributions with respect to the entailing interaction. We then discuss our notion of co-construction and summarize research insights from adult-child and human-robot interactions to elucidate its nature in more detail. From this overview we finally derive research desiderata that entail the dimensions architecture, representation, interaction and explainability.
RO | Jan 27, 2022
Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities
Alexander Wich, Holger Schultheis, Michael Beetz
A key challenge for robotic systems is to figure out the behavior of another agent. The capability to draw correct inferences is crucial to derive human behavior from examples. Processing correct inferences is especially challenging when (confounding) factors are not controlled experimentally (observational evidence). For this reason, robots that rely on inferences that are correlational risk a biased interpretation of the evidence. We propose equipping robots with the necessary tools to conduct observational studies on people. Specifically, we propose and explore the feasibility of structural causal models with non-parametric estimators to derive empirical estimates on hand behavior in the context of object manipulation in a virtual kitchen scenario. In particular, we focus on inferences under (the weaker) conditions of partial confounding (the model covering only some factors) and confront estimators with hundreds of samples instead of the typical order of thousands. Studying these conditions explores the boundaries of the approach and its viability. Despite the challenging conditions, the estimates inferred from the validation data are correct. Moreover, these estimates are stable against three refutation strategies where four estimators are in agreement. Furthermore, the causal quantity for two individuals reveals the sensitivity of the approach to detect positive and negative effects. The validity, stability and explainability of the approach are encouraging and serve as the foundation for further research.
RO | Mar 26, 2021
Robot Program Parameter Inference via Differentiable Shadow Program Inversion
Benjamin Alt, Darko Katic, Rainer Jäkel et al.
Challenging manipulation tasks can be solved effectively by combining individual robot skills, which must be parameterized for the concrete physical environment and task at hand. This is time-consuming and difficult for human programmers, particularly for force-controlled skills. To this end, we present Shadow Program Inversion (SPI), a novel approach to infer optimal skill parameters directly from data. SPI leverages unsupervised learning to train an auxiliary differentiable program representation ("shadow program") and realizes parameter inference via gradient-based model inversion. Our method enables the use of efficient first-order optimizers to infer optimal parameters for originally non-differentiable skills, including many skill variants currently used in production. SPI zero-shot generalizes across task objectives, meaning that shadow programs do not need to be retrained to infer parameters for different task variants. We evaluate our methods on three different robots and skill frameworks in industrial and household scenarios. Code and examples are available at https://innolab.artiminds.com/icra2021.
RO | Dec 9, 2020
Kineverse: A Symbolic Articulation Model Framework for Model-Agnostic Mobile Manipulation
Adrian Röfer, Georg Bartels, Wolfram Burgard et al.
Service robots in the future need to execute abstract instructions such as "fetch the milk from the fridge". To translate such instructions into actionable plans, robots require in-depth background knowledge. With regard to interactions with doors and drawers, robots require articulation models that they can use for state estimation and motion planning. Existing frameworks model articulated connections as abstract concepts such as prismatic or revolute, but do not provide a parameterized model of these connections for computation. In this paper, we introduce a novel framework that uses symbolic mathematical expressions to model articulated structures -- robots and objects alike -- in a unified and extensible manner. We provide a theoretical description of this framework and the operations that are supported by its models, and introduce an architecture to exchange our models in robotic applications, making them as flexible as any other environmental observation. To demonstrate the utility of our approach, we employ our practical implementation Kineverse for solving common robotics tasks from state estimation and mobile manipulation, and use it further in real-world mobile robot manipulation.
AI | Dec 8, 2020
URoboSim -- An Episodic Simulation Framework for Prospective Reasoning in Robotic Agents
Michael Neumann, Sebastian Koralewski, Michael Beetz
Anticipating what might happen as a result of an action is an essential ability humans have in order to perform tasks effectively. Robots' capabilities in this regard, on the other hand, are quite lacking. While machine learning is used to improve robots' ability for prospection, it remains limited in novel situations. One way to improve the prospection ability of robots is to simulate imagined motions and the physical results of these actions. Therefore, we present URoboSim, a robot simulator that allows robots to perform tasks as mental simulations before performing them in reality. We show the capabilities of URoboSim in the form of mental simulations, data generation for machine learning, and use as a belief state for a real robot.
AI Nov 27, 2020
Automated acquisition of structured, semantic models of manipulation activities from human VR demonstration
Andrei Haidu, Michael Beetz
In this paper, we present a system capable of collecting and annotating human-performed, robot-understandable everyday activities from virtual environments. The human movements are mapped into the simulated world using off-the-shelf virtual reality devices with full-body and eye-tracking capabilities. All interactions in the virtual world are physically simulated, so movements and their effects closely correspond to the real world. During activity execution, a subsymbolic data logger records the environment and the human gaze on a per-frame basis, enabling offline scene reproduction and replays. Coupled with the physics engine, online monitors (symbolic data loggers) parse, using various grammars, and record events, actions, and their effects in the simulated world.
RO Nov 24, 2020
Foundations of the Socio-physical Model of Activities (SOMA) for Autonomous Robotic Agents
Daniel Beßler, Robert Porzel, Mihai Pomarlan et al.
In this paper, we present foundations of the Socio-physical Model of Activities (SOMA). SOMA represents both the physical and the social context of everyday activities. Such activities seem trivial for humans; however, they pose severe problems for artificial agents. For starters, a natural language command requesting something will leave unspecified many pieces of information necessary for performing the task. Humans can solve such problems quickly because we reduce the search space by recourse to prior knowledge, such as a connected collection of plans that describe how certain goals can be achieved at various levels of abstraction. Rather than enumerating fine-grained physical contexts, SOMA sets out to include socially constructed knowledge about the functions of actions to achieve a variety of goals or the roles objects can play in a given situation. As the human cognition system is capable of generalizing experiences into abstract knowledge pieces applicable to novel situations, we argue that both physical and social context need to be modeled to tackle these challenges in a general manner. This is represented by the link between the physical and social context in SOMA, where relationships are established between occurrences and generalizations of them, which has been demonstrated in several use cases that validate SOMA.
RO Nov 23, 2020
Imagination-enabled Robot Perception
Patrick Mania, Franklin Kenghagho Kenfack, Michael Neumann et al.
Many of today's robot perception systems aim at accomplishing perception tasks that are too simplistic and too hard. They are too simplistic because they do not require the perception systems to provide all the information needed to accomplish manipulation tasks. Typically the perception results do not include information about the part structure of objects, articulation mechanisms and other attributes needed for adapting manipulation behavior. On the other hand, the perception problems stated are also too hard because -- unlike humans -- the perception systems cannot leverage the expectations about what they will see to their full potential. Therefore, we investigate a variation of robot perception tasks suitable for robots accomplishing everyday manipulation tasks, such as household robots or a robot in a retail store. In such settings it is reasonable to assume that robots know most objects and have detailed models of them. We propose a perception system that maintains its beliefs about its environment as a scene graph with physics simulation and visual rendering. When detecting objects, the perception system retrieves the model of the object and places it at the corresponding place in a VR-based environment model. The physics simulation ensures that object detections that are physically not possible are rejected and scenes can be rendered to generate expectations at the image level. The result is a perception system that can provide useful information for manipulation tasks.
RO Nov 19, 2020
The Robot Household Marathon Experiment
Gayane Kazhoyan, Simon Stelter, Franklin Kenghagho Kenfack et al.
In this paper, we present an experiment designed to investigate and evaluate the scalability and robustness of mobile manipulation. The experiment involves performing variations of mobile pick and place actions and opening/closing environment containers in a human household. The robot is expected to act completely autonomously for extended periods of time. We discuss the scientific challenges raised by the experiment and present our robotic system, which can address these challenges and successfully perform all the tasks of the experiment. We present empirical results and the lessons learned, and discuss where we hit limitations.
RO Dec 23, 2019
Manipulation Planning and Control for Shelf Replenishment
Marco Costanzo, Simon Stelter, Ciro Natale et al.
Manipulation planning and control are relevant building blocks of a robotic system, and their tight integration is a key factor in improving robot autonomy, allowing robots to perform manipulation tasks of increasing complexity, such as those needed in the in-store logistics domain. Supermarkets contain a large variety of objects to be placed on the shelf layers with specific constraints; doing this with a robot is challenging and requires high dexterity. However, an integration of reactive grasping control and motion planning can allow robots to perform such tasks even with grippers with limited dexterity. The main contribution of the paper is a novel method for planning manipulation tasks to be executed using a reactive control layer that provides more control modalities, i.e., slipping avoidance and controlled sliding. Experiments with a new force/tactile sensor equipping the gripper of a mobile manipulator show that the approach allows the robot to successfully perform manipulation tasks that are infeasible with a standard fixed grasp.
RO Nov 22, 2019
RoboSherlock: Cognition-enabled Robot Perception for Everyday Manipulation Tasks
Ferenc Bálint-Benczédi, Jan-Hendrik Worch, Daniel Nyga et al.
A pressing question when designing intelligent autonomous systems is how to integrate the various subsystems concerned with complementary tasks. More specifically, robotic vision must provide task-relevant information about the environment and the objects in it to various planning-related modules. In most implementations of the traditional Perception-Cognition-Action paradigm, these tasks are treated as quasi-independent modules that function as black boxes for each other. It is our view that perception can benefit tremendously from a tight collaboration with cognition. We present RoboSherlock, a knowledge-enabled cognitive perception system for mobile robots performing human-scale everyday manipulation tasks. In RoboSherlock, perception and interpretation of realistic scenes is formulated as an unstructured information management (UIM) problem. The application of the UIM principle supports the implementation of perception systems that can answer task-relevant queries about objects in a scene, boost object recognition performance by combining the strengths of multiple perception algorithms, support knowledge-enabled reasoning about objects and enable automatic and knowledge-driven generation of processing pipelines. We demonstrate the potential of the proposed framework through feasibility studies of systems for real-world scene perception that have been built on top of the framework.
RO Mar 28, 2019
Amortized Object and Scene Perception for Long-term Robot Manipulation
Ferenc Balint-Benczedi, Michael Beetz
Mobile robots performing long-term manipulation activities in human environments have to perceive a wide variety of objects possessing very different visual characteristics and need to reliably keep track of these throughout the execution of a task. In order to be efficient, robot perception capabilities need to go beyond what is currently perceivable and should be able to answer queries about both current and past scenes. In this paper, we investigate a perception system for long-term robot manipulation that keeps track of the changing environment and builds a representation of the perceived world. Specifically, we introduce an amortized component that spreads perception tasks throughout the execution cycle. The resulting query-driven perception system asynchronously integrates results from logged images into a symbolic and numeric (what we call sub-symbolic) representation that forms the perceptual belief state of the robot.
RO Dec 19, 2018
Towards Plan Transformations for Real-World Pick and Place Tasks
Gayane Kazhoyan, Arthur Niedzwiecki, Michael Beetz
In this paper, we investigate the possibility of applying plan transformations to general manipulation plans in order to specialize them to the specific situation at hand. We present a framework for optimizing execution and achieving higher performance by autonomously transforming the robot's behavior at runtime. We show that plans employed by robotic agents in real-world environments can be transformed, despite their control structures being very complex due to the specifics of acting in the real world. The evaluation is carried out on a plan of a PR2 robot performing pick and place tasks, to which we apply three example transformations, as well as on a large number of experiments in a fast plan projection environment.
RO Dec 19, 2018
Specializing Underdetermined Action Descriptions Through Plan Projection
Gayane Kazhoyan, Michael Beetz
Plan execution on real robots in realistic environments is underdetermined and often leads to failures. The choice of action parameterization is crucial for task success. By thinking ahead of time with the fast plan projection mechanism proposed in this paper, a general plan can be specialized towards the environment and task at hand by choosing action parameterizations that are predicted to lead to successful execution. To find causal relationships between action parameterizations and task success, we provide the robot with means for plan introspection and propose a systematic and hierarchical plan structure to support that. We evaluate our approach by showing how a PR2 robot, when equipped with the proposed system, is able to choose action parameterizations that increase task execution success rates and overall performance of fetch and deliver actions in a real-world setting.
RO Mar 7, 2018
Adapting Everyday Manipulation Skills to Varied Scenarios
Pawel Gajewski, Paulo Ferreira, Georg Bartels et al.
We address the problem of executing tool-using manipulation skills in scenarios where the objects to be used may vary. We assume that point clouds of the tool and target object can be obtained, but no interpretation or further knowledge about these objects is provided. The system must interpret the point clouds and decide how to use the tool to complete a manipulation task with a target object; this means it must adjust motion trajectories appropriately to complete the task. We tackle three everyday manipulations: scraping material from a tool into a container, cutting, and scooping from a container. Our solution encodes these manipulation skills in a generic way, with parameters that can be filled in at run-time via queries to a robot perception module; the perception module abstracts the functional parts for the tool and extracts key parameters that are needed for the task. The approach is evaluated in simulation and with selected examples on a PR2 robot.
RO May 13, 2016
Knowledge-Enabled Robotic Agents for Shelf Replenishment in Cluttered Retail Environments
Jan Winkler, Ferenc Balint-Benczedi, Thiemo Wiedemeyer et al.
Autonomous robots in unstructured and dynamically changing retail environments have to master complex perception, knowledge processing, and manipulation tasks. To enable them to act competently, we propose a framework based on three core components: (i) a knowledge-enabled perception system, capable of combining diverse information sources to cope with occlusions and stacked objects with a variety of textures and shapes, (ii) knowledge processing methods that produce strategies for tidying up supermarket racks, and (iii) the necessary manipulation skills in confined spaces to arrange objects in semi-accessible rack shelves. We demonstrate our framework in a simulated environment as well as on a real shopping rack using a PR2 robot. Typical supermarket products are detected and rearranged in the retail rack, tidying up items that were found to be misplaced.
AI Apr 21, 2015
Reasoning about Unmodelled Concepts - Incorporating Class Taxonomies in Probabilistic Relational Models
Daniel Nyga, Michael Beetz
A key problem in the application of first-order probabilistic methods is the enormous size of graphical models they imply. The size results from the possible worlds that can be generated by a domain of objects and relations. One of the reasons for this explosion is that so far the approaches do not sufficiently exploit the structure and similarity of possible worlds in order to encode the models more compactly. We propose fuzzy inference in Markov logic networks, which enables the use of taxonomic knowledge as a source of imposing structure onto possible worlds. We show that by exploiting this structure, probability distributions can be represented more compactly and that the reasoning systems become capable of reasoning about concepts not contained in the probabilistic knowledge base.
RO Jan 18, 2014
Learning and Reasoning with Action-Related Places for Robust Mobile Manipulation
Freek Stulp, Andreas Fedrizzi, Lorenz Mösenlechner et al.
We propose the concept of Action-Related Place (ARPlace) as a powerful and flexible representation of task-related place in the context of mobile manipulation. ARPlace represents robot base locations not as a single position, but rather as a collection of positions, each with an associated probability that the manipulation action will succeed when located there. ARPlaces are generated using a predictive model that is acquired through experience-based learning, and take into account the uncertainty the robot has about its own location and the location of the object to be manipulated. When executing the task, rather than choosing one specific goal position based only on the initial knowledge about the task context, the robot instantiates an ARPlace, and bases its decisions on this ARPlace, which is updated as new information about the task becomes available. To show the advantages of this least-commitment approach, we present a transformational planner that reasons about ARPlaces in order to optimize symbolic plans. Our empirical evaluation demonstrates that using ARPlaces leads to more robust and efficient mobile manipulation in the face of state estimation uncertainty on our simulated robot.
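The representation described in this abstract, a set of base positions each carrying a success probability that is refreshed as the belief about the object changes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: `success_probability` is a hand-written stand-in for the learned predictive model, and the reach distance, tolerance, and candidate grid are all hypothetical.

```python
import math

def success_probability(base, obj, ideal_reach=0.6, tolerance=0.15):
    """Hypothetical stand-in for the learned model: success is most likely
    when the object lies at an assumed ideal reach distance from the base."""
    dist = math.hypot(base[0] - obj[0], base[1] - obj[1])
    return math.exp(-((dist - ideal_reach) ** 2) / (2 * tolerance ** 2))

def build_arplace(candidates, object_hypotheses):
    """Map each candidate base position to its expected success probability,
    averaged over weighted hypotheses about the object's location."""
    total = sum(weight for _, weight in object_hypotheses)
    return {
        base: sum(weight * success_probability(base, obj)
                  for obj, weight in object_hypotheses) / total
        for base in candidates
    }

def best_base(arplace):
    """Pick the base position with the highest expected success probability."""
    return max(arplace, key=arplace.get)

# Candidate base positions along a line in front of a table (assumed layout).
candidates = [(x / 10, 0.0) for x in range(11)]

# Initial belief: the object is at one of two places with equal weight.
prior = [((1.0, 0.0), 0.5), ((1.4, 0.0), 0.5)]
arplace = build_arplace(candidates, prior)

# New information arrives: the object is observed at (1.4, 0.0).
posterior = [((1.4, 0.0), 1.0)]
updated = build_arplace(candidates, posterior)
print("best base after update:", best_base(updated))
```

Keeping the whole probability map rather than a single goal lets the decision be deferred until the belief sharpens, which is the least-commitment behavior the abstract refers to.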