Georg von Wichert

CV
7papers
55citations
Novelty41%
AI Score42

7 Papers

64.5ROMay 25
A Factory-Floor Deployment Case Study of VLA Pipelines for Industrial Packaging Task: Workflow, Failures, and Lessons

Brian Zhu, Philipp Schmitt, Philine Meister et al.

Vision-Language-Action (VLA) policies have shown promising manipulation capabilities, yet their practical impact is often limited by the reliability demands of real-world deployment. We present a deployment study of an industrial packaging task at Siemens Factory (GWE, Erlangen, Germany), where a robot must pick a transparent accessory bag from a cluttered pile, insert it into the remaining cavity of a cardboard package, and ensure that the bag and its contents remain below the closing plane. Our goal is to understand the practical effort required to adapt a pretrained Pi0.5 policy to a single factory-floor task through iterative fine-tuning and deployment-driven refinement. The pipeline consists of repeated loops of data collection, curation, fine-tuning, evaluation, and targeted recovery data collection. We have accumulated 2535 episodes (10 hours) from the on-site factory settings. In this paper, we contribute an empirical account of a factory-floor VLA deployment, highlighting recurring failure modes and lessons that inform how to improve the deployment workflow.

70.7ROMar 17
Enabling Dynamic Tracking in Vision-Language-Action Models via Time-Discrete and Time-Continuous Velocity Feedforward

Johannes Hechtl, Philipp Schmitt, Georg von Wichert et al.

While vision-language-action (VLA) models have shown great promise for robot manipulation, their deployment on rigid industrial robots remains challenging due to the inherent trade-off between compliance and responsiveness. Standard Behavior Cloning (BC) approaches predict discrete poses at low frequencies, omitting the velocity and acceleration feedforward terms typically used by low-level compliant controllers. This requires to rely on high stiffness for accurate tracking, thereby sacrificing safe contact dynamics. In this paper, we demonstrate the importance of integrating velocity feedforward terms into VLA policies to resolve this trade-off. We propose two methods for extracting velocity targets from VLAs: a time-discrete finite-difference approximation that serves as a highly effective bridge for existing models, and a continuous Cubic B-Spline action space that natively yields $C^2$ continuous trajectories for high-frequency control. Crucially, both approaches are strictly model-agnostic and compatible with any standard action-chunking architecture, requiring modifications only to teleoperation, data processing, and the low-level controller. We fine-tune the $π_{0.5}$ model and evaluate both of our approaches on a demanding, contact-rich cube-in-hole task. Our results indicate that incorporating the velocity feedforward term via finite differences significantly improves task execution speed, while the continuous B-Spline approach maintains high overall success rates and provides a foundation for smoother higher-order derivatives without compromising compliance.

CVFeb 21, 2020
Online Semantic Exploration of Indoor Maps

Ziyuan Liu, Dong Chen, Georg von Wichert

In this paper we propose a method to extract an abstracted floor plan from typical grid maps using Bayesian reasoning. The result of this procedure is a probabilistic generative model of the environment defined over abstract concepts. It is well suited for higher-level reasoning and communication purposes. We demonstrate the effectiveness of the approach through real-world experiments.

CVFeb 21, 2020
Applying Rule-Based Context Knowledge to Build Abstract Semantic Maps of Indoor Environments

Ziyuan Liu, Georg von Wichert

In this paper, we propose a generalizable method that systematically combines data driven MCMC samplingand inference using rule-based context knowledge for data abstraction. In particular, we demonstrate the usefulness of our method in the scenario of building abstract semantic maps for indoor environments. The product of our system is a parametric abstract model of the perceived environment that not only accurately represents the geometry of the environment but also provides valuable abstract information which benefits high-level robotic applications. Based on predefined abstract terms,such as type and relation, we define task-specific context knowledge as descriptive rules in Markov Logic Networks. The corresponding inference results are used to construct a priordistribution that aims to add reasonable constraints to the solution space of semantic maps. In addition, by applying a semantically annotated sensor model, we explicitly use context information to interpret the sensor data. Experiments on real world data show promising results and thus confirm the usefulness of our system.

CVFeb 19, 2020
Table-Top Scene Analysis Using Knowledge-Supervised MCMC

Ziyuan Liu, Dong Chen, Kai M. Wurm et al.

In this paper, we propose a probabilistic method to generate abstract scene graphs for table-top scenes from 6D object pose estimates. We explicitly make use of task-specfic context knowledge by encoding this knowledge as descriptive rules in Markov logic networks. Our approach to generate scene graphs is probabilistic: Uncertainty in the object poses is addressed by a probabilistic sensor model that is embedded in a data driven MCMC process. We apply Markov logic inference to reason about hidden objects and to detect false estimates of object poses. The effectiveness of our approach is demonstrated and evaluated in real world experiments.

CVFeb 19, 2020
A Generalizable Knowledge Framework for Semantic Indoor Mapping Based on Markov Logic Networks and Data Driven MCMC

Ziyuan Liu, Georg von Wichert

In this paper, we propose a generalizable knowledge framework for data abstraction, i.e. finding compact abstract model for input data using predefined abstract terms. Based on these abstract terms, intelligent autonomous systems, such as a robot, should be able to make inference according to specific knowledge base, so that they can better handle the complexity and uncertainty of the real world. We propose to realize this framework by combining Markov logic networks (MLNs) and data driven MCMC sampling, because the former are a powerful tool for modelling uncertain knowledge and the latter provides an efficient way to draw samples from unknown complex distributions. Furthermore, we show in detail how to adapt this framework to a certain task, in particular, semantic robot mapping. Based on MLNs, we formulate task-specific context knowledge as descriptive soft rules. Experiments on real world data and simulated data confirm the usefulness of our framework.

CVFeb 19, 2020
Extracting Semantic Indoor Maps from Occupancy Grids

Ziyuan Liu, Georg von Wichert

The primary challenge for any autonomous system operating in realistic, rather unconstrained scenarios is to manage the complexity and uncertainty of the real world. While it is unclear how exactly humans and other higher animals master these problems, it seems evident, that abstraction plays an important role. The use of abstract concepts allows to define the system behavior on higher levels. In this paper we focus on the semantic mapping of indoor environments. We propose a method to extract an abstracted floor plan from typical grid maps using Bayesian reasoning. The result of this procedure is a probabilistic generative model of the environment defined over abstract concepts. It is well suited for higher-level reasoning and communication purposes. We demonstrate the effectiveness of the approach using real-world data.