RO Sep 30, 2022
NBV-SC: Next Best View Planning based on Shape Completion for Fruit Mapping and Reconstruction
Rohit Menon, Tobias Zaenker, Nils Dengler et al.
Active perception for fruit mapping and harvesting is a difficult task since occlusions occur frequently and the location as well as size of fruits change over time. State-of-the-art viewpoint planning approaches utilize computationally expensive ray casting operations to find good viewpoints aiming at maximizing information gain and covering the fruits in the scene. In this paper, we present a novel viewpoint planning approach that explicitly uses information about the predicted fruit shapes to compute targeted viewpoints that observe as yet unobserved parts of the fruits. Furthermore, we formulate the concept of viewpoint dissimilarity to reduce the sampling space for more efficient selection of useful, dissimilar viewpoints. Our simulation experiments with a UR5e arm equipped with an RGB-D sensor provide a quantitative demonstration of the efficacy of our iterative next best view planning method based on shape completion. In comparative experiments with a state-of-the-art viewpoint planner, we demonstrate improvement not only in the estimation of the fruit sizes, but also in their reconstruction, while significantly reducing the planning time. Finally, we show the viability of our approach for mapping sweet pepper plants with a real robotic system in a commercial glasshouse.
RO Sep 27, 2023
Perception for Humanoid Robots
Arindam Roychoudhury, Shahram Khorshidi, Subham Agrawal et al.
Purpose of Review: In the field of humanoid robotics, perception plays a fundamental role in enabling robots to interact seamlessly with humans and their surroundings, leading to improved safety, efficiency, and user experience. This scientific study investigates various perception modalities and techniques employed in humanoid robots, including visual, auditory, and tactile sensing, by exploring recent state-of-the-art approaches for perceiving and understanding the internal state, the environment, objects, and human activities. Recent Findings: Internal state estimation makes extensive use of Bayesian filtering methods and optimization techniques based on maximum a-posteriori formulation by utilizing proprioceptive sensing. In the area of external environment understanding, with an emphasis on robustness and adaptability to dynamic, unforeseen environmental changes, the new slew of research discussed in this study has focused largely on multi-sensor fusion and machine learning in contrast to the use of hand-crafted, rule-based systems. Human-robot interaction methods have established the importance of contextual information representation and memory for understanding human intentions. Summary: This review summarizes the recent developments and trends in the field of perception in humanoid robots. Three main areas of application are identified, namely, internal state estimation, external environment estimation, and human-robot interaction. The applications of diverse sensor modalities in each of these areas are considered and recent significant works are discussed.
RO Mar 28, 2022
Learning Personalized Human-Aware Robot Navigation Using Virtual Reality Demonstrations from a User Study
Jorge de Heuvel, Nathan Corral, Lilli Bruckschen et al.
For the most comfortable, human-aware robot navigation, subjective user preferences need to be taken into account. This paper presents a novel reinforcement learning framework to train a personalized navigation controller along with an intuitive virtual reality demonstration interface. The conducted user study provides evidence that our personalized approach significantly outperforms classical approaches with more comfortable human-robot experiences. We achieve these results using only a few demonstration trajectories from non-expert users, who predominantly appreciate the intuitive demonstration setup. As we show in the experiments, the learned controller generalizes well to states not covered in the demonstration data, while still reflecting user preferences during navigation. Finally, we transfer the navigation controller without loss in performance to a real robot.
RO Oct 4, 2022
Handling Sparse Rewards in Reinforcement Learning Using Model Predictive Control
Murad Dawood, Nils Dengler, Jorge de Heuvel et al.
Reinforcement learning (RL) has recently shown great success in various domains. Yet, the design of the reward function requires detailed domain expertise and tedious fine-tuning to ensure that agents are able to learn the desired behaviour. Using a sparse reward conveniently mitigates these challenges. However, the sparse reward represents a challenge on its own, often resulting in unsuccessful training of the agent. In this paper, we therefore address the sparse reward problem in RL. Our goal is to find an effective alternative to reward shaping, without using costly human demonstrations, that would also be applicable to a wide range of domains. Hence, we propose to use model predictive control (MPC) as an experience source for training RL agents in sparse reward environments. Without the need for reward shaping, we successfully apply our approach in the field of mobile robot navigation both in simulation and real-world experiments with a Kobuki Turtlebot 2. We furthermore demonstrate great improvement over pure RL algorithms in terms of success rate as well as number of collisions and timeouts. Our experiments show that MPC as an experience source improves the agent's learning process for a given task in the case of sparse rewards.
RO Mar 26
RHINO-AR: An Augmented Reality Exhibit for Teaching Mobile Robotics Concepts in Museums
Nils Dengler, Tim Graf, Leif Van Holland et al.
We present RHINO-AR, an interactive Augmented Reality (AR) museum exhibit that reintroduces the historical mobile robot RHINO into its original exhibition environment at the Deutsches Museum Bonn. The system builds on our previous work RHINO-VR, which reconstructed the robot and the environment in virtual reality. Although this created an engaging experience, it also revealed an important limitation, because visitors were separated from the real exhibition space and from the physical robot on display. RHINO-AR addresses this reality gap by placing a virtual reconstruction of the robot directly into the real museum space. Implemented on a Magic Leap 2 headset using Unity, our system combines real-time environment meshing with interactive visualizations of LiDAR sensing, traversability, and path planning to make otherwise invisible robotics processes understandable to non-expert visitors. We evaluated RHINO-AR in a two-day museum study with 22 participants, assessing usability, technical performance, satisfaction, conceptual understanding, and preference comparison to RHINO-VR. The results show that RHINO-AR was well received, effectively conveyed key navigation concepts, and was generally preferred over the VR exhibit due to its stronger physical grounding and increased realism.
RO Mar 23
Efficient View Planning Guided by Previous-Session Reconstruction for Repeated Plant Monitoring
Sicong Pan, Luca Lobefaro, Moein Taherkhani et al.
Repeated plant monitoring is essential for tracking crop growth, and 3D reconstruction enables consistent comparison across monitoring sessions. However, rebuilding a 3D model from scratch in every session is costly and overlooks informative geometry already observed previously. We propose efficient view planning guided by a previous-session reconstruction, which reuses a 3D model from the previous session to improve active perception in the current session. Based on this previous-session reconstruction, our method replaces iterative next-best-view planning with one-shot view planning that selects an informative set of views and computes the globally shortest execution path connecting them. Experiments on real multi-session datasets, including public single-plant scans and a newly collected greenhouse crop-row dataset, show that our method achieves comparable or higher surface coverage with fewer executed views and shorter robot paths than iterative and one-shot baselines.
RO Apr 7
Designing Privacy-Preserving Visual Perception for Robot Navigation Based on User Privacy Preferences
Xuying Huang, Sicong Pan, Delphine Reinhardt et al.
Visual navigation is a fundamental capability of mobile service robots, yet the onboard cameras required for such navigation can capture privacy-sensitive information and raise user privacy concerns. Existing approaches to privacy-preserving navigation-oriented visual perception have largely been driven by technical considerations, with limited grounding in user privacy preferences. In this work, we propose a user-centered approach to designing privacy-preserving visual perception for robot navigation. To investigate how user privacy preferences can inform such design, we conducted two user studies. The results show that users prefer privacy-preserving visual abstractions and capture-time low-resolution preservation mechanisms: their preferred RGB resolution depends both on the desired privacy level and robot proximity during navigation. Based on these findings, we further derive a user-configurable distance-to-resolution privacy policy for privacy-preserving robot visual navigation.
RO Mar 18
Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation
Tharun Sethuraman, Subham Agrawal, Nils Dengler et al.
Robots operating in human-shared environments must not only achieve task-level navigation objectives such as safety and efficiency, but also adapt their behavior to human preferences. However, as human preferences are typically expressed in natural language and depend on environmental context, it is difficult to directly integrate them into low-level robot control policies. In this work, we present a pipeline that enables robots to understand and apply context-dependent navigation preferences by combining foundational models with a Multi-Objective Reinforcement Learning (MORL) navigation policy. Thus, our approach integrates high-level semantic reasoning with low-level motion control. A Vision-Language Model (VLM) extracts structured environmental context from onboard visual observations, while Large Language Models (LLM) convert natural language user feedback into interpretable, context-dependent behavioral rules stored in a persistent but updatable rule memory. A preference translation module then maps contextual information and stored rules into numerical preference vectors that parameterize a pretrained MORL policy for real-time navigation adaptation. We evaluate the proposed framework through quantitative component-level evaluations, a user study, and real-world robot deployments in various indoor environments. Our results demonstrate that the system reliably captures user intent, generates consistent preference vectors, and enables controllable behavior adaptation across diverse contexts. Overall, the proposed pipeline improves the adaptability, transparency, and usability of robots operating in shared human environments, while maintaining safe and responsive real-time control.
RO May 11
ObjView-Bench: Rethinking Difficulty and Deployment for Object-Centric View Planning
Sicong Pan, Hao Hu, Xuying Huang et al.
Object-centric view planning is a core component of active geometric 3D reconstruction in robotics, yet existing evaluations often conflate object complexity, planning difficulty, budget assumptions, and physical reachability constraints. As a result, conclusions drawn from idealized view-planning evaluations may not reliably predict performance under realistic reconstruction settings. We introduce ObjView-Bench, an evaluation framework for rethinking difficulty and deployment in object-centric view planning. First, we disentangle three quantities underlying view-planning evaluation: omnidirectional self-occlusion as an object-side attribute, observation saturation difficulty, and protocol-dependent planning difficulty defined through a set-cover formulation. This separation supports controlled dataset construction, analysis of slow-saturation objects, and a case study showing that planning difficulty-aware sampling can improve learned view planners. Second, we design deployment-oriented evaluation protocols that reveal how budget regimes and reachable-view constraints alter method behavior. Across classical, learned, and hybrid planners, ObjView-Bench shows that difficulty, budget, and reachability constraints substantially change method rankings and failure modes.
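To make the set-cover framing mentioned in this abstract concrete, the following is a minimal sketch of greedy set cover over candidate views. It is a generic textbook heuristic and not the benchmark's own definition of planning difficulty; the function name `greedy_view_cover`, the view identifiers, and the integer surface-element indices are all hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_view_cover(
    view_visibility: Dict[Hashable, Set[int]], target: Set[int]
) -> List[Hashable]:
    """Greedily pick views until every coverable element of `target` is seen.

    `view_visibility` maps a (hypothetical) view id to the set of surface
    elements visible from that view.
    """
    remaining = set(target)
    chosen: List[Hashable] = []
    while remaining and view_visibility:
        # Pick the view that sees the most still-uncovered elements.
        best = max(view_visibility, key=lambda v: len(view_visibility[v] & remaining))
        gain = view_visibility[best] & remaining
        if not gain:
            # No candidate view sees any of the remaining elements.
            break
        chosen.append(best)
        remaining -= gain
    return chosen


# Toy instance (assumed data): six surface elements, four candidate views.
views = {"v0": {0, 1, 2}, "v1": {2, 3}, "v2": {3, 4, 5}, "v3": {0, 5}}
plan = greedy_view_cover(views, target={0, 1, 2, 3, 4, 5})
print(plan)
```

In this toy instance two views suffice; objects whose surface can only be covered by many views would correspond to harder instances under such a formulation.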
RO Mar 25, 2024
Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning
Sicong Pan, Liren Jin, Xuying Huang et al.
Object reconstruction is relevant for many autonomous robotic tasks that require interaction with the environment. A key challenge in such scenarios is planning view configurations to collect informative measurements for reconstructing an initially unknown object. One-shot view planning enables efficient data collection by predicting view configurations and planning the globally shortest path connecting all views at once. However, prior knowledge about the object is required to conduct one-shot view planning. In this work, we propose a novel one-shot view planning approach that utilizes the powerful 3D generation capabilities of diffusion models as priors. By incorporating such geometric priors into our pipeline, we achieve effective one-shot view planning starting with only a single RGB image of the object to be reconstructed. Our planning experiments in simulation and real-world setups indicate that our approach balances well between object reconstruction quality and movement cost.
RO Apr 8, 2024
A Neuromorphic Approach to Obstacle Avoidance in Robot Manipulation
Ahmed Faisal Abdelrahman, Matias Valdenegro-Toro, Maren Bennewitz et al.
Neuromorphic computing mimics computational principles of the brain in silico and motivates research into event-based vision and spiking neural networks (SNNs). Event cameras (ECs) exclusively capture local intensity changes and offer superior power consumption, response latencies, and dynamic ranges. SNNs replicate biological neuronal dynamics and have demonstrated potential as alternatives to conventional artificial neural networks (ANNs), such as in reducing energy expenditure and inference time in visual classification. Nevertheless, these novel paradigms remain scarcely explored outside the domain of aerial robots. To investigate the utility of brain-inspired sensing and data processing, we developed a neuromorphic approach to obstacle avoidance on a camera-equipped manipulator. Our approach adapts high-level trajectory plans with reactive maneuvers by processing emulated event data in a convolutional SNN, decoding neural activations into avoidance motions, and adjusting plans using a dynamic motion primitive. We conducted experiments with a Kinova Gen3 arm performing simple reaching tasks that involve obstacles in sets of distinct task scenarios and in comparison to a non-adaptive baseline. Our neuromorphic approach facilitated reliable avoidance of imminent collisions in simulated and real-world experiments, where the baseline consistently failed. Trajectory adaptations had low impacts on safety and predictability criteria. Among the notable SNN properties were the correlation of computations with the magnitude of perceived motions and a robustness to different event emulation methods. Tests with a DAVIS346 EC showed similar performance, validating our experimental event emulation. Our results motivate incorporating SNN learning, utilizing neuromorphic processors, and further exploring the potential of neuromorphic methods.
RO Feb 28, 2025
Map Space Belief Prediction for Manipulation-Enhanced Mapping
Joao Marcos Correia Marques, Nils Dengler, Tobias Zaenker et al.
Searching for objects in cluttered environments requires selecting efficient viewpoints and manipulation actions to remove occlusions and reduce uncertainty in object locations, shapes, and categories. In this work, we address the problem of manipulation-enhanced semantic mapping, where a robot has to efficiently identify all objects in a cluttered shelf. Although Partially Observable Markov Decision Processes (POMDPs) are standard for decision-making under uncertainty, representing unstructured interactive worlds remains challenging in this formalism. To tackle this, we define a POMDP whose belief is summarized by a metric-semantic grid map and propose a novel framework that uses neural networks to perform map-space belief updates to reason efficiently and simultaneously about object geometries, locations, categories, occlusions, and manipulation physics. Further, to enable accurate information gain analysis, the learned belief updates should maintain calibrated estimates of uncertainty. Therefore, we propose Calibrated Neural-Accelerated Belief Updates (CNABUs) to learn a belief propagation model that generalizes to novel scenarios and provides confidence-calibrated predictions for unknown areas. Our experiments show that our novel POMDP planner improves map completeness and accuracy over existing methods in challenging simulations and successfully transfers to real-world cluttered shelves in zero-shot fashion.
RO Mar 30
Point of View: How Perspective Affects Perceived Robot Sociability
Subham Agrawal, Aftab Akthar, Nils Dengler et al.
Ensuring that robot navigation is safe and socially acceptable is crucial for comfortable human-robot interaction in shared environments. However, existing validation methods often rely on a bird's-eye (allocentric) perspective, which fails to capture the subjective first-person experience of pedestrians encountering robots in the real world. In this paper, we address the perceptual gap between allocentric validation and egocentric experience by investigating how different perspectives affect the perceived sociability and disturbance of robot trajectories. Our approach uses an immersive VR environment to evaluate identical robot trajectories across allocentric, egocentric-proximal, and egocentric-distal viewpoints in a user study. We perform this analysis for trajectories generated from two different navigation policies to understand if the observed differences are unique to a single type of trajectory or more generalizable. We further examine whether augmenting a trajectory with a head-nod gesture can bridge the perceptual gap and improve human comfort. Our experiments suggest that trajectories rated as sociable from an allocentric view may be perceived as significantly more disturbing when experienced from a first-person perspective in close proximity. Our results also demonstrate that while passing distance affects perceived disturbance, communicative social signaling, such as a head-nod, can effectively enhance the perceived sociability of the robot's behavior.
RO Mar 7
Efficient Trajectory Optimization for Autonomous Racing via Formula-1 Data-Driven Initialization
Samir Shehadeh, Lukas Kutsch, Nils Dengler et al.
Trajectory optimization is a central component of fast and efficient autonomous racing. However, practical optimization pipelines remain highly sensitive to initialization and may converge slowly or to suboptimal local solutions when seeded with heuristic trajectories such as the centerline or minimum-curvature paths. To address this limitation, we leverage expert driving behavior as an initialization prior and propose a learning-informed initialization strategy based on real-world Formula 1 telemetry. To this end, we first construct a multi-track Formula 1 trajectory dataset by reconstructing and aligning noisy GPS telemetry to a standardized reference-line representation across 17 tracks. Building on this, we present a neural network that predicts an expert-like raceline offset directly from local track geometry, without explicitly modeling vehicle dynamics or forces. The predicted raceline is then used as an informed seed for a minimum-time optimal control solver. Experiments on all 17 tracks demonstrate that the learned initialization accelerates solver convergence and significantly reduces runtime compared to traditional geometric baselines, while preserving the final optimized lap time.
RO Mar 6
SG-DOR: Learning Scene Graphs with Direction-Conditioned Occlusion Reasoning for Pepper Plants
Rohit Menon, Niklas Mueller-Goldingen, Sicong Pan et al.
Robotic harvesting in dense crop canopies requires effective interventions that depend not only on geometry, but also on explicit, direction-conditioned relations identifying which organs obstruct a target fruit. We present SG-DOR (Scene Graphs with Direction-Conditioned Occlusion Reasoning), a relational framework that, given instance-segmented organ point clouds, infers a scene graph encoding physical attachments and direction-conditioned occlusion. We introduce an occlusion ranking task for retrieving and ranking candidate leaves for a target fruit and approach direction, and propose a direction-aware graph neural architecture with per-fruit leaf-set attention and union-level aggregation. Experiments on a multi-plant synthetic pepper dataset show improved occlusion prediction (F1=0.73, NDCG@3=0.85) and attachment inference (edge F1=0.83) over strong ablations, yielding a structured relational signal for downstream intervention planning.
RO Oct 13, 2025
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Murad Dawood, Usama Ahmed Siddiquie, Shahram Khorshidi et al.
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
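As a rough illustration of smooth action modulation (as opposed to overriding the policy), the sketch below shrinks an action by a factor that decays with a predicted constraint cost. In the abstract the regulator is trained; here a fixed exponential stands in for it purely as an assumption, and `scale_action`, `min_scale`, and `sharpness` are hypothetical names and parameters.

```python
import math
from typing import List, Sequence


def scale_action(
    action: Sequence[float],
    predicted_cost: float,
    min_scale: float = 0.1,
    sharpness: float = 5.0,
) -> List[float]:
    """Shrink an action smoothly as the predicted constraint cost grows.

    The factor equals 1 at zero predicted cost and approaches `min_scale`
    for large costs, so the action is damped but never fully suppressed.
    """
    cost = max(predicted_cost, 0.0)  # negative predictions are treated as zero cost
    factor = min_scale + (1.0 - min_scale) * math.exp(-sharpness * cost)
    return [factor * a for a in action]


# Assumed example: the same policy action under low and high predicted cost.
print(scale_action([1.0, -2.0], predicted_cost=0.0))
print(scale_action([1.0, -2.0], predicted_cost=2.0))
```

Keeping `min_scale` above zero reflects the stated goal of avoiding degenerate suppression of actions: the direction chosen by the policy is preserved and only its magnitude changes.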
RO Jul 21, 2025
Improved Semantic Segmentation from Ultra-Low-Resolution RGB Images Applied to Privacy-Preserving Object-Goal Navigation
Xuying Huang, Sicong Pan, Olga Zatsarynna et al.
User privacy in mobile robotics has become a critical concern. Existing methods typically prioritize either the performance of downstream robotic tasks or privacy protection, with the latter often constraining the effectiveness of task execution. To jointly address both objectives, we study semantic-based robot navigation in an ultra-low-resolution setting to preserve visual privacy. A key challenge in such scenarios is recovering semantic segmentation from ultra-low-resolution RGB images. In this work, we introduce a novel fully joint-learning method that integrates an agglomerative feature extractor and a segmentation-aware discriminator to solve ultra-low-resolution semantic segmentation, thereby enabling privacy-preserving, semantic object-goal navigation. Our method outperforms different baselines on ultra-low-resolution semantic segmentation and our improved segmentation results increase the success rate of the semantic object-goal navigation in a real-world privacy-constrained scenario.
RO May 12, 2025
Privacy Risks of Robot Vision: A User Study on Image Modalities and Resolution
Xuying Huang, Sicong Pan, Maren Bennewitz
User privacy is a crucial concern in robotic applications, especially when mobile service robots are deployed in personal or sensitive environments. However, many robotic downstream tasks require the use of cameras, which may raise privacy risks. To better understand user perceptions of privacy in relation to visual data, we conducted a user study investigating how different image modalities and image resolutions affect users' privacy concerns. The results show that depth images are broadly viewed as privacy-safe, and a similarly high proportion of respondents feel the same about semantic segmentation images. Additionally, the majority of participants consider 32×32 resolution RGB images to be almost sufficiently privacy-preserving, while most believe that 16×16 resolution can fully guarantee privacy protection.
RO May 8, 2025
Multi-Objective Reinforcement Learning for Adaptable Personalized Autonomous Driving
Hendrik Surmann, Jorge de Heuvel, Maren Bennewitz
Human drivers exhibit individual preferences regarding driving style. Adapting autonomous vehicles to these preferences is essential for user trust and satisfaction. However, existing end-to-end driving approaches often rely on predefined driving styles or require continuous user feedback for adaptation, limiting their ability to support dynamic, context-dependent preferences. We propose a novel approach using multi-objective reinforcement learning (MORL) with preference-driven optimization for end-to-end autonomous driving that enables runtime adaptation to driving style preferences. Preferences are encoded as continuous weight vectors to modulate behavior along interpretable style objectives (including efficiency, comfort, speed, and aggressiveness) without requiring policy retraining. Our single-policy agent integrates vision-based perception in complex mixed-traffic scenarios and is evaluated in diverse urban environments using the CARLA simulator. Experimental results demonstrate that the agent dynamically adapts its driving behavior according to changing preferences while maintaining performance in terms of collision avoidance and route completion.
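One common way a continuous weight vector can act on several objectives is linear scalarization of per-objective rewards. The sketch below shows only that generic idea; whether the paper combines objectives this way is not stated in the abstract, so treat it as an assumption. The function name `scalarize` and the reward values are hypothetical, while the four objective names come from the abstract.

```python
from typing import Dict


def scalarize(objective_rewards: Dict[str, float], preference: Dict[str, float]) -> float:
    """Combine per-objective rewards using normalized preference weights."""
    total = sum(preference.values())
    if total <= 0:
        raise ValueError("preference weights must sum to a positive value")
    # Normalize the weights so that only their relative size matters.
    return sum(
        (preference[name] / total) * reward
        for name, reward in objective_rewards.items()
    )


# Assumed per-step rewards for the four style objectives named in the abstract.
rewards = {"efficiency": 1.0, "comfort": 0.0, "speed": 2.0, "aggressiveness": -1.0}

balanced = {"efficiency": 1.0, "comfort": 1.0, "speed": 1.0, "aggressiveness": 1.0}
speed_only = {"efficiency": 0.0, "comfort": 0.0, "speed": 1.0, "aggressiveness": 0.0}

print(scalarize(rewards, balanced))
print(scalarize(rewards, speed_only))
```

Changing the weight vector at runtime changes which objective dominates the combined signal, which is the intuition behind adapting driving style without retraining.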
RO Apr 16, 2025
DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction
Sicong Pan, Liren Jin, Xuying Huang et al.
Active object reconstruction is crucial for many robotic applications. A key aspect in these scenarios is generating object-specific view configurations to obtain informative measurements for reconstruction. One-shot view planning enables efficient data collection by predicting all views at once, eliminating the need for time-consuming online replanning. Our primary insight is to leverage the generative power of 3D diffusion models as valuable prior information. By conditioning on initial multi-view images, we exploit the priors from the 3D diffusion model to generate an approximate object model, serving as the foundation for our view planning. Our novel approach integrates the geometric and textural distributions of the object model into the view planning process, generating views that focus on the complex parts of the object to be reconstructed. We validate the proposed active object reconstruction system through both simulation and real-world experiments, demonstrating the effectiveness of using 3D diffusion priors for one-shot view planning.
RO Mar 6, 2025
EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images
Rohit Menon, Nils Dengler, Sicong Pan et al.
For scene understanding in unstructured environments, an accurate and uncertainty-aware metric-semantic mapping is required to enable informed action selection by autonomous systems. Existing mapping methods often suffer from overconfident semantic predictions, and sparse and noisy depth sensing, leading to inconsistent map representations. In this paper, we therefore introduce EvidMTL, a multi-task learning framework that uses evidential heads for depth estimation and semantic segmentation, enabling uncertainty-aware inference from monocular RGB images. To enable uncertainty-calibrated evidential multi-task learning, we propose a novel evidential depth loss function that jointly optimizes the belief strength of the depth prediction in conjunction with evidential segmentation loss. Building on this, we present EvidKimera, an uncertainty-aware semantic surface mapping framework, which uses evidential depth and semantics prediction for improved 3D metric-semantic consistency. We train and evaluate EvidMTL on the NYUDepthV2 and assess its zero-shot performance on ScanNetV2, demonstrating superior uncertainty estimation compared to conventional approaches while maintaining comparable depth estimation and semantic segmentation. In zero-shot mapping tests on ScanNetV2, EvidKimera outperforms Kimera in semantic surface mapping accuracy and consistency, highlighting the benefits of uncertainty-aware mapping and underscoring its potential for real-world robotic applications.
RO Feb 25, 2022
On the Use of Torque Measurement in Centroidal State Estimation
Shahram Khorshidi, Ahmad Gazar, Nicholas Rotella et al.
State-of-the-art legged robots are either capable of measuring torque at the output of their drive systems, or have transparent drive systems which enable the computation of joint torques from motor currents. In either case, this sensor modality is seldom used in state estimation. In this paper, we propose to use joint torque measurements to estimate the centroidal states of legged robots. To do so, we project the whole-body dynamics of a legged robot into the nullspace of the contact constraints, allowing expression of the dynamics independent of the contact forces. Using the constrained dynamics and the centroidal momentum matrix, we are able to directly relate joint torques and centroidal state dynamics. Using the resulting model as the process model of an Extended Kalman Filter (EKF), we fuse the torque measurement in the centroidal state estimation problem. Through real-world experiments on a quadruped robot with different gaits, we demonstrate that the estimated centroidal states from our torque-based EKF drastically improve the recovery of these quantities compared to direct computation.
RO Sep 16, 2021
Fast-Replanning Motion Control for Non-Holonomic Vehicles with Aborting A*
Marcell Missura, Arindam Roychoudhury, Maren Bennewitz
Autonomously driving vehicles must be able to navigate in dynamic and unpredictable environments in a collision-free manner. So far, this has only been partially achieved in driverless cars and warehouse installations where marked structures such as roads, lanes, and traffic signs simplify the motion planning and collision avoidance problem. We are presenting a new control approach for car-like vehicles that is based on an unprecedentedly fast-paced A* implementation that allows the control cycle to run at a frequency of 30 Hz. This frequency enables us to place our A* algorithm as a low-level replanning controller that is well suited for navigation and collision avoidance in virtually any dynamic environment. Due to an efficient heuristic consisting of rotate-translate-rotate motions laid out along the shortest path to the target, our Short-Term Aborting A* (STAA*) converges fast and can be aborted early in order to guarantee a high and steady control rate. While our STAA* expands states along the shortest path, it takes care of collision checking with the environment including predicted states of moving obstacles, and returns the best solution found when the computation time runs out. Despite the bounded computation time, our STAA* does not get trapped in corners due to the following of the shortest path. In simulated and real-robot experiments, we demonstrate that our control approach eliminates collisions almost entirely and is superior to an improved version of the Dynamic Window Approach with predictive collision avoidance capabilities.
ROAug 30, 2021
Sensor-Based Navigation Using Hierarchical Reinforcement LearningChristopher Gebauer, Nils Dengler, Maren Bennewitz
Robotic systems are nowadays capable of solving complex navigation tasks. However, their capabilities are limited to the knowledge of the designer and consequently lack generalizability to initially unconsidered situations. This makes deep reinforcement learning (DRL) especially interesting, as these algorithms promise a self-learning system only relying on feedback from the environment. In this paper, we consider the problem of lidar-based robot navigation in continuous action space using DRL without providing any goal-oriented or global information. By relying solely on local sensor data to solve navigation tasks, we design an agent that assigns its own waypoints based on intrinsic motivation. Our agent is able to learn goal-directed navigation behavior even when facing only sparse feedback, i.e., delayed rewards when reaching the target. To address this challenge and the complexity of the continuous action space, we deploy a hierarchical agent structure in which the exploration is distributed across multiple layers. Within the hierarchical structure, our agent self-assigns internal goals and learns to extract reasonable waypoints to reach the desired target position based only on local sensor data. In our experiments, we demonstrate the navigation capabilities of our agent in two environments and show that the hierarchical structure substantially improves the performance in terms of success rate and success weighted by path length in comparison to a flat structure. Furthermore, we provide a real-robot experiment to illustrate that the trained agent can be easily transferred to a real-world scenario.
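The abstract does not spell out how success weighted by path length is computed. A minimal sketch, assuming the commonly used definition SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the binary success flag, l_i the shortest-path length, and p_i the length of the path actually taken; the function name and the tuple layout are hypothetical:

```python
def success_weighted_by_path_length(episodes):
    """Compute SPL over episodes given as (success, shortest_len, taken_len).

    Assumes shortest_len > 0 for every episode. Failed episodes contribute 0;
    successful ones contribute the ratio of shortest to taken path length,
    capped at 1.
    """
    episodes = list(episodes)
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, shortest_len, taken_len in episodes:
        if success:
            total += shortest_len / max(taken_len, shortest_len)
    return total / len(episodes)
```

Under this definition, an agent that always succeeds but takes paths twice as long as necessary scores 0.5, which is why the metric is usually reported alongside plain success rate.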
ROAug 18, 2021
Combining Local and Global Viewpoint Planning for Fruit CoverageTobias Zaenker, Chris Lehnert, Chris McCool et al.
Obtaining 3D sensor data of complete plants or plant parts (e.g., the crop or fruit) is difficult due to their complex structure and a high degree of occlusion. However, especially for the estimation of the position and size of fruits, it is necessary to avoid occlusions as much as possible and acquire sensor information of the relevant parts. Global viewpoint planners exist that suggest a series of viewpoints to cover the regions of interest up to a certain degree, but they usually prioritize global coverage and do not emphasize the avoidance of local occlusions. On the other hand, there are approaches that aim at avoiding local occlusions, but they cannot be used in larger environments since they only reach a local maximum of coverage. In this paper, we therefore propose to combine a local, gradient-based method with global viewpoint planning to enable local occlusion avoidance while still being able to cover large areas. Our simulated experiments with a robotic arm equipped with a camera array as well as an RGB-D camera show that this combination leads to a significantly increased coverage of the regions of interest compared to just applying global coverage planning.
ROFeb 3, 2021
The Pitfall of More Powerful Autoencoders in Lidar-Based NavigationChristopher Gebauer, Maren Bennewitz
The benefit of pretrained autoencoders for reinforcement learning in comparison to training on raw observations is already known [1]. In this paper, we address the generation of a compact and information-rich state representation. In particular, we train a variational autoencoder for 2D-lidar scans to use its latent state for reinforcement learning of navigation tasks. To achieve high reconstruction power of our autoencoding pipeline, we propose a preprocessing step, novel in the context of autoencoding 2D-lidar scans, that converts each scan into a local binary occupancy image. This has no additional requirements, neither self-localization nor robust mapping, and therefore can be applied in any setting and easily transferred from simulation to the real world. In a second stage, we show the usage of the compact state representation generated by our autoencoding pipeline in a simplistic navigation task and expose the pitfall of assuming that increased reconstruction power always leads to improved performance. We implemented our approach in Python using TensorFlow. Our datasets are simulated with PyBullet as well as recorded using a Slamtec RPLIDAR A3. The experiments show the significantly improved reconstruction capabilities of our approach for 2D-lidar scans w.r.t. the state of the art. However, as we demonstrate in the experiments, the impact on reinforcement learning in lidar-based navigation tasks is unpredictable when improving the latent state representation generated by an autoencoding pipeline. This is surprising and needs to be taken into account during the process of optimizing a pretrained autoencoder for reinforcement learning tasks.
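One way to picture the preprocessing into a local binary occupancy image is the sketch below. It is an illustration under assumptions, not the authors' pipeline: it marks only beam endpoints as occupied, centres the image on the sensor, and uses hypothetical parameters (`size`, `max_range`) and a hypothetical function name.

```python
import math


def scan_to_occupancy_image(ranges, angle_min, angle_increment, size=64, max_range=5.0):
    """Rasterize 2D-lidar beam endpoints into a size x size binary image.

    The sensor sits at the image centre, x points right and y points up.
    Invalid or out-of-range readings are skipped.
    """
    image = [[0] * size for _ in range(size)]
    cell = 2.0 * max_range / size  # metres per pixel
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r <= 0.0 or r >= max_range:
            continue
        angle = angle_min + i * angle_increment
        x = r * math.cos(angle)
        y = r * math.sin(angle)
        col = int((x + max_range) / cell)
        row = int((max_range - y) / cell)  # image rows grow downward
        if 0 <= row < size and 0 <= col < size:
            image[row][col] = 1
    return image
```

Because the image is expressed in the sensor frame, it needs neither a map nor a pose estimate, which matches the abstract's point that the preprocessing has no localization or mapping requirements.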
RONov 13, 2020
Online Object-Oriented Semantic Mapping and Map UpdatingNils Dengler, Tobias Zaenker, Francesco Verdoja et al.
Creating and maintaining an accurate representation of the environment is an essential capability for every service robot. Especially for household robots acting in indoor environments, semantic information is important. In this paper, we present a semantic mapping framework with modular map representations. Our system is capable of online mapping and object updating given object detections from RGB-D data and provides various 2D and 3D representations of the mapped objects. To undo wrong data associations, we perform a refinement step when updating object shapes. Furthermore, we maintain an existence likelihood for each object to deal with false positive and false negative detections and keep the map updated. Our mapping system is highly efficient and runs at more than 10 Hz. We evaluated our approach in various environments using two different robots, i.e., a Toyota HSR and a Fraunhofer Care-O-Bot-4. As the experimental results demonstrate, our system is able to generate maps that are close to the ground truth and outperforms an existing approach in terms of intersection over union, different distance metrics, and the number of correct object mappings.
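The abstract does not say how the existence likelihood is updated. A minimal sketch, assuming a standard binary Bayes filter in log-odds form; the function name and the sensor-model values `p_hit` and `p_miss` are hypothetical and not taken from the paper:

```python
import math


def update_existence(prob, detected, p_hit=0.7, p_miss=0.4):
    """Update an object's existence probability after one observation.

    detected=True  -> the object was expected in view and was detected.
    detected=False -> the object was expected in view but not detected.
    p_hit and p_miss are assumed inverse sensor model values.
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    log_odds = logit(prob) + logit(p_hit if detected else p_miss)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With such a scheme, a single false positive raises the probability only moderately, and repeated missed detections drive it back down, so an object could be removed from the map once its probability falls below a chosen threshold.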
RONov 5, 2020
Capture Steps: Robust Walking for Humanoid RobotsMarcell Missura, Maren Bennewitz, Sven Behnke
Stable bipedal walking is a key prerequisite for humanoid robots to reach their potential of being versatile helpers in our everyday environments. Bipedal walking is, however, a complex motion that requires the coordination of many degrees of freedom while it is also inherently unstable and sensitive to disturbances. The balance of a walking biped has to be constantly maintained. The most effective way of controlling balance is to take well-timed and well-placed recovery steps -- capture steps -- that absorb the excess momentum gained from a push or a stumble. We present a bipedal gait generation framework that utilizes step timing and foot placement techniques in order to recover the balance of a biped even after strong disturbances. Our framework modifies the next footstep location instantly when responding to a disturbance and generates controllable omnidirectional walking using only very little sensing and computational power. We exploit the open-loop stability of a central pattern generated gait to fit a linear inverted pendulum model to the observed center of mass trajectory. Then, we use the fitted model to predict suitable footstep locations and timings in order to maintain balance while following a target walking velocity. Our experiments show qualitative and statistical evidence of one of the strongest push-recovery capabilities among humanoid robots to date.
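To make the role of the linear inverted pendulum model (LIPM) concrete, here is a minimal one-dimensional sketch. It shows the textbook LIPM solution and the textbook capture point, not the authors' footstep and timing predictor; the function names are hypothetical. With ω = sqrt(g / h), the centre of mass (CoM) relative to the support point evolves as x(t) = x0·cosh(ωt) + (v0/ω)·sinh(ωt), and stepping to x + v/ω brings the pendulum to rest above the foot.

```python
import math


def lipm_predict(x0, v0, t, com_height, gravity=9.81):
    """Propagate the 1D LIPM state (CoM position relative to the support point)."""
    omega = math.sqrt(gravity / com_height)
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    x = x0 * c + (v0 / omega) * s
    v = x0 * omega * s + v0 * c
    return x, v


def capture_point(com_pos, com_vel, com_height, gravity=9.81):
    """Foot location at which the LIPM comes to rest for the given CoM state."""
    omega = math.sqrt(gravity / com_height)
    return com_pos + com_vel / omega
```

For example, a push that leaves the CoM moving at 0.5 m/s moves the capture point 0.5/ω metres ahead of the CoM, so stronger pushes call for longer recovery steps.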
ROOct 31, 2020
Viewpoint Planning for Fruit Size and Position EstimationTobias Zaenker, Claus Smitt, Chris McCool et al.
Modern agricultural applications require knowledge about the position and size of fruits on plants. However, occlusions from leaves typically make obtaining this information difficult. We present a novel viewpoint planning approach that builds up an octree of plants with labeled regions of interest (ROIs), i.e., fruits. Our method uses this octree to sample viewpoint candidates that increase the information around the fruit regions and evaluates them using a heuristic utility function that takes into account the expected information gain. Our system automatically switches between ROI targeted sampling and exploration sampling, which considers general frontier voxels, depending on the estimated utility. When the plants have been sufficiently covered with the RGB-D sensor, our system clusters the ROI voxels and estimates the position and size of the detected fruits. We evaluated our approach in simulated scenarios and compared the resulting fruit estimations with the ground truth. The results demonstrate that our combined approach outperforms a sampling method that does not explicitly consider the ROIs to generate viewpoints in terms of the number of discovered ROI cells. Furthermore, we show the real-world applicability by testing our framework on a robotic arm equipped with an RGB-D camera installed on an automated pipe-rail trolley in a capsicum glasshouse.
ROOct 30, 2020
PATHoBot: A Robot for Glasshouse Crop Phenotyping and InterventionClaus Smitt, Michael Halstead, Tobias Zaenker et al.
We present PATHoBot, an autonomous crop surveying and intervention robot for glasshouse environments. The aim of this platform is to autonomously gather high quality data and also estimate key phenotypic parameters. To achieve this, we retrofit an off-the-shelf pipe-rail trolley with an array of multi-modal cameras, navigation sensors and a robotic arm for close surveying tasks and intervention. In this paper, we describe the PATHoBot design choices made to ensure proper operation in a commercial glasshouse environment. As a surveying platform, we collect a number of datasets which include both sweet peppers and tomatoes. We show how PATHoBot enables novel surveillance approaches by first improving our previous work on fruit counting by incorporating wheel odometry and depth information. We find that by introducing re-projection and depth information we are able to achieve an absolute improvement of 20 points over the baseline technique in an "in the wild" situation. Finally, we present a 3D mapping case study, further showcasing PATHoBot's crop surveying capabilities.
HCJul 9, 2020
A Neuro-inspired Theory of Joint Human-Swarm InteractionJonas D. Hasbach, Maren Bennewitz
Human-swarm interaction (HSI) is an active research challenge in the realms of swarm robotics and human-factors engineering. Here we apply a cognitive systems engineering perspective and introduce a neuro-inspired joint systems theory of HSI. The mindset defines predictions for adaptive, robust and scalable HSI dynamics and therefore has the potential to inform human-swarm loop design.
CVDec 13, 2019
Bonn Activity Maps: Dataset DescriptionJulian Tanke, Oh-Hun Kwon, Patrick Stotko et al.
The key prerequisite for accessing the huge potential of current machine learning techniques is the availability of large databases that capture the complex relations of interest. Previous datasets are focused on either 3D scene representations with semantic information, tracking of multiple persons and recognition of their actions, or activity recognition of a single person in captured 3D environments. We present Bonn Activity Maps, a large-scale dataset for human tracking, activity recognition and anticipation of multiple persons. Our dataset comprises four different scenes that have been recorded by time-synchronized cameras each only capturing the scene partially, the reconstructed 3D models with semantic annotations, motion trajectories for individual people including 3D human poses as well as human activity annotations. We utilize the annotations to generate activity likelihoods on the 3D models called activity maps.
ROJun 15, 2013
Proceedings of the 2nd Workshop on Robots in Clutter: Preparing robots for the real world (Berlin, 2013)Michael Zillich, Maren Bennewitz, Maria Fox et al.
This volume represents the proceedings of the 2nd Workshop on Robots in Clutter: Preparing robots for the real world, held June 27, 2013, at the Robotics: Science and Systems conference in Berlin, Germany.