RO · Jul 31, 2023
End-to-End Reinforcement Learning for Torque Based Variable Height Hopping
Raghav Soni, Daniel Harnack, Hannah Isermann et al.
Legged locomotion is arguably the most suitable and versatile mode for dealing with natural or unstructured terrain. Intensive research into dynamic walking and running controllers has recently yielded great advances in both the optimal control and the reinforcement learning (RL) literature. Hopping is a challenging dynamic task involving a flight phase and has the potential to increase the traversability of legged robots. Model-based control for hopping typically relies on accurate detection of the different jump phases, such as lift-off or touch-down, and uses a different controller for each phase. In this paper, we present an end-to-end RL-based torque controller that learns to implicitly detect the relevant jump phases, removing the need to provide manual heuristics for state detection. We also extend a method for simulation-to-reality transfer of the learned controller to contact-rich dynamic tasks, resulting in successful deployment on the robot after training without parameter tuning.
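For contrast with the learned controller, the sketch below shows the kind of hand-written phase detection that, according to the abstract, model-based hopping controllers rely on. It is a minimal illustration, assuming a hopper that reports foot height and an estimated ground reaction force; the thresholds `TOUCHDOWN_HEIGHT` and `LIFTOFF_FORCE` and all function names are hypothetical, and this is not the paper's method.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Phase(Enum):
    FLIGHT = "flight"
    STANCE = "stance"


# Hypothetical thresholds; real values depend on the robot and its sensors.
TOUCHDOWN_HEIGHT = 0.005  # foot height [m] at or below which touch-down is assumed
LIFTOFF_FORCE = 5.0       # ground reaction force [N] below which lift-off is assumed


def next_phase(phase: Phase, foot_height: float, contact_force: float) -> Phase:
    """Hand-written touch-down / lift-off rules of a classic phase-switching hopper."""
    if phase is Phase.FLIGHT and foot_height <= TOUCHDOWN_HEIGHT:
        return Phase.STANCE  # touch-down detected from kinematics
    if phase is Phase.STANCE and contact_force < LIFTOFF_FORCE:
        return Phase.FLIGHT  # lift-off detected from the (estimated) contact force
    return phase


def detect_phases(samples: Iterable[Tuple[float, float]],
                  start: Phase = Phase.FLIGHT) -> List[Phase]:
    """Run the state machine over (foot_height, contact_force) samples."""
    phase, history = start, []
    for foot_height, contact_force in samples:
        phase = next_phase(phase, foot_height, contact_force)
        history.append(phase)
    return history


# Usage: a made-up hop, flight -> touch-down -> push-off -> flight.
hop = [(0.10, 0.0), (0.003, 40.0), (0.0, 120.0), (0.0, 2.0), (0.05, 0.0)]
print([p.value for p in detect_phases(hop)])
```

A phase-switching controller would then select a separate control law for each detected phase; the thresholds above are exactly the kind of manual tuning the end-to-end policy is reported to make unnecessary.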
62.8 · HC · Mar 16
DR. INFO at the Point of Care: A Prospective Pilot Study of an Agentic AI Clinical Assistant
Rogerio Corga Da Silva, Miguel Romano, Tiago Mendes et al.
Background: Clinical documentation and information retrieval consume over half of physicians' working hours, contributing to cognitive overload and burnout. While artificial intelligence offers a potential solution, concerns over hallucinations and source reliability have limited adoption at the point of care. Objective: To evaluate clinician-reported time savings, decision-making support, and satisfaction with DR. INFO, an agentic AI clinical assistant, in routine clinical practice. Methods: In this prospective, single-arm pilot study, 29 clinicians across multiple specialties in Portuguese healthcare institutions used DR. INFO v1.0 over five working days within a two-week period. Outcomes were assessed via daily Likert-scale evaluations and a final Net Promoter Score. Non-parametric methods were used throughout. Results: Clinicians reported high perceived time saving (mean 4.27/5; 95% CI: 3.97-4.57) and decision support (4.16/5; 95% CI: 3.86-4.45), with ratings stable across all study days and no evidence of attrition bias. The Net Promoter Score was 81.2, with no detractors. Conclusions: Clinicians across specialties and career stages reported sustained satisfaction with DR. INFO for both time efficiency and clinical decision support. Validation in larger, controlled studies with objective outcome measures is warranted.
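The Net Promoter Score reported above follows a standard formula: on a 0-10 "would you recommend" scale, the percentage of promoters (9-10) minus the percentage of detractors (0-6), giving a value between -100 and 100. The sketch below implements that standard definition; the function name and the example ratings are hypothetical, not study data.

```python
from typing import Sequence


def net_promoter_score(ratings: Sequence[int]) -> float:
    """Standard NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must lie in 0..10")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical ratings, not study data: six promoters, two passives (7-8), no detractors.
example = [10, 9, 9, 8, 10, 7, 9, 10]
print(net_promoter_score(example))  # 75.0
```

Because passives (7-8) count in the denominator but in neither group, an NPS of 81.2 with no detractors means that 81.2% of respondents were promoters.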
RO · Dec 16, 2023
Deriving Rewards for Reinforcement Learning from Symbolic Behaviour Descriptions of Bipedal Walking
Daniel Harnack, Christoph Lüth, Lukas Gross et al.
Generating physical movement behaviours from their symbolic description is a long-standing challenge in artificial intelligence (AI) and robotics, requiring insights into numerical optimization methods as well as into formalizations from symbolic AI and reasoning. In this paper, a novel approach to finding a reward function from a symbolic description is proposed. The intended system behaviour is modelled as a hybrid automaton, which reduces the system state space to allow more efficient reinforcement learning. The approach is applied to bipedal walking by modelling the walking robot as a hybrid automaton over state space orthants, and it is used with the compass walker to derive a reward that incentivizes following the hybrid automaton cycle. As a result, training times of reinforcement learning controllers are reduced while final walking speed is increased. The approach can serve as a blueprint for how to generate reward functions from symbolic AI and reasoning.
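To make the orthant idea concrete: the sign pattern of the state vector identifies which orthant, and hence which discrete automaton state, the system is in, and a reward can favour transitions that follow a prescribed cycle of orthants. The sketch below is a hypothetical shaping term under that reading (+1 for advancing to the next orthant of the cycle, -1 for any other orthant change, 0 otherwise); the reward actually derived in the paper, the compass walker's state variables, and its cycle are not reproduced here.

```python
from typing import Sequence, Tuple

Orthant = Tuple[bool, ...]


def orthant(x: Sequence[float]) -> Orthant:
    """Sign pattern of x; True marks a non-negative coordinate (zero counts as positive)."""
    return tuple(v >= 0.0 for v in x)


def cycle_reward(prev_x: Sequence[float], x: Sequence[float],
                 cycle: Sequence[Orthant]) -> float:
    """Hypothetical reward for following a cyclic sequence of orthants."""
    a, b = orthant(prev_x), orthant(x)
    if a == b:
        return 0.0  # still inside the same discrete state
    if a in cycle:
        i = cycle.index(a)
        if b == cycle[(i + 1) % len(cycle)]:
            return 1.0  # advanced to the next orthant of the cycle
    return -1.0  # left the cycle or jumped out of order


# A hypothetical 4-orthant cycle in a 2-D state space, traversed like a rotation.
CYCLE = [(True, True), (True, False), (False, False), (False, True)]
print(cycle_reward((0.1, 0.2), (0.1, -0.2), CYCLE))  # 1.0: next orthant in the cycle
```

Because the reward depends only on the discrete orthant sequence, it is indifferent to how the trajectory moves inside an orthant, which is one way a hybrid automaton can shrink the space the learner has to reason about.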
QM · Aug 29, 2025
OpenAI's HealthBench in Action: Evaluating an LLM-Based Medical Assistant on Realistic Clinical Queries
Sandhanakrishnan Ravichandran, Shivesh Kumar, Rogerio Corga Da Silva et al.
Evaluating large language models (LLMs) on their ability to generate high-quality, accurate, situationally aware answers to clinical questions requires going beyond conventional benchmarks to assess how these systems behave in complex, high-stakes clinical scenarios. Traditional evaluations are often limited to multiple-choice questions that fail to capture essential competencies such as contextual reasoning, awareness, and uncertainty handling. To address these limitations, we evaluate our agentic, RAG-based clinical support assistant, DR.INFO, using HealthBench, a rubric-driven benchmark composed of open-ended, expert-annotated health conversations. On the Hard subset of 1,000 challenging examples, DR.INFO achieves a HealthBench score of 0.51, substantially outperforming leading frontier LLMs (GPT-5, o3, Grok 3, GPT-4, Gemini 2.5, etc.) across all behavioral axes (accuracy, completeness, instruction following, etc.). In a separate 100-sample evaluation against similar agentic RAG assistants (OpenEvidence, Pathway.md), it maintains a performance lead with a HealthBench score of 0.54. These results highlight DR.INFO's strengths in communication, instruction following, and accuracy, while also revealing areas for improvement in context awareness and response completeness. Overall, the findings underscore the utility of behavior-level, rubric-based evaluation for building a reliable and trustworthy AI-enabled clinical support assistant.
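HealthBench scores come from rubrics: each example carries expert-written criteria with positive or negative point values, a grader marks each criterion as met or not, and scores are normalized by the achievable positive points. The sketch below encodes our reading of that rule (per-example score = points of met criteria divided by the sum of positive points; overall score = mean over examples, clipped to [0, 1]). The `Criterion` class and function names are hypothetical, and the official HealthBench implementation should be checked before relying on exact numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    points: float  # positive for desirable behaviour, negative for undesirable behaviour
    met: bool      # grader's verdict for this response


def example_score(criteria: List[Criterion]) -> float:
    """Points earned on met criteria, normalized by the total achievable positive points."""
    possible = sum(c.points for c in criteria if c.points > 0)
    if possible <= 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    achieved = sum(c.points for c in criteria if c.met)
    return achieved / possible  # may be negative if penalties dominate


def overall_score(examples: List[List[Criterion]]) -> float:
    """Mean of per-example scores, clipped to [0, 1]."""
    mean = sum(example_score(e) for e in examples) / len(examples)
    return min(1.0, max(0.0, mean))


# Two made-up rubrics: 5 of 8 positive points earned; 4 earned minus a 2-point penalty.
ex1 = [Criterion(5, True), Criterion(3, False), Criterion(-4, False)]
ex2 = [Criterion(4, True), Criterion(-2, True)]
print(overall_score([ex1, ex2]))
```

Negative criteria let a response lose credit for harmful or incorrect content, which is one reason rubric scores capture behaviours that a multiple-choice accuracy number cannot.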
RO · Feb 24, 2022
An efficient combined local and global search strategy for optimization of parallel kinematic mechanisms with joint limits and collision constraints
Haribhau Durgesh, Guillaume Michel, Shivesh Kumar et al.
The optimization of parallel kinematic manipulators (PKMs) involves several constraints that are difficult to formalize, making the optimal synthesis problem highly challenging. The presence of passive joint limits, singularities, and self-collisions leads to a complicated relation between the input and output parameters. In this article, a novel optimization methodology is proposed that combines a local search, the Nelder-Mead algorithm, with global search methodologies such as low-discrepancy distributions for faster and more efficient exploration of the optimization space. The effects of the dimension of the optimization problem and of the different constraints are discussed to highlight the complexities of closed-loop kinematic chain optimization. The work also presents the approaches used to incorporate constraints for passive joint boundaries and singularities and to avoid internal collisions in such mechanisms. The proposed algorithm can also optimize the lengths of the prismatic actuators, and constraints can be added in a modular fashion, which makes it possible to understand the impact of a given criterion on the final result. The presented approach is applied to optimize two PKMs with different degrees of freedom.
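The combination described above can be sketched as follows, assuming a generic box-bounded objective with a feasibility test: a Halton (low-discrepancy) sequence covers the design space, the best few samples seed a simplified textbook Nelder-Mead local search, and infeasible designs (standing in for violated joint limits, singularities, or collisions) are rejected by returning infinity. The objective, bounds, sample counts, and this rejection-style constraint handling are illustrative assumptions, not the paper's formulation or its PKM models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def radical_inverse(i: int, base: int) -> float:
    """Van der Corput radical inverse of i in the given base (one Halton coordinate)."""
    inv, scale = 0.0, 1.0 / base
    while i > 0:
        inv += scale * (i % base)
        i //= base
        scale /= base
    return inv


def halton(n: int, bounds: Sequence[Tuple[float, float]]) -> List[Vector]:
    """First n Halton points (index 0 skipped) scaled into the box; supports up to 6 dims."""
    primes = (2, 3, 5, 7, 11, 13)
    return [[lo + radical_inverse(i, p) * (hi - lo) for (lo, hi), p in zip(bounds, primes)]
            for i in range(1, n + 1)]


def _toward(c: Vector, w: Vector, t: float) -> Vector:
    """Point c + t * (w - c) on the line from c through w."""
    return [ci + t * (wi - ci) for ci, wi in zip(c, w)]


def nelder_mead(f: Callable[[Vector], float], x0: Sequence[float], step: float = 0.1,
                iters: int = 500, tol: float = 1e-12) -> Tuple[Vector, float]:
    """Simplified Nelder-Mead: reflect 1, expand 2, inside-contract 0.5, shrink 0.5."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    vals = [f(v) for v in simplex]
    for _ in range(iters):
        order = sorted(range(n + 1), key=vals.__getitem__)
        simplex, vals = [simplex[k] for k in order], [vals[k] for k in order]
        if vals[-1] - vals[0] < tol:  # simplex values have collapsed
            break
        c = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]  # centroid without worst
        xr = _toward(c, simplex[-1], -1.0)
        fr = f(xr)
        if fr < vals[0]:  # reflection is the new best: try expanding further
            xe = _toward(c, simplex[-1], -2.0)
            fe = f(xe)
            simplex[-1], vals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < vals[-2]:  # reflection beats the second worst: accept it
            simplex[-1], vals[-1] = xr, fr
        else:  # contract toward the worst vertex, or shrink around the best one
            xc = _toward(c, simplex[-1], 0.5)
            fc = f(xc)
            if fc < vals[-1]:
                simplex[-1], vals[-1] = xc, fc
            else:
                best = simplex[0]
                simplex = [best] + [_toward(best, v, 0.5) for v in simplex[1:]]
                vals = [vals[0]] + [f(v) for v in simplex[1:]]
    i = min(range(n + 1), key=vals.__getitem__)
    return simplex[i], vals[i]


def global_local_search(f: Callable[[Vector], float], bounds: Sequence[Tuple[float, float]],
                        n_samples: int = 64, n_seeds: int = 3) -> Tuple[Vector, float]:
    """Halton sampling for global coverage, then Nelder-Mead from the best seeds."""
    seeds = sorted(halton(n_samples, bounds), key=f)[:n_seeds]
    return min((nelder_mead(f, s) for s in seeds), key=lambda r: r[1])


def objective(x: Vector) -> float:
    """Made-up objective; the disc test is a hypothetical stand-in for PKM constraints."""
    if x[0] ** 2 + x[1] ** 2 > 4.0:
        return math.inf  # infeasible design: rejected outright
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2


x_best, f_best = global_local_search(objective, [(-2.0, 2.0), (-2.0, 2.0)])
print([round(v, 3) for v in x_best], f_best)
```

Rejecting infeasible points keeps constraints modular, since each check can be added or removed independently, at the cost of telling the local search nothing about how far outside the feasible region a rejected point lies.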
RO · Mar 10, 2021
Nth Order Analytical Time Derivatives of Inverse Dynamics in Recursive and Closed Forms
Shivesh Kumar, Andreas Mueller
Derivatives of the equations of motion describing rigid body dynamics are becoming increasingly relevant for the robotics community and find many applications in the design and control of robotic systems. Controlling robots, and multibody systems comprising elastic components in particular, requires not only smooth trajectories but also the time derivatives of the control forces/torques, and hence of the equations of motion (EOM). This paper presents novel nth order time derivatives of the EOM in both closed and recursive forms. While the former provides direct insight into the structure of these derivatives, the latter leads to a highly efficient implementation for robotic systems with many degrees of freedom.
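As a worked illustration of what such derivatives look like (not the paper's closed-form or recursive algorithm), applying the general Leibniz rule to the standard textbook form of the EOM gives the expression below, where a superscript (k) denotes the k-th time derivative and M, C, g depend on time through q(t). The one-DOF pendulum (mass m, length l, gravitational acceleration g_0; symbols chosen here for illustration) serves as a concrete check.

```latex
\begin{aligned}
\boldsymbol{\tau} &= \mathbf{M}(\mathbf{q})\,\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q},\dot{\mathbf{q}})\,\dot{\mathbf{q}} + \mathbf{g}(\mathbf{q}),\\
\boldsymbol{\tau}^{(n)} &= \sum_{k=0}^{n}\binom{n}{k}\Big[\mathbf{M}^{(k)}\,\mathbf{q}^{(n-k+2)} + \mathbf{C}^{(k)}\,\mathbf{q}^{(n-k+1)}\Big] + \mathbf{g}^{(n)}.
\end{aligned}
\qquad
\begin{aligned}
\tau &= m l^2 \ddot{q} + m g_0 l \sin q,\\
\dot{\tau} &= m l^2 q^{(3)} + m g_0 l \cos q\,\dot{q},\\
\ddot{\tau} &= m l^2 q^{(4)} + m g_0 l\left(\cos q\,\ddot{q} - \sin q\,\dot{q}^{2}\right).
\end{aligned}
```

For the pendulum, M = m l^2 is constant and C = 0, so only the k = 0 inertia term and the derivatives of the gravity term survive. In general each M^(k) itself depends on q and its derivatives up to order k, which is where an efficient recursive evaluation can pay off for systems with many degrees of freedom.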
RO · Jan 26, 2021
Design, analysis and control of the series-parallel hybrid RH5 humanoid robot
Julian Esser, Shivesh Kumar, Heiner Peters et al.
The last decades of humanoid research have shown that humanoids developed for high dynamic performance require a stiff structure and an optimal distribution of mass-inertial properties. Humanoid robots built with a purely tree-type architecture tend to be bulky and usually suffer from velocity and force/torque limitations. This paper presents a novel series-parallel hybrid humanoid called RH5, which is 2 m tall, weighs only 62.5 kg, and is capable of performing heavy-duty dynamic tasks with a 5 kg payload in each hand. The analysis and control of this humanoid are performed with a whole-body trajectory optimization technique based on differential dynamic programming (DDP). Additionally, we present an improved contact-stability soft-constrained DDP algorithm that generates physically consistent walking trajectories for the humanoid, which can be tracked via simple PD position control in a physics simulator. Finally, we showcase preliminary experimental results on the RH5 humanoid robot.
RO · Jul 29, 2020
A Development Cycle for Automated Self-Exploration of Robot Behaviors
Thomas M. Roehr, Daniel Harnack, Hendrik Wöhrle et al.
In this paper we introduce Q-Rock, a development cycle for the automated self-exploration and qualification of robot behaviors. With Q-Rock, we suggest a novel, integrative approach to automate robot development processes. Q-Rock combines several machine learning and reasoning techniques to deal with the increasing complexity in the design of robotic systems. The Q-Rock development cycle consists of three complementary processes: (1) automated exploration of capabilities that a given robotic hardware provides, (2) classification and semantic annotation of these capabilities to generate more complex behaviors, and (3) mapping between application requirements and available behaviors. These processes are based on a graph-based representation of a robot's structure, including hardware and software components. A central, scalable knowledge base enables collaboration of robot designers including mechanical, electrical and systems engineers, software developers and machine learning experts. In this paper we formalize Q-Rock's integrative development cycle and highlight its benefits with a proof-of-concept implementation and a use case demonstration.
RO · May 25, 2020
Combinatorics of a Discrete Trajectory Space for Robot Motion Planning
Felix Wiebe, Shivesh Kumar, Daniel Harnack et al.
Motion planning is a difficult problem in robot control. The complexity of the problem is directly related to the dimension of the robot's configuration space. While in many theoretical calculations and practical applications the configuration space is modeled as a continuous space, we present a discrete robot model based on the fundamental hardware specifications of a robot. Using lattice path methods, we provide estimates for the complexity of motion planning by counting the number of possible trajectories in a discrete robot configuration space.
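The counting idea can be illustrated in a deliberately simplified discrete model, which is an assumption here and not necessarily the paper's model: each of d joints must advance by n_i discrete increments, exactly one joint moves by one increment per time step, and motion is monotone. The number of such trajectories is then the number of monotone lattice paths, i.e. the multinomial coefficient (n_1 + ... + n_d)! / (n_1! ... n_d!). The sketch computes it in closed form and, as a cross-check, by memoised recursion over the last step taken; both function names are hypothetical.

```python
from functools import lru_cache
from math import factorial
from typing import Sequence, Tuple


def multinomial_paths(steps: Sequence[int]) -> int:
    """Closed form: monotone unit-step lattice paths from the origin to `steps` in Z^d."""
    total = factorial(sum(steps))
    for s in steps:
        total //= factorial(s)  # exact: every partial quotient is itself a multinomial
    return total


def count_paths_dp(steps: Sequence[int]) -> int:
    """Same count by recursion: a path to `point` ends with a unit step in some axis."""
    @lru_cache(maxsize=None)
    def count(point: Tuple[int, ...]) -> int:
        if all(c == 0 for c in point):
            return 1
        return sum(count(point[:i] + (point[i] - 1,) + point[i + 1:])
                   for i in range(len(point)) if point[i] > 0)

    return count(tuple(steps))


print(multinomial_paths((2, 3)), count_paths_dp((2, 3)))  # 10 10
```

Even this toy model grows quickly: three joints with ten increments each already admit 5,550,996,791,340 trajectories, which illustrates why the dimension of the configuration space dominates the complexity of motion planning.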
RO · Mar 19, 2020
A Comparative Study on 2-DOF Variable Stiffness Mechanisms
Christoph Stoeffler, Shivesh Kumar, Andreas Müller
Based on the idea of variable stiffness mechanisms, a variety of such mechanisms is presented in this work. Specifically, 2-DOF parallel kinematic machines equipped with redundant actuators and non-linear springs in the actuated joints are presented, and a comparative overview is drawn. Accordingly, a general task-space stiffness formulation is given for all mechanisms. Under fixed geometric parameters, task-space stiffness is optimized for the designs, covering all kinematic solutions. Finally, a stiffness metric is introduced that allows a quantitative comparison of the given mechanism designs. This yields design guidelines for engineers and also outlines promising future applications of variable stiffness mechanisms.
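A common way to express the task-space stiffness of such a mechanism, used here as an assumption rather than as the paper's exact formulation, is K_x = J^T K_q J, where J = d(rho)/dx is the m x 2 Jacobian mapping task-space displacements to the m actuated-joint displacements and K_q = diag(k_i) holds the tangent stiffnesses of the joint springs. With redundant actuation (m > 2) and non-linear springs, changing the spring preloads changes each k_i and therefore K_x. The sketch uses a hypothetical cubic spring, a constant made-up Jacobian `J`, and the eigenvalues of K_x as a simple stiffness measure; it neglects the configuration-dependent term that preload forces add through the derivative of J.

```python
import math
from typing import Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def spring_stiffness(k1: float, k3: float, deflection: float) -> float:
    """Tangent stiffness of a hypothetical cubic spring tau = k1*d + k3*d**3."""
    return k1 + 3.0 * k3 * deflection ** 2


def task_stiffness(jacobian: Sequence[Tuple[float, float]], k: Sequence[float]) -> Matrix2:
    """K_x = J^T diag(k) J for an m x 2 actuator Jacobian J = d(rho)/dx."""
    a = sum(ki * r[0] * r[0] for r, ki in zip(jacobian, k))
    b = sum(ki * r[0] * r[1] for r, ki in zip(jacobian, k))
    d = sum(ki * r[1] * r[1] for r, ki in zip(jacobian, k))
    return ((a, b), (b, d))


def principal_stiffnesses(K: Matrix2) -> Tuple[float, float]:
    """Eigenvalues (min, max) of a symmetric 2x2 stiffness matrix."""
    (a, b), (_, d) = K
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


# Three actuators, two task DOF (redundant); rows of J are made up for illustration.
J = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for deflections in [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]:
    k = [spring_stiffness(1.0, 1.0 / 3.0, dq) for dq in deflections]
    print(deflections, principal_stiffnesses(task_stiffness(J, k)))
```

Under these assumptions, preloading the first two springs raises both principal stiffnesses (from about 1 and 3 to about 2 and 4 in these arbitrary units), which is the kind of effect a stiffness metric can compare across designs.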
RO · Feb 29, 2020
Comparison of Distal Teacher Learning with Numerical and Analytical Methods to Solve Inverse Kinematics for Rigid-Body Mechanisms
Tim von Oehsen, Alexander Fabisch, Shivesh Kumar et al.
Several publications are concerned with learning inverse kinematics; however, their evaluation is often limited, and none of the proposed methods is of practical relevance for rigid-body kinematics with a known forward model. We argue that for rigid-body kinematics one of the first proposed machine learning (ML) solutions to inverse kinematics, distal teaching (DT), is actually good enough when combined with differentiable programming libraries, and we provide an extensive evaluation and comparison to analytical and numerical solutions. In particular, we analyze solve rate, accuracy, sample efficiency, and scalability. Further, we study how DT handles joint limits, singularities, unreachable poses, and trajectories, and we provide a comparison of execution times. The three approaches are evaluated on three different rigid-body mechanisms of varying complexity. With enough training data and relaxed precision requirements, DT has a better solve rate and is faster than state-of-the-art numerical solvers for a 15-DoF mechanism. DT is not affected by singularities, while numerical solutions are vulnerable to them. In all other cases, numerical solutions are usually better. Analytical solutions outperform the other approaches by far if they are available.
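Distal teaching itself (training an inverse model by backpropagating the task-space error through a differentiable forward model) needs a learning framework, so it is not reproduced here. The sketch below instead shows the two kinds of baseline the abstract compares against, on a hypothetical planar two-link arm with made-up link lengths: a closed-form analytical solution and a damped-least-squares numerical solver with a simple step-length limit (our choice of numerical method, not necessarily the one used in the paper). Both return None for unreachable poses.

```python
import math
from typing import Optional, Tuple

L1, L2 = 1.0, 0.8  # hypothetical link lengths [m]
Joints = Tuple[float, float]


def fk(q1: float, q2: float) -> Tuple[float, float]:
    """Forward kinematics of a planar 2R arm."""
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2))


def ik_analytic(x: float, y: float, sign: float = 1.0) -> Optional[Joints]:
    """Closed-form IK; sign selects one of the two elbow configurations."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2.0 * L1 * L2)
    if abs(c2) > 1.0:
        return None  # target outside the reachable annulus
    s2 = sign * math.sqrt(1.0 - c2 * c2)
    q2 = math.atan2(s2, c2)
    q1 = math.atan2(y, x) - math.atan2(L2 * s2, L1 + L2 * c2)
    return q1, q2


def ik_numeric(x: float, y: float, q: Joints = (0.3, 0.3), damping: float = 1e-3,
               max_step: float = 0.2, iters: int = 200, tol: float = 1e-10) -> Optional[Joints]:
    """Damped least squares: dq = J^T (J J^T + damping^2 I)^-1 e, with a step-length cap."""
    q1, q2 = q
    for _ in range(iters):
        px, py = fk(q1, q2)
        ex, ey = x - px, y - py
        if math.hypot(ex, ey) < tol:
            return q1, q2
        s1, c1 = math.sin(q1), math.cos(q1)
        s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
        j11, j12 = -L1 * s1 - L2 * s12, -L2 * s12  # d(x)/d(q1), d(x)/d(q2)
        j21, j22 = L1 * c1 + L2 * c12, L2 * c12    # d(y)/d(q1), d(y)/d(q2)
        a = j11 * j11 + j12 * j12 + damping ** 2   # A = J J^T + damping^2 I
        b = j11 * j21 + j12 * j22
        d = j21 * j21 + j22 * j22 + damping ** 2
        det = a * d - b * b
        wx, wy = (d * ex - b * ey) / det, (-b * ex + a * ey) / det  # w = A^-1 e
        dq1, dq2 = j11 * wx + j21 * wy, j12 * wx + j22 * wy          # dq = J^T w
        norm = math.hypot(dq1, dq2)
        if norm > max_step:  # limit the step length for robustness far from the solution
            dq1, dq2 = dq1 * max_step / norm, dq2 * max_step / norm
        q1, q2 = q1 + dq1, q2 + dq2
    return None  # not converged, e.g. for an unreachable target


print(ik_analytic(1.2, 0.6), ik_numeric(1.2, 0.6), ik_analytic(3.0, 0.0))
```

For this arm the analytical solution is a handful of trigonometric evaluations, whereas the numerical solver iterates from an initial guess and converges to whichever elbow configuration its path leads to, in line with the abstract's observation that analytical solutions win by far when they exist.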