JoonHo Lee

RO
h-index45
28papers
6,344citations
Novelty53%
AI Score59

28 Papers

ROMar 23, 2022
Advanced Skills through Multiple Adversarial Motion Priors in Reinforcement Learning

Eric Vollenweider, Marko Bjelonic, Victor Klemm et al. · eth-zurich

In recent years, reinforcement learning (RL) has shown outstanding performance for locomotion control of highly articulated robotic systems. Such approaches typically involve tedious reward function tuning to achieve the desired motion style. Imitation learning approaches such as adversarial motion priors aim to reduce this problem by encouraging a pre-defined motion style. In this work, we present an approach to augment the concept of adversarial motion prior-based RL to allow for multiple, discretely switchable styles. We show that multiple styles and skills can be learned simultaneously without notable performance differences, even in combination with motion data-free skills. Our approach is validated in several real-world experiments with a wheeled-legged quadruped robot showing skills learned from existing RL controllers and trajectory optimization, such as ducking and walking, and novel skills such as switching between a quadrupedal and humanoid configuration. For the latter skill, the robot is required to stand up, navigate on two wheels, and sit down. Instead of tuning the sit-down motion, we verify that a reverse playback of the stand-up movement helps the robot discover feasible sit-down behaviors and avoids tedious reward function tuning.

ROOct 6, 2022
Meta Reinforcement Learning for Optimal Design of Legged Robots

Álvaro Belmonte-Baeza, Joonho Lee, Giorgio Valsecchi et al. · eth-zurich

The process of robot design is a complex task and the majority of design decisions are still based on human intuition or tedious manual tuning. A more informed way of facing this task is computational design methods where design parameters are concurrently optimized with corresponding controllers. Existing approaches, however, are strongly influenced by predefined control rules or motion templates and cannot provide end-to-end solutions. In this paper, we present a design optimization framework using model-free meta reinforcement learning, and its application to the optimizing kinematics and actuator parameters of quadrupedal robots. We use meta reinforcement learning to train a locomotion policy that can quickly adapt to different designs. This policy is used to evaluate each design instance during the design optimization. We demonstrate that the policy can control robots of different designs to track random velocity commands over various rough terrains. With controlled experiments, we show that the meta policy achieves close-to-optimal performance for each design instance after adaptation. Lastly, we compare our results against a model-based baseline and show that our approach allows higher performance while not being constrained by predefined motions or gait patterns.

CVSep 24, 2023Code
LiDAR-UDA: Self-ensembling Through Time for Unsupervised LiDAR Domain Adaptation

Amirreza Shaban, JoonHo Lee, Sanghun Jung et al.

We introduce LiDAR-UDA, a novel two-stage self-training-based Unsupervised Domain Adaptation (UDA) method for LiDAR segmentation. Existing self-training methods use a model trained on labeled source data to generate pseudo labels for target data and refine the predictions via fine-tuning the network on the pseudo labels. These methods suffer from domain shifts caused by different LiDAR sensor configurations in the source and target domains. We propose two techniques to reduce sensor discrepancy and improve pseudo label quality: 1) LiDAR beam subsampling, which simulates different LiDAR scanning patterns by randomly dropping beams; 2) cross-frame ensembling, which exploits temporal consistency of consecutive frames to generate more reliable pseudo labels. Our method is simple, generalizable, and does not incur any extra inference cost. We evaluate our method on several public LiDAR datasets and show that it outperforms the state-of-the-art methods by more than $3.9\%$ mIoU on average for all scenarios. Code will be available at https://github.com/JHLee0513/LiDARUDA.

CVAug 31, 2022
Feature Alignment by Uncertainty and Self-Training for Source-Free Unsupervised Domain Adaptation

JoonHo Lee, Gyemin Lee

Most unsupervised domain adaptation (UDA) methods assume that labeled source images are available during model adaptation. However, this assumption is often infeasible owing to confidentiality issues or memory constraints on mobile devices. Some recently developed approaches do not require source images during adaptation, but they show limited performance on perturbed images. To address these problems, we propose a novel source-free UDA method that uses only a pre-trained source model and unlabeled target images. Our method captures the aleatoric uncertainty by incorporating data augmentation and trains the feature generator with two consistency objectives. The feature generator is encouraged to learn consistent visual features away from the decision boundaries of the head classifier. Thus, the adapted model becomes more robust to image perturbations. Inspired by self-supervised learning, our method promotes inter-space alignment between the prediction space and the feature space while incorporating intra-space consistency within the feature space to reduce the domain gap between the source and target domains. We also consider epistemic uncertainty to boost the model adaptation performance. Extensive experiments on popular UDA benchmark datasets demonstrate that the proposed source-free method is comparable or even superior to vanilla UDA methods. Moreover, the adapted models show more robust results when input images are perturbed.

CVNov 16, 2022
Unsupervised Domain Adaptation Based on the Predictive Uncertainty of Models

JoonHo Lee, Gyemin Lee

Unsupervised domain adaptation (UDA) aims to improve the prediction performance in the target domain under distribution shifts from the source domain. The key principle of UDA is to minimize the divergence between the source and the target domains. To follow this principle, many methods employ a domain discriminator to match the feature distributions. Some recent methods evaluate the discrepancy between two predictions on target samples to detect those that deviate from the source distribution. However, their performance is limited because they either match the marginal distributions or measure the divergence conservatively. In this paper, we present a novel UDA method that learns domain-invariant features that minimize the domain divergence. We propose model uncertainty as a measure of the domain divergence. Our UDA method based on model uncertainty (MUDA) adopts a Bayesian framework and provides an efficient way to evaluate model uncertainty by means of Monte Carlo dropout sampling. Empirical results on image recognition tasks show that our method is superior to existing state-of-the-art methods. We also extend MUDA to multi-source domain adaptation problems.

CLAug 23, 2024Code
Preference Consistency Matters: Enhancing Preference Learning in Language Models with Automated Self-Curation of Training Corpora

JoonHo Lee, JuYoun Son, Juree Seok et al.

Inconsistent annotations in training corpora, particularly within preference learning datasets, pose challenges in developing advanced language models. These inconsistencies often arise from variability among annotators and inherent multi-dimensional nature of the preferences. To address these issues, we introduce a self-curation method that preprocesses annotated datasets by leveraging proxy models trained directly on them. Our method enhances preference learning by automatically detecting and selecting consistent annotations. We validate the proposed approach through extensive instruction-following tasks, demonstrating performance improvements of up to 33\% across various learning algorithms and proxy capabilities. This work offers a straightforward and reliable solution to address preference inconsistencies without relying on heuristics, serving as an initial step toward the development of more advanced preference learning methodologies. Code is available at https://github.com/Self-Curation/ .

CVJul 19, 2023
Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples

JoonHo Lee, Jae Oh Woo, Hankyu Moon et al.

Deploying deep visual models can lead to performance drops due to the discrepancies between source and target distributions. Several approaches leverage labeled source data to estimate target domain accuracy, but accessing labeled source data is often prohibitively difficult due to data confidentiality or resource limitations on serving devices. Our work proposes a new framework to estimate model accuracy on unlabeled target data without access to source data. We investigate the feasibility of using pseudo-labels for accuracy estimation and evolve this idea into adopting recent advances in source-free domain adaptation algorithms. Our approach measures the disagreement rate between the source hypothesis and the target pseudo-labeling function, adapted from the source hypothesis. We mitigate the impact of erroneous pseudo-labels that may arise due to a high ideal joint hypothesis risk by employing adaptive adversarial perturbation on the input of the target model. Our proposed source-free framework effectively addresses the challenging distribution shift scenarios and outperforms existing methods requiring source data and labels for training.

ROApr 24
Learning-augmented robotic automation for real-world manufacturing

Yunho Kim, Quan Nguyen, Taewhan Kim et al.

Industrial robots are widely used in manufacturing, yet most manipulation still depends on fixed waypoint scripts that are brittle to environmental changes. Learning-based control offers a more adaptive alternative, but it remains unclear whether such methods, still mostly confined to laboratory demonstrations, can sustain hours of reliable operation, deliver consistent quality, and behave safely around people on a live production line. Here we present Learning-Augmented Robotic Automation, a hybrid system that integrates learned task controllers and a neural 3D safety monitor into conventional industrial workflows. We deployed the system on an electric-motor production line to automate deformable cable insertion and soldering under real manufacturing constraints, a step previously performed manually by human workers. With less than 20 min of real-world data per task, the system operated continuously for 5 h 10 min, producing 108 motors without physical fencing and achieving a 99.4% pass rate on product-level quality-control tests. It maintained near-human takt time while reducing variability in solder-joint quality and cycle time. These results establish a practical pathway for extending industrial automation with learning-based methods.

CVOct 2, 2025Code
Exploring OCR-augmented Generation for Bilingual VQA

JoonHo Lee, Sunho Park

We investigate OCR-augmented generation with Vision Language Models (VLMs), exploring tasks in Korean and English toward multilingualism. To support research in this domain, we train and release KLOCR, a strong bilingual OCR baseline trained on 100M instances to augment VLMs with OCR ability. To complement existing VQA benchmarks, we curate KOCRBench for Korean VQA, and analyze different prompting methods. Extensive experiments show that OCR-extracted text significantly boosts performance across open source and commercial models. Our work offers new insights into OCR-augmented generation for bilingual VQA. Model, code, and data are available at https://github.com/JHLee0513/KLOCR.

ROMay 3, 2024
Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots

Joonho Lee, Marko Bjelonic, Alexander Reske et al.

Autonomous wheeled-legged robots have the potential to transform logistics systems, improving operational efficiency and adaptability in urban environments. Navigating urban environments, however, poses unique challenges for robots, necessitating innovative solutions for locomotion and navigation. These challenges include the need for adaptive locomotion across varied terrains and the ability to navigate efficiently around complex dynamic obstacles. This work introduces a fully integrated system comprising adaptive locomotion control, mobility-aware local navigation planning, and large-scale path planning within the city. Using model-free reinforcement learning (RL) techniques and privileged learning, we develop a versatile locomotion controller. This controller achieves efficient and robust locomotion over various rough terrains, facilitated by smooth transitions between walking and driving modes. It is tightly integrated with a learned navigation controller through a hierarchical RL framework, enabling effective navigation through challenging terrain and various obstacles at high speed. Our controllers are integrated into a large-scale urban navigation system and validated by autonomous, kilometer-scale navigation missions conducted in Zurich, Switzerland, and Seville, Spain. These missions demonstrate the system's robustness and adaptability, underscoring the importance of integrated control systems in achieving seamless navigation in complex environments. Our findings support the feasibility of wheeled-legged robots and hierarchical RL for autonomous navigation, with implications for last-mile delivery and beyond.

ROMay 21, 2024
Rethinking Robustness Assessment: Adversarial Attacks on Learning-based Quadrupedal Locomotion Controllers

Fan Shi, Chong Zhang, Takahiro Miki et al.

Legged locomotion has recently achieved remarkable success with the progress of machine learning techniques, especially deep reinforcement learning (RL). Controllers employing neural networks have demonstrated empirical and qualitative robustness against real-world uncertainties, including sensor noise and external perturbations. However, formally investigating the vulnerabilities of these locomotion controllers remains a challenge. This difficulty arises from the requirement to pinpoint vulnerabilities across a long-tailed distribution within a high-dimensional, temporally sequential space. As a first step towards quantitative verification, we propose a computational method that leverages sequential adversarial attacks to identify weaknesses in learned locomotion controllers. Our research demonstrates that, even state-of-the-art robust controllers can fail significantly under well-designed, low-magnitude adversarial sequence. Through experiments in simulation and on the real robot, we validate our approach's effectiveness, and we illustrate how the results it generates can be used to robustify the original policy and offer valuable insights into the safety of these black-box policies. Project page: https://fanshi14.github.io/me/rss24.html

ROApr 24
A Kinematic Analysis of Palm Degrees of Freedom for Enhancing Thumb Opposability in Robotic Hands

HyoJae Kang, Yeong Jae Park, Hyunmok Jung et al.

This study investigates the kinematic role of palm degrees of freedom (DoF) in enhancing thumb opposability in a five-finger robotic hand. A hand model consisting of a five DoF thumb and four fingers with three to four DoF is analyzed, where palm motion is introduced between adjacent fingers. To quantitatively evaluate thumb-finger interaction, the overlap workspace volume is defined based on voxelized fingertip reachable regions. Seven cases are considered, including configurations with increased total DoF and configurations in which the total DoF is maintained by redistributing DoF from the fingers to the palm. The results show that palm DoF significantly improves opposability, particularly for the ring and little fingers, by repositioning their base locations rather than simply extending their reachable range. However, when the total DoF is constrained, redistributing DoF to the palm leads to trade-offs between overlap workspace expansion and kinematic redundancy. These findings indicate that palm DoF and finger DoF play distinct roles in hand kinematics and should be considered jointly in design. This study provides a quantitative framework for evaluating palm-induced opposability without relying on object or contact models and offers practical design guidelines for incorporating palm motion in robotic hands.

ROApr 22
Kinematic Optimization of Phalanx Length Ratios in Robotic Hands Using Potential Dexterity

HyoJae Kang, Joonho Lee, Jeongdo Ahn et al.

In the design stage of robotic hands, it is not straightforward to quantitatively evaluate the effect of phalanx length ratios on dexterity without defining specific objects or manipulation tasks. Therefore, this study presents a framework for optimizing the phalanx length ratios of a five-finger robotic hand based on potential dexterity within a kinematic structure. The proposed method employs global manipulability, workspace volume, overlap workspace volume, and fingertip sensitivity as evaluation metrics, and identifies optimal design configurations using a weighted objective function under given constraints. The reachable workspace is discretized using a voxel-based representation, and joint motions are discretized at uniform intervals for evaluation. The optimization is performed over design sets for both the thumb and the other fingers, and design combinations that do not generate overlap workspace are excluded. The results show that each phalanx does not contribute equally to the overall dexterity, and the factors influencing each phalanx are identified. In addition, it is observed that the selection of weighting coefficients does not necessarily lead to the direct maximization of individual performance metrics, due to the non-uniform distribution of evaluation measures within the design space. The proposed framework provides a systematic approach to analyze the trade-offs among reachability, dexterity, and controllability, and can serve as a practical guideline for the kinematic design of multi-fingered robotic hands.

CVFeb 11, 2025
KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level

Ruining Deng, Tianyuan Yao, Yucheng Tang et al.

Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge, introducing a dataset that incorporates preclinical rodent models of CKD with over 10,000 annotated glomeruli from 60+ Periodic Acid Schiff (PAS)-stained whole slide images. The challenge includes two tasks, patch-level segmentation and whole slide image segmentation and detection, evaluated using the Dice Similarity Coefficient (DSC) and F1-score. By encouraging innovative segmentation methods that adapt to diverse CKD models and tissue conditions, the KPIs Challenge aims to advance kidney pathology analysis, establish new benchmarks, and enable precise, large-scale quantification for disease research and diagnosis.

ROApr 22
A Kinematic Framework for Evaluating Pinch Configurations in Robotic Hand Design without Object or Contact Models

HyoJae Kang, Joonho Lee, Hyunmok Jung et al.

Evaluating the pinch capability of a robotic hand is important for understanding its functional dexterity. However, many existing grasp evaluation methods rely on object geometry or contact force models, which limits their applicability during the early stages of robotic hand design. This study proposes a kinematic evaluation method for analyzing pinch configurations of robotic hands based on interactions between fingertip workspaces. First, the reachable workspace of each fingertip is computed from the joint configurations of the fingers. Then, feasible pinch configurations are detected by evaluating the relationships between fingertip pairs. Since the proposed method does not require information about object geometry or contact force models, the pinch capability of a robotic hand can be evaluated solely based on its kinematic structure. In addition, analyses are performed on four different kinematic structures of the hand to investigate their impact on the pinch configurations. The proposed evaluation framework can serve as a useful tool for comparing different robotic hand designs and analyzing pinch capability during the design stage.

CLMay 10, 2024
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

JoonHo Lee, Jae Oh Woo, Juree Seok et al.

Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models.

CLNov 16, 2025
SGuard-v1: Safety Guardrail for Large Language Models

JoonHo Lee, HyeonMin Cho, Jaewoong Yun et al.

We present SGuard-v1, a lightweight safety guardrail for Large Language Models (LLMs), which comprises two specialized models to detect harmful content and screen adversarial prompts in human-AI conversational settings. The first component, ContentFilter, is trained to identify safety risks in LLM prompts and responses in accordance with the MLCommons hazard taxonomy, a comprehensive framework for trust and safety assessment of AI. The second component, JailbreakFilter, is trained with a carefully designed curriculum over integrated datasets and findings from prior work on adversarial prompting, covering 60 major attack types while mitigating false-unsafe classification. SGuard-v1 is built on the 2B-parameter Granite-3.3-2B-Instruct model that supports 12 languages. We curate approximately 1.4 million training instances from both collected and synthesized data and perform instruction tuning on the base model, distributing the curated data across the two component according to their designated functions. Through extensive evaluation on public and proprietary safety benchmarks, SGuard-v1 achieves state-of-the-art safety performance while remaining lightweight, thereby reducing deployment overhead. SGuard-v1 also improves interpretability for downstream use by providing multi-class safety predictions and their binary confidence scores. We release the SGuard-v1 under the Apache-2.0 License to enable further research and practical deployment in AI safety.

IVJun 10, 2024
Assessing the risk of recurrence in early-stage breast cancer through H&E stained whole slide images

Geongyu Lee, Joonho Lee, Tae-Yeong Kwak et al.

Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology.We analyzed 125 hematoxylin and eosin-stained whole slide images (WSIs) from 125 patients across two institutions (National Cancer Center and Korea University Medical Center Guro Hospital) to predict breast cancer recurrence risk using deep learning. Sensitivity reached 0.857, 0.746, and 0.529 for low, intermediate, and high-risk categories, respectively, with specificity of 0.816, 0.803, and 0.972, and a Pearson correlation of 0.61 with histological grade. Class activation maps highlighted features like tubule formation and mitotic rate, suggesting a cost-effective approach to risk stratification, pending broader validation. These findings suggest that deep learning models trained exclusively on hematoxylin and eosin stained whole slide images can approximate genomic assay results, offering a cost-effective and scalable tool for breast cancer recurrence risk assessment. However, further validation using larger and more balanced datasets is needed to confirm the clinical applicability of our approach.

ROJan 20, 2022
Learning robust perceptive locomotion for quadrupedal robots in the wild

Takahiro Miki, Joonho Lee, Jemin Hwangbo et al.

Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into under-explored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, utilizing exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step~-- or are missing altogether due to high reflectance. Additionally, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed, because the robot has to physically feel out the terrain before adapting its gait accordingly. Here we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end-to-end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed. The controller was tested in a variety of challenging natural and urban environments over multiple seasons and completed an hour-long hike in the Alps in the time recommended for human hikers.

ROJan 18, 2022
CERBERUS: Autonomous Legged and Aerial Robotic Exploration in the Tunnel and Urban Circuits of the DARPA Subterranean Challenge

Marco Tranzatto, Frank Mascarich, Lukas Bernreiter et al.

Autonomous exploration of subterranean environments constitutes a major frontier for robotic systems as underground settings present key challenges that can render robot autonomy hard to achieve. This has motivated the DARPA Subterranean Challenge, where teams of robots search for objects of interest in various underground environments. In response, the CERBERUS system-of-systems is presented as a unified strategy towards subterranean exploration using legged and flying robots. As primary robots, ANYmal quadruped systems are deployed considering their endurance and potential to traverse challenging terrain. For aerial robots, both conventional and collision-tolerant multirotors are utilized to explore spaces too narrow or otherwise unreachable by ground systems. Anticipating degraded sensing conditions, a complementary multi-modal sensor fusion approach utilizing camera, LiDAR, and inertial data for resilient robot pose estimation is proposed. Individual robot pose estimates are refined by a centralized multi-robot map optimization approach to improve the reported location accuracy of detected objects of interest in the DARPA-defined coordinate frame. Furthermore, a unified exploration path planning policy is presented to facilitate the autonomous operation of both legged and aerial robots in complex underground networks. Finally, to enable communication between the robots and the base station, CERBERUS utilizes a ground rover with a high-gain antenna and an optical fiber connection to the base station, alongside breadcrumbing of wireless nodes by our legged robots. We report results from the CERBERUS system-of-systems deployment at the DARPA Subterranean Challenge Tunnel and Urban Circuits, along with the current limitations and the lessons learned for the benefit of the community.

ROJan 11, 2022
Combining Learning-based Locomotion Policy with Model-based Manipulation for Legged Mobile Manipulators

Yuntao Ma, Farbod Farshidian, Takahiro Miki et al.

Deep reinforcement learning produces robust locomotion policies for legged robots over challenging terrains. To date, few studies have leveraged model-based methods to combine these locomotion skills with the precise control of manipulators. Here, we incorporate external dynamics plans into learning-based locomotion policies for mobile manipulation. We train the base policy by applying a random wrench sequence on the robot base in simulation and adding the noisified wrench sequence prediction to the policy observations. The policy then learns to counteract the partially-known future disturbance. The random wrench sequences are replaced with the wrench prediction generated with the dynamics plans from model predictive control to enable deployment. We show zero-shot adaptation for manipulators unseen during training. On the hardware, we demonstrate stable locomotion of legged robots with the prediction of the external wrench.

RONov 17, 2020
Circus ANYmal: A Quadruped Learning Dexterous Manipulation with Its Limbs

Fan Shi, Timon Homberger, Joonho Lee et al.

Quadrupedal robots are skillful at locomotion tasks while lacking manipulation skills, not to mention dexterous manipulation abilities. Inspired by the animal behavior and the duality between multi-legged locomotion and multi-fingered manipulation, we showcase a circus ball challenge on a quadrupedal robot, ANYmal. We employ a model-free reinforcement learning approach to train a deep policy that enables the robot to balance and manipulate a light-weight ball robustly using its limbs without any contact measurement sensor. The policy is trained in the simulation, in which we randomize many physical properties with additive noise and inject random disturbance force during manipulation, and achieves zero-shot deployment on the real robot without any adjustment. In the hardware experiments, dynamic performance is achieved with a maximum rotation speed of 15 deg/s, and robust recovery is showcased under external poking. To our best knowledge, it is the first work that demonstrates the dexterous dynamic manipulation on a real quadrupedal robot.

ROOct 21, 2020
Learning Quadrupedal Locomotion over Challenging Terrain

Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen et al.

Some of the most challenging environments on our planet are accessible to quadrupedal animals but remain out of reach for autonomous machines. Legged locomotion can dramatically expand the operational domains of robotics. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have escalated in complexity while falling short of the generality and robustness of animal locomotion. Here we present a radically robust controller for legged locomotion in challenging natural environments. We present a novel solution to incorporating proprioceptive feedback in locomotion control and demonstrate remarkable zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. It is based on a neural network that acts on a stream of proprioceptive signals. The trained controller has taken two generations of quadrupedal ANYmal robots to a variety of natural environments that are beyond the reach of prior published work in legged locomotion. The controller retains its robustness under conditions that have never been encountered during training: deformable terrain such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work opens new frontiers for robotics and indicates that radical robustness in natural environments can be achieved by training in much simpler domains.

CVOct 8, 2019
REFUGE Challenge: A Unified Framework for Evaluating Automated Methods for Glaucoma Assessment from Fundus Photographs

José Ignacio Orlando, Huazhu Fu, João Barbossa Breda et al.

Glaucoma is one of the leading causes of irreversible but preventable blindness in working age populations. Color fundus photography (CFP) is the most cost-effective imaging modality to screen for retinal disorders. However, its application to glaucoma has been limited to the computation of a few related biomarkers such as the vertical cup-to-disc ratio. Deep learning approaches, although widely applied for medical image analysis, have not been extensively used for glaucoma assessment due to the limited size of the available data sets. Furthermore, the lack of a standardize benchmark strategy makes difficult to compare existing methods in a uniform way. In order to overcome these issues we set up the Retinal Fundus Glaucoma Challenge, REFUGE (\url{https://refuge.grand-challenge.org}), held in conjunction with MICCAI 2018. The challenge consisted of two primary tasks, namely optic disc/cup segmentation and glaucoma classification. As part of REFUGE, we have publicly released a data set of 1200 fundus images with ground truth segmentations and clinical glaucoma labels, currently the largest existing one. We have also built an evaluation framework to ease and ensure fairness in the comparison of different models, encouraging the development of novel techniques in the field. 12 teams qualified and participated in the online challenge. This paper summarizes their methods and analyzes their corresponding results. In particular, we observed that two of the top-ranked teams outperformed two human experts in the glaucoma classification task. Furthermore, the segmentation results were in general consistent with the ground truth annotations, with complementary outcomes that can be further exploited by ensembling the results.

ROSep 18, 2019
DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning

Vassilios Tsounis, Mitja Alge, Joonho Lee et al.

This paper addresses the problem of legged locomotion in non-flat terrain. As legged robots such as quadrupeds are to be deployed in terrains with geometries which are difficult to model and predict, the need arises to equip them with the capability to generalize well to unforeseen situations. In this work, we propose a novel technique for training neural-network policies for terrain-aware locomotion, which combines state-of-the-art methods for model-based motion planning and reinforcement learning. Our approach is centered on formulating Markov decision processes using the evaluation of dynamic feasibility criteria in place of physical simulation. We thus employ policy-gradient methods to independently train policies which respectively plan and execute foothold and base motions in 3D environments using both proprioceptive and exteroceptive measurements. We apply our method within a challenging suite of simulated terrain scenarios which contain features such as narrow bridges, gaps and stepping-stones, and train policies which succeed in locomoting effectively in all cases.

LGMay 26, 2019
ProbAct: A Probabilistic Activation Function for Deep Neural Networks

Kumar Shridhar, Joonho Lee, Hideaki Hayashi et al.

Activation functions play an important role in training artificial neural networks. The majority of currently used activation functions are deterministic in nature, with their fixed input-output relationship. In this work, we propose a novel probabilistic activation function, called ProbAct. ProbAct is decomposed into a mean and variance and the output value is sampled from the formed distribution, making ProbAct a stochastic activation function. The values of mean and variances can be fixed using known functions or trained for each element. In the trainable ProbAct, the mean and the variance of the activation distribution is trained within the back-propagation framework alongside other parameters. We show that the stochastic perturbation induced through ProbAct acts as a viable generalization technique for feature augmentation. In our experiments, we compare ProbAct with well-known activation functions on classification tasks on different modalities: Images(CIFAR-10, CIFAR-100, and STL-10) and Text (Large Movie Review). We show that ProbAct increases the classification accuracy by +2-3% compared to ReLU or other conventional activation functions on both original datasets and when datasets are reduced to 50% and 25% of the original size. Finally, we show that ProbAct learns an ensemble of models by itself that can be used to estimate the uncertainties associated with the prediction and provides robustness to noisy inputs.

ROJan 24, 2019
Learning agile and dynamic motor skills for legged robots

Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy et al.

Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, so far, reinforcement learning research for legged robots is mainly limited to simulation, and only few and comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog-sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.

ROJan 22, 2019
Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning

Joonho Lee, Jemin Hwangbo, Marco Hutter

The ability to recover from a fall is an essential feature for a legged robot to navigate in challenging environments robustly. Until today, there has been very little progress on this topic. Current solutions mostly build upon (heuristically) predefined trajectories, resulting in unnatural behaviors and requiring considerable effort in engineering system-specific components. In this paper, we present an approach based on model-free Deep Reinforcement Learning (RL) to control recovery maneuvers of quadrupedal robots using a hierarchical behavior-based controller. The controller consists of four neural network policies including three behaviors and one behavior selector to coordinate them. Each of them is trained individually in simulation and deployed directly on a real system. We experimentally validate our approach on the quadrupedal robot ANYmal, which is a dog-sized quadrupedal system with 12 degrees of freedom. With our method, ANYmal manifests dynamic and reactive recovery behaviors to recover from an arbitrary fall configuration within less than 5 seconds. We tested the recovery maneuver more than 100 times, and the success rate was higher than 97 %.