Konstantinos Karydis

RO
h-index: 44
31 papers
626 citations
Novelty: 49%
AI Score: 50

31 Papers

CV · Oct 4, 2022 · Code
Centroid Distance Keypoint Detector for Colored Point Clouds

Hanzhe Teng, Dimitrios Chatziparaschis, Xinyue Kan et al.

Keypoint detection serves as the basis for many computer vision and robotics applications. Despite the fact that colored point clouds can be readily obtained, most existing keypoint detectors extract only geometry-salient keypoints, which can impede the overall performance of systems that intend to (or have the potential to) leverage color information. To promote advances in such systems, we propose an efficient multi-modal keypoint detector that can extract both geometry-salient and color-salient keypoints in colored point clouds. The proposed CEntroid Distance (CED) keypoint detector comprises an intuitive and effective saliency measure, the centroid distance, that can be used in both 3D space and color space, and a multi-modal non-maximum suppression algorithm that can select keypoints with high saliency in two or more modalities. The proposed saliency measure leverages directly the distribution of points in a local neighborhood and does not require normal estimation or eigenvalue decomposition. We evaluate the proposed method in terms of repeatability and computational efficiency (i.e. running time) against state-of-the-art keypoint detectors on both synthetic and real-world datasets. Results demonstrate that our proposed CED keypoint detector requires minimal computational time while attaining high repeatability. To showcase one of the potential applications of the proposed method, we further investigate the task of colored point cloud registration. Results suggest that our proposed CED detector outperforms state-of-the-art handcrafted and learning-based keypoint detectors in the evaluated scenes. The C++ implementation of the proposed method is made publicly available at https://github.com/UCR-Robotics/CED_Detector.
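
The centroid distance idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' C++ implementation: the function name `centroid_distance`, the brute-force radius search, and the unnormalized score are all assumptions made for the example. Neighbors are found in 3D space, and the centroid is taken in whichever feature space is passed in (positions for geometric saliency, colors for color saliency).

```python
import math

def centroid_distance(positions, features, radius):
    """Score each point by the distance between its own feature vector and
    the centroid of the feature vectors in its spatial neighborhood.
    (Hypothetical helper; brute-force neighbor search for clarity.)"""
    scores = []
    for p, f in zip(positions, features):
        # Neighborhood is defined in 3D space and includes the point itself.
        idx = [i for i, q in enumerate(positions) if math.dist(p, q) <= radius]
        centroid = tuple(
            sum(features[i][k] for i in idx) / len(idx) for k in range(len(f))
        )
        scores.append(math.dist(f, centroid))
    return scores

# A flat 3x3 grid: index 0 is a corner, index 4 is the center.
grid = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
# All points gray except a red center point.
colors = [(0.5, 0.5, 0.5)] * 9
colors[4] = (1.0, 0.0, 0.0)

geometry_scores = centroid_distance(grid, grid, radius=1.5)
color_scores = centroid_distance(grid, colors, radius=1.5)
```

In this toy cloud the center point scores zero geometrically, because its neighborhood is symmetric, but it scores high in color space. The corner point scores high geometrically. No normal estimation or eigenvalue decomposition is involved, which matches the property the abstract highlights.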

CV · Aug 23, 2023 · Code
SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed et al.

Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these assumptions may be problematic for many applications. Source data may not be available due to privacy, security, or economic concerns. Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without having access to the original source dataset. Our proposed approach solves this problem through a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion -- agreement filtering and entropy weighting -- based on the estimated domain gap. We demonstrate our work on the semantic segmentation problem. Experiments across seven challenging adaptation scenarios verify the efficacy of our approach, achieving results comparable to, and in some cases outperforming, methods which assume access to source data. Our method achieves an improvement in mIoU of up to 12% over competing baselines. Our code is publicly available at https://github.com/csimo005/SUMMIT.
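
The two pseudo-label fusion modes named above can be illustrated for a single pixel with two modalities. This is a sketch under stated assumptions: the function names, the `exp(-entropy)` weighting, and the use of `None` for a discarded label are illustrative choices, not the formulas in the SUMMIT code. The switching rule based on the estimated domain gap is not described in the abstract and is left out.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def agreement_filter(probs_a, probs_b):
    """Keep a pseudo-label only when both modalities predict the same class;
    return None (label discarded) otherwise."""
    label_a, label_b = argmax(probs_a), argmax(probs_b)
    return label_a if label_a == label_b else None

def entropy_weighted_fusion(probs_a, probs_b):
    """Fuse the two predictions, giving the lower-entropy (more confident)
    modality a larger weight. The exp(-H) weighting is an assumption."""
    w_a, w_b = math.exp(-entropy(probs_a)), math.exp(-entropy(probs_b))
    fused = [(w_a * pa + w_b * pb) / (w_a + w_b) for pa, pb in zip(probs_a, probs_b)]
    return argmax(fused)

confident = [0.9, 0.05, 0.05]   # e.g. prediction from one modality
unsure = [0.3, 0.4, 0.3]        # e.g. prediction from the other modality
filtered = agreement_filter(confident, unsure)
fused = entropy_weighted_fusion(confident, unsure)
```

Here the two modalities disagree, so agreement filtering drops the pixel, while entropy weighting still yields a label dominated by the confident modality.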

RO · Sep 27, 2023 · Code
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms

Hanzhe Teng, Yipeng Wang, Xiaoao Song et al.

In this work we introduce the CitrusFarm dataset, a comprehensive multimodal sensory dataset collected by a wheeled mobile robot operating in agricultural fields. The dataset offers stereo RGB images with depth information, as well as monochrome, near-infrared and thermal images, presenting diverse spectral responses crucial for agricultural research. Furthermore, it provides a range of navigational sensor data encompassing wheel odometry, LiDAR, inertial measurement unit (IMU), and GNSS with Real-Time Kinematic (RTK) as the centimeter-level positioning ground truth. The dataset comprises seven sequences collected in three fields of citrus trees, featuring various tree species at different growth stages, distinctive planting patterns, as well as varying daylight conditions. It spans a total operation time of 1.7 hours, covers a distance of 7.5 km, and constitutes 1.3 TB of data. We anticipate that this dataset can facilitate the development of autonomous robot systems operating in agricultural tree environments, especially for localization, mapping and crop monitoring tasks. Moreover, the rich sensing modalities offered in this dataset can also support research in a range of robotics and computer vision tasks, such as place recognition, scene understanding, object detection and segmentation, and multimodal learning. The dataset, in conjunction with related tools and resources, is made publicly available at https://github.com/UCR-Robotics/Citrus-Farm-Dataset.

CV · Dec 18, 2025
Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation

Sarosij Bose, Ravi K. Rajendran, Biplob Debnath et al.

Radiology Report Generation (RRG) is a critical step toward automating healthcare workflows, facilitating accurate patient assessments, and reducing the workload of medical professionals. Despite recent progress in Large Medical Vision-Language Models (Med-VLMs), generating radiology reports that are both visually grounded and clinically accurate remains a significant challenge. Existing approaches often rely on large labeled corpora for pre-training, costly task-specific preference data, or retrieval-based methods. However, these strategies do not adequately mitigate hallucinations arising from poor cross-modal alignment between visual and linguistic representations. To address these limitations, we propose VALOR: Visual Alignment of Medical Vision-Language Models for GrOunded Radiology Report Generation. Our method introduces a reinforcement learning-based post-alignment framework utilizing Group-Relative Proximal Optimization (GRPO). The training proceeds in two stages: (1) improving the Med-VLM with textual rewards to encourage clinically precise terminology, and (2) aligning the vision projection module of the textually grounded model with disease findings, thereby guiding attention toward image regions most relevant to the diagnostic task. Extensive experiments on multiple benchmarks demonstrate that VALOR substantially improves factual accuracy and visual grounding, achieving significant performance gains over state-of-the-art report generation methods.

CV · Aug 9, 2022
BabyNet: A Lightweight Network for Infant Reaching Action Recognition in Unconstrained Environments to Support Future Pediatric Rehabilitation Applications

Amel Dechemi, Vikarn Bhakri, Ipsita Sahin et al.

Action recognition is an important component to improve autonomy of physical rehabilitation devices, such as wearable robotic exoskeletons. Existing human action recognition algorithms focus on adult applications rather than pediatric ones. In this paper, we introduce BabyNet, a light-weight (in terms of trainable parameters) network structure to recognize infant reaching action from off-body stationary cameras. We develop an annotated dataset that includes diverse reaches performed while in a sitting posture by different infants in unconstrained environments (e.g., in home settings, etc.). Our approach uses the spatial and temporal connection of annotated bounding boxes to interpret onset and offset of reaching, and to detect a complete reaching action. We evaluate the efficiency of our proposed approach and compare its performance against other learning-based network structures in terms of capability of capturing temporal inter-dependencies and accuracy of detection of reaching onset and offset. Results indicate our BabyNet can attain solid performance in terms of (average) testing accuracy that exceeds that of other larger networks, and can hence serve as a light-weight data-driven framework for video-based infant reaching action recognition.

69.1 · LG · Mar 30
Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL

Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li et al.

Preference-based reinforcement learning can learn effective reward functions from comparisons, but its scalability is constrained by the high cost of oracle feedback. Lightweight vision-language embedding (VLE) models provide a cheaper alternative, but their noisy outputs limit their effectiveness as standalone reward generators. To address this challenge, we propose ROVED, a hybrid framework that combines VLE-based supervision with targeted oracle feedback. Our method uses the VLE to generate segment-level preferences and defers to an oracle only for samples with high uncertainty, identified through a filtering mechanism. In addition, we introduce a parameter-efficient fine-tuning method that adapts the VLE with the obtained oracle feedback in order to improve the model over time in a synergistic fashion. This ensures the retention of the scalability of embeddings and the accuracy of oracles, while avoiding their inefficiencies. Across multiple robotic manipulation tasks, ROVED matches or surpasses prior preference-based methods while reducing oracle queries by up to 80%. Remarkably, the adapted VLE generalizes across tasks, yielding cumulative annotation savings of up to 90%, highlighting the practicality of combining scalable embeddings with precise oracle supervision for preference-based RL.
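
The deferral step described above, where the VLE labels most pairs and the oracle is consulted only for uncertain ones, can be sketched as follows. The abstract does not specify the filtering mechanism, so the fixed margin around 0.5, the function name `label_preferences`, and the `oracle` callback are all hypothetical.

```python
def label_preferences(vle_probs, oracle, margin=0.2):
    """Label segment pairs with the VLE, deferring to the oracle when unsure.

    vle_probs[i] is an assumed VLE estimate of P(segment A preferred over B)
    for pair i. oracle(i) returns the trusted label (1 if A is preferred,
    else 0). A pair counts as uncertain when its probability lies within
    `margin` of 0.5 (an illustrative rule, not the paper's filter).
    """
    labels, queries = [], 0
    for i, p in enumerate(vle_probs):
        if abs(p - 0.5) < margin:
            labels.append(oracle(i))   # costly, accurate feedback
            queries += 1
        else:
            labels.append(1 if p > 0.5 else 0)   # cheap, noisy feedback
    return labels, queries

truth = [1, 0, 0, 1]
vle_probs = [0.95, 0.55, 0.10, 0.45]
labels, queries = label_preferences(vle_probs, oracle=lambda i: truth[i])
```

In this toy run only the two pairs near 0.5 reach the oracle. Widening `margin` trades more oracle queries for fewer noisy labels.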

15.3 · CV · Mar 20
An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

Dimitrios Chatziparaschis, Elia Scudiero, Brent Sams et al.

The dynamic and heterogeneous nature of agricultural fields presents significant challenges for object detection and localization, particularly for autonomous mobile robots that are tasked with surveying previously unseen unstructured environments. Concurrently, there is a growing need for real-time detection systems that do not depend on large-scale manually labeled real-world datasets. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data. The proposed methodology incorporates cross-modal annotation transfer and an early-stage sensor fusion pipeline, which, in conjunction with a multi-stage detection architecture, effectively trains and enhances the system's multi-modal detection capabilities. The effectiveness of the framework was demonstrated through vine trunk detection in novel vineyard settings that featured diverse lighting conditions and varying crop densities to validate performance. When integrated with a customized multi-modal LiDAR and Odometry Mapping (LOAM) algorithm and a tree association module, the system demonstrated high-performance trunk localization, successfully identifying over 70% of trees in a single traversal with a mean distance error of less than 0.37m. The results reveal that by leveraging multi-modal, incremental-stage annotation and training, the proposed framework achieves robust detection performance regardless of limited starting annotations, showcasing its potential for real-world and near-ground agricultural applications.

RO · Sep 28, 2024
Language-guided Robust Navigation for Mobile Robots in Dynamically-changing Environments

Cody Simons, Zhichao Liu, Brandon Marcus et al.

In this paper, we develop an embodied AI system for human-in-the-loop navigation with a wheeled mobile robot. We propose a direct yet effective method of monitoring the robot's current plan to detect changes in the environment that impact the intended trajectory of the robot significantly and then query a human for feedback. We also develop a means to parse human feedback expressed in natural language into local navigation waypoints and integrate it into a global planning system, by leveraging a map of semantic features and an aligned obstacle map. Extensive testing in simulation and physical hardware experiments with a resource-constrained wheeled robot tasked to navigate in a real-world environment validate the efficacy and robustness of our method. This work can support applications like precision agriculture and construction, where persistent monitoring of the environment provides a human with information about the environment state.

RO · Sep 24, 2024
Vision-based Xylem Wetness Classification in Stem Water Potential Determination

Pamodya Peiris, Aritra Samanta, Caio Mucchiani et al.

Water is often overused in irrigation, making efficient management of it crucial. Precision Agriculture emphasizes tools like stem water potential (SWP) analysis for better plant status determination. However, such tools often require labor-intensive in-situ sampling. Automation and machine learning can streamline this process and enhance outcomes. This work focused on automating stem detection and xylem wetness classification using the Scholander Pressure Chamber, a widely used but demanding method for SWP measurement. The aim was to refine stem detection and develop computer-vision-based methods to better classify water emergence at the xylem. To this end, we collected and manually annotated video data, applying vision- and learning-based methods for detection and classification. Additionally, we explored data augmentation and fine-tuned parameters to identify the most effective models. The identified best-performing models for stem detection and xylem wetness classification were evaluated end-to-end over 20 SWP measurements. Learning-based stem detection via YOLOv8n combined with ResNet50-based classification achieved a Top-1 accuracy of 80.98%, making it the best-performing approach for xylem wetness classification.

CV · Jun 3, 2025 · Code
BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

Weiduo Yuan, Jerry Li, Justin Yue et al.

Accurate LiDAR-camera calibration is fundamental to fusing multi-modal perception in autonomous driving and robotic systems. Traditional calibration methods require extensive data collection in controlled environments and cannot compensate for the transformation changes during the vehicle/robot movement. In this paper, we propose the first model that uses bird's-eye view (BEV) features to perform LiDAR camera calibration from raw data, termed BEVCALIB. To achieve this, we extract camera BEV features and LiDAR BEV features separately and fuse them into a shared BEV feature space. To fully utilize the geometric information from the BEV feature, we introduce a novel feature selector to filter the most important features in the transformation decoder, which reduces memory consumption and enables efficient training. Extensive evaluations on KITTI, NuScenes, and our own dataset demonstrate that BEVCALIB establishes a new state of the art. Under various noise conditions, BEVCALIB outperforms the best baseline in the literature by an average of (47.08%, 82.32%) on KITTI dataset, and (78.17%, 68.29%) on NuScenes dataset, in terms of (translation, rotation), respectively. In the open-source domain, it improves the best reproducible baseline by one order of magnitude. Our code and demo results are available at https://cisl.ucr.edu/BEVCalib.

RO · Nov 24, 2021 · Code
ACD-EDMD: Analytical Construction for Dictionaries of Lifting Functions in Koopman Operator-based Nonlinear Robotic Systems

Lu Shi, Konstantinos Karydis

Koopman operator theory has been gaining momentum for model extraction, planning, and control of data-driven robotic systems. The Koopman operator's ability to extract dynamics from data depends heavily on the selection of an appropriate dictionary of lifting functions. In this paper we propose ACD-EDMD, a new method for Analytical Construction of Dictionaries of appropriate lifting functions for a range of data-driven Koopman operator based nonlinear robotic systems. The key insight of this work is that information about fundamental topological spaces of the nonlinear system (such as its configuration space and workspace) can be exploited to steer the construction of Hermite polynomial-based lifting functions. We show that the proposed method leads to dictionaries that are simple to implement while enjoying provable completeness and convergence guarantees when observables are weighted bounded. We evaluate ACD-EDMD using a range of diverse nonlinear robotic systems in both simulated and physical hardware experimentation (a wheeled mobile robot, a two-revolute-joint robotic arm, and a soft robotic leg). Results reveal that our method leads to dictionaries that enable high-accuracy prediction and that can generalize to diverse validation sets. The associated GitHub repository of our algorithm can be accessed at https://github.com/UCR-Robotics/ACD-EDMD.

RO · Mar 1, 2019 · Code
OpenRoACH: A Durable Open-Source Hexapedal Platform with Onboard Robot Operating System (ROS)

Liyu Wang, Yuxiang Yang, Gustavo Correa et al.

OpenRoACH is a 15-cm 200-gram self-contained hexapedal robot with an onboard single-board computer. To our knowledge, it is the smallest legged robot with the capability of running the Robot Operating System (ROS) onboard. The robot is fully open sourced, uses accessible materials and off-the-shelf electronic components, can be fabricated with benchtop fast-prototyping machines such as a laser cutter and a 3D printer, and can be assembled by one person within two hours. Its sensory capacity has been tested with gyroscopes, accelerometers, Beacon sensors, color vision sensors, linescan sensors and cameras. It is low-cost within $150 including structure materials, motors, electronics, and a battery. The capabilities of OpenRoACH are demonstrated with multi-surface walking and running, 24-hour continuous walking burn-ins, carrying 200-gram dynamic payloads and 800-gram static payloads, and ROS control of steering based on camera feedback. Information and files related to mechanical design, fabrication, assembly, electronics, and control algorithms are all publicly available on https://wiki.eecs.berkeley.edu/biomimetics/Main/OpenRoACH.

CV · Apr 8, 2025
Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation

Sarosij Bose, Hannah Dela Cruz, Arindam Dutta et al.

Human pose estimation is a critical tool across a variety of healthcare applications. Despite significant progress in pose estimation algorithms targeting adults, such developments for infants remain limited. Existing algorithms for infant pose estimation, despite achieving commendable performance, depend on fully supervised approaches that require large amounts of labeled data. These algorithms also struggle with poor generalizability under distribution shifts. To address these challenges, we introduce SHIFT: Leveraging SyntHetic Adult Datasets for Unsupervised InFanT Pose Estimation, which leverages the pseudo-labeling-based Mean-Teacher framework to compensate for the lack of labeled data and addresses distribution shifts by enforcing consistency between the student and the teacher pseudo-labels. Additionally, to penalize implausible predictions obtained from the mean-teacher framework, we incorporate an infant manifold pose prior. To enhance SHIFT's self-occlusion perception ability, we propose a novel visibility consistency module for improved alignment of the predicted poses with the original image. Extensive experiments on multiple benchmarks show that SHIFT significantly outperforms existing state-of-the-art unsupervised domain adaptation (UDA) pose estimation methods by 5% and supervised infant pose estimation methods by a margin of 16%. The project page is available at: https://sarosijbose.github.io/SHIFT.
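
The Mean-Teacher framework mentioned above keeps a teacher model whose weights are an exponential moving average (EMA) of the student's weights, and trains the student to agree with the teacher's pseudo-labels. The sketch below shows only that generic mechanism on flat lists of numbers. The momentum value and the mean-squared-error consistency term are assumptions, and the infant pose prior and visibility consistency module from the abstract are not modeled.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an exponential moving average:
    momentum * teacher + (1 - momentum) * student, element-wise."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

def consistency_loss(student_pred, teacher_pred):
    """Mean squared error between student predictions and the teacher's
    pseudo-labels (one common choice of consistency term)."""
    n = len(student_pred)
    return sum((s - t) ** 2 for s, t in zip(student_pred, teacher_pred)) / n

teacher_weights = [1.0, 0.0]
student_weights = [0.0, 1.0]
teacher_weights = ema_update(teacher_weights, student_weights, momentum=0.9)
loss = consistency_loss([1.0, 2.0], [1.0, 4.0])
```

Because the teacher changes slowly, its pseudo-labels are more stable than the student's own predictions, which is what makes the consistency target useful when no target-domain labels exist.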

CV · Mar 19, 2025
Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes

Sarosij Bose, Arindam Dutta, Sayak Nag et al.

Reconstructing 3D scenes from a single image is a fundamentally ill-posed task due to the severely under-constrained nature of the problem. Consequently, when the scene is rendered from novel camera views, existing single image to 3D reconstruction methods render incoherent and blurry views. This problem is exacerbated when the unseen regions are far away from the input camera. In this work, we address these inherent limitations in existing single image-to-3D scene feedforward networks. To alleviate the poor performance due to insufficient information beyond the input image's view, we leverage a strong generative prior in the form of a pre-trained latent video diffusion model, for iterative refinement of a coarse scene represented by optimizable Gaussian parameters. To ensure that the style and texture of the generated images align with that of the input image, we incorporate on-the-fly Fourier-style transfer between the generated images and the input image. Additionally, we design a semantic uncertainty quantification module that calculates the per-pixel entropy and yields uncertainty maps used to guide the refinement process from the most confident pixels while discarding the remaining highly uncertain ones. We conduct extensive experiments on real-world scene datasets, including in-domain RealEstate-10K and out-of-domain KITTI-v2, showing that our approach can provide more realistic and high-fidelity novel view synthesis results compared to existing state-of-the-art methods.
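
The per-pixel entropy used for the uncertainty maps above can be illustrated on a tiny class-probability map. This is a sketch under assumptions: the hard threshold `max_entropy` and the function names are hypothetical, and the abstract does not say how the paper turns entropy into a keep-or-discard decision.

```python
import math

def pixel_entropy(class_probs):
    """Shannon entropy (natural log) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in class_probs if p > 0.0)

def confidence_mask(prob_map, max_entropy):
    """Return a boolean map that is True where a pixel is confident enough
    (entropy at or below the assumed threshold) to guide refinement."""
    return [[pixel_entropy(px) <= max_entropy for px in row] for row in prob_map]

# One row of two pixels over two semantic classes:
# the first is certain, the second is maximally uncertain.
prob_map = [[[1.0, 0.0], [0.5, 0.5]]]
mask = confidence_mask(prob_map, max_entropy=0.3)
```

The first pixel has zero entropy and is kept. The second has entropy ln 2 and is discarded.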

LG · Feb 3, 2025
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning

Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li et al.

Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates Vision-Language Models (VLMs) with selective human feedback to significantly reduce annotation requirements while maintaining performance. Our method leverages VLMs to generate initial preference labels, which are then filtered to identify uncertain cases for targeted human annotation. Additionally, we adapt VLMs using a self-supervised inverse dynamics loss to improve alignment with evolving policies. Experiments on Meta-World manipulation tasks demonstrate that PrefVLM achieves comparable or superior success rates to state-of-the-art methods while using up to 2x fewer human annotations. Furthermore, we show that adapted VLMs enable efficient knowledge transfer across tasks, further minimizing feedback needs. Our results highlight the potential of combining VLMs with selective human supervision to make preference-based RL more scalable and practical.

CV · Jan 6, 2025
Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation

Arindam Dutta, Sarosij Bose, Saketh Bachu et al.

Occlusions are a significant challenge to human pose estimation algorithms, often resulting in inaccurate and anatomically implausible poses. Although current occlusion-robust human pose estimation algorithms exhibit impressive performance on existing datasets, their success is largely attributed to supervised training and the availability of additional information, such as multiple views or temporal continuity. Furthermore, these algorithms typically suffer from performance degradation under distribution shifts. While existing domain adaptive human pose estimation algorithms address this bottleneck, they tend to perform suboptimally when the target domain images are occluded, a common occurrence in real-life scenarios. To address these challenges, we propose OR-POSE: Unsupervised Domain Adaptation for Occlusion Resilient Human POSE Estimation. OR-POSE is an innovative unsupervised domain adaptation algorithm which effectively mitigates domain shifts and overcomes occlusion challenges by employing the mean teacher framework for iterative pseudo-label refinement. Additionally, OR-POSE reinforces realistic pose prediction by leveraging a learned human pose prior which incorporates the anatomical constraints of humans in the adaptation process. Lastly, OR-POSE avoids overfitting to inaccurate pseudo labels generated from heavily occluded images by employing a novel visibility-based curriculum learning approach. This enables the model to gradually transition from training samples with relatively less occlusion to more challenging, heavily occluded samples. Extensive experiments show that OR-POSE outperforms existing analogous state-of-the-art algorithms by approximately 7% on challenging occluded human pose estimation datasets.

RO · Sep 1, 2021
Modeling and Trajectory Optimization for Standing Long Jumping of a Quadruped with A Preloaded Elastic Prismatic Spine

Keran Ye, Konstantinos Karydis

This paper presents a novel methodology to model and optimize trajectories of a quadrupedal robot with spinal compliance to improve standing jump performance compared to quadrupeds with a rigid spine. We introduce an elastic model for a prismatic robotic spine that is actively preloaded and mechanically lock-enabled at initial and maximum length, and develop a constrained trajectory optimization method to co-optimize the elastic parameters and motion trajectories toward enhanced jumping distance. Results reveal that a less stiff spring is likely to facilitate jumping performance not as a direct propelling source but as a means to unleash more motor power for propelling by trading-off overall energy efficiency. We also visualize the impact of spring coefficients on the overall optimization routine from energetic perspectives to identify the suitable parameter region.

RO · Aug 4, 2021
Deformation Recovery Control and Post-Impact Trajectory Replanning for Collision-Resilient Mobile Robots

Zhouyu Lu, Zhichao Liu, Konstantinos Karydis

The paper focuses on collision-inclusive motion planning for impact-resilient mobile robots. We propose a new deformation recovery and replanning strategy to handle collisions that may occur at run-time. Contrary to collision avoidance methods that generate trajectories only in conservative local space or require collision checking that has high computational cost, our method directly generates (local) trajectories while imposing only waypoint constraints. If a collision occurs, our method then estimates the post-impact state and computes from there an intermediate waypoint to recover from the collision. To achieve this, we develop two novel components: 1) a deformation recovery controller that optimizes the robot's states during the post-impact recovery phase, and 2) a post-impact trajectory replanner that adjusts the next waypoint with the information from the collision for the robot to pass through and generates a polynomial-based minimum effort trajectory. The proposed strategy is evaluated experimentally with an omni-directional impact-resilient wheeled robot. The robot is designed in house, and it can perceive collisions with the aid of Hall effect sensors embodied between the robot's main chassis and a surrounding deflection ring-like structure.

RO · Aug 3, 2021
Position Control and Variable-Height Trajectory Tracking of a Soft Pneumatic Legged Robot

Zhichao Liu, Konstantinos Karydis

Soft pneumatic legged robots show promise in their ability to traverse a range of different types of terrain, including natural unstructured terrain met in applications like precision agriculture. They can adapt their body morphology to the intricacies of the terrain at hand, thus enabling robust and resilient locomotion. In this paper we capitalize upon recent developments on soft pneumatic legged robots to introduce a closed-loop trajectory tracking control scheme for operation over flat ground. Closed-loop pneumatic actuation feedback is achieved via a compact and portable pneumatic regulation board. Experimental results reveal that our soft legged robot can precisely control its body height and orientation while in quasi-static operation based on a geometric model. The robot can track both straight line and curved trajectories as well as variable-height trajectories. This work lays the basis to enable autonomous navigation for soft legged robots.

RO · Jul 20, 2021
A Portable Agricultural Robot for Continuous Apparent Soil Electrical Conductivity Measurements to Improve Irrigation Practices

Merrick Campbell, Keran Ye, Elia Scudiero et al.

Near-ground sensing data, such as geospatial measurements of soil apparent electrical conductivity (ECa), are used in precision agriculture to improve farming practices and increase crop yield. Near-ground sensors provide valuable information, yet the process of collecting, assessing, and interpreting measurements requires significant human labor. Automating parts of this process via the use of mobile robots can help decrease labor burden, increase the accuracy and frequency of data collections, and overall increase the adoption and use of ECa measurement technology. This paper introduces a roboticized means to autonomously perform geospatial ECa measurements and map soil moisture content in micro-irrigated orchard systems. We retrofit a small wheeled mobile robot with a small electromagnetic induction sensor by studying and taking into consideration the effect of the robot body on the sensor's readings, and develop a software stack to enable autonomous logging of geo-referenced measurements. The proposed roboticized ECa measurement method is evaluated by mapping a 50 m x 30 m field against the baseline of human-conducted measurements obtained by walking the sensor in the same field and following the same path. Experimental testing reveals that our approach yields roboticized measurements comparable to human-conducted ones, despite the robot's small form factor.

RO · Mar 1, 2021
Enhancement for Robustness of Koopman Operator-based Data-driven Mobile Robotic Systems

Lu Shi, Konstantinos Karydis

Koopman operator theory has served as the basis to extract dynamics for nonlinear system modeling and control across settings, including non-holonomic mobile robot control. There is a growing interest in research to derive robustness (and/or safety) guarantees for systems the dynamics of which are extracted via the Koopman operator. In this paper, we propose a way to quantify the prediction error because of noisy measurements when the Koopman operator is approximated via Extended Dynamic Mode Decomposition. We further develop an enhanced robot control strategy to endow robustness to a class of data-driven (robotic) systems that rely on Koopman operator theory, and we show how part of the strategy can happen offline in an effort to make our algorithm capable of real-time implementation. We perform a parametric study to evaluate the (theoretical) performance of the algorithm using a Van der Pol oscillator and conduct a series of simulated experiments in Gazebo using a non-holonomic wheeled robot.
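
For readers unfamiliar with Extended Dynamic Mode Decomposition, the standard least-squares approximation of the Koopman operator that the abstract refers to can be written as below. The notation is an assumption for illustration and is not taken from the paper: snapshot pairs $(x_m, y_m)$ with $y_m$ the successor state of $x_m$, a real-valued dictionary $\Psi$ of $N$ lifting functions stacked as a column vector, and $\dagger$ denoting the pseudoinverse.

```latex
% EDMD: finite-dimensional Koopman approximation from M snapshot pairs.
\[
K \;=\; \arg\min_{\tilde{K} \in \mathbb{R}^{N \times N}}
\sum_{m=1}^{M} \bigl\| \Psi(y_m) - \tilde{K}\, \Psi(x_m) \bigr\|_2^2
\;=\; A\, G^{\dagger},
\]
\[
A = \frac{1}{M} \sum_{m=1}^{M} \Psi(y_m)\, \Psi(x_m)^{\top},
\qquad
G = \frac{1}{M} \sum_{m=1}^{M} \Psi(x_m)\, \Psi(x_m)^{\top}.
\]
```

Noise in the measured $x_m$ and $y_m$ enters both $A$ and $G$, which is why measurement noise propagates into the prediction error that the paper sets out to quantify.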

RO · Feb 3, 2021
Task Planning on Stochastic Aisle Graphs for Precision Agriculture

Xinyue Kan, Thomas C. Thayer, Stefano Carpin et al.

This work addresses task planning under uncertainty for precision agriculture applications whereby task costs are uncertain and the gain of completing a task is proportional to resource consumption (such as water consumption in precision irrigation). The goal is to complete all tasks while prioritizing those that are more urgent, and subject to diverse budget thresholds and stochastic costs for tasks. To describe agriculture-related environments that incorporate stochastic costs to complete tasks, a new Stochastic-Vertex-Cost Aisle Graph (SAG) is introduced. Then, a task allocation algorithm, termed Next-Best-Action Planning (NBA-P), is proposed. NBA-P utilizes the underlying structure enabled by SAG, and tackles the task planning problem by simultaneously determining the optimal tasks to perform and an optimal time to exit (i.e. return to a base station), at run-time. The proposed approach is tested with both simulated data and real-world experimental datasets collected in a commercial vineyard, in both single- and multi-robot scenarios. In all cases, NBA-P outperforms other evaluated methods in terms of return per visited vertex, wasted resources resulting from aborted tasks (i.e. when a budget threshold is exceeded), and total visited vertices.

RONov 3, 2020
Toward Impact-resilient Quadrotor Design, Collision Characterization and Recovery Control to Sustain Flight after Collisions

Zhichao Liu, Konstantinos Karydis

Collision detection and recovery for aerial robots remain a challenge because of the limited space for sensors and local stability of the flight controller. We introduce a novel collision-resilient quadrotor that features a compliant arm design to enable free flight while allowing for one passive degree of freedom to absorb shocks. We further propose a novel collision detection and characterization method based on Hall sensors, as well as a new recovery control method to generate and track a smooth trajectory after a collision occurs. Experimental results demonstrate that the robot can detect and recover from high-speed collisions with various obstacles such as walls and poles. Moreover, it can survive collisions that are hard to detect with existing methods based on IMU data and contact models, for example, when colliding with unstructured surfaces, or being hit by a moving obstacle while hovering.

ROSep 4, 2020
Motion Planning for Collision-resilient Mobile Robots in Obstacle-cluttered Unknown Environments with Risk Reward Trade-offs

Zhouyu Lu, Zhichao Liu, Gustavo J. Correa et al.

Collision avoidance in unknown obstacle-cluttered environments may not always be feasible. This paper focuses on an emerging paradigm shift in which potential collisions with the environment can be harnessed instead of being avoided altogether. To this end, we introduce a new sampling-based online planning algorithm that can explicitly handle the risk of colliding with the environment and can switch between collision avoidance and collision exploitation. Central to the planner's capabilities is a novel joint optimization function that evaluates the effect of possible collisions using a reflection model. This way, the planner can make deliberate decisions to collide with the environment if such collision is expected to help the robot make progress toward its goal. To make the algorithm online, we present a state expansion pruning technique that significantly reduces the search space while ensuring completeness. The proposed algorithm is evaluated experimentally with a built-in-house holonomic wheeled robot that can withstand collisions. We perform an extensive parametric study to investigate trade-offs between (user-tuned) levels of risk, deliberate collision decision making, and trajectory statistics such as time to reach the goal and path length.

ROAug 29, 2020
Development and Testing of a Novel Automated Insect Capture Module for Sample Collection and Transfer

Keran Ye, Gustavo J. Correa, Tom Guda et al.

There exists an urgent need for efficient tools in disease surveillance to help model and predict the spread of disease. The transmission of insect-borne diseases poses a serious concern to public health officials and the medical and research community at large. In the modeling of this spread, we face bottlenecks in (1) the frequency at which we are able to sample insect vectors in environments that are prone to propagating disease, (2) manual labor needed to set up and retrieve surveillance devices like traps, and (3) the return time in analyzing insect samples and determining if an infectious disease is spreading in a region. To help address these bottlenecks, we present in this paper the design, fabrication, and testing of a novel automated insect capture module (ICM) or trap that aims to improve the rate of transferring samples collected from the environment via aerial robots. The ICM features an ultraviolet light attractant, passive capture mechanism, panels which can open and close for access to insects, and a small onboard computer for automated operation and data logging. At the same time, the ICM is designed to be accessible; it is small-scale, lightweight and low-cost, and can be integrated with commercially available aerial robots. Indoor and outdoor experimentation validates the ICM's feasibility in insect capturing and safe transportation. The device can help bring us one step closer toward achieving fully autonomous and scalable epidemiology by leveraging autonomous robot technology to aid the medical and research community.

ROJun 30, 2020
Online Exploration and Coverage Planning in Unknown Obstacle-Cluttered Environments

Xinyue Kan, Hanzhe Teng, Konstantinos Karydis

Online coverage planning can be useful in applications like field monitoring and search and rescue. Without prior information about the environment, achieving resolution-complete coverage considering the non-holonomic mobility constraints in commonly-used vehicles (e.g., wheeled robots) remains a challenge. In this paper, we propose a hierarchical, hex-decomposition-based coverage planning algorithm for unknown, obstacle-cluttered environments. The proposed approach ensures resolution-complete coverage, can be tuned to achieve fast exploration, and plans smooth paths for Dubins vehicles to follow at constant velocity in real-time. Gazebo simulations and hardware experiments with a non-holonomic wheeled robot show that our approach can successfully trade off between coverage and exploration speed and can outperform existing online coverage algorithms in terms of total covered area or exploration speed according to how it is tuned.

RODec 6, 2017
Fast, Autonomous Flight in GPS-Denied and Cluttered Environments

Kartik Mohta, Michael Watterson, Yash Mulgaonkar et al.

One of the most challenging tasks for a flying robot is to autonomously navigate between target locations quickly and reliably while avoiding obstacles in its path, and with little to no a priori knowledge of the operating environment. This challenge is addressed in the present paper. We describe the system design and software architecture of our proposed solution, and showcase how all the distinct components can be integrated to enable smooth robot operation. We provide critical insight on hardware and software component selection and development, and present results from extensive experimental testing in real-world warehouse environments. Experimental testing reveals that our proposed solution can deliver fast and robust aerial robot autonomous navigation in cluttered, GPS-denied environments.

AISep 17, 2017
Memory Augmented Control Networks

Arbaaz Khan, Clark Zhang, Nikolay Atanasov et al.

Planning problems in partially observable environments cannot be solved directly with convolutional networks and require some form of memory. Yet even memory networks with sophisticated addressing schemes are unable to learn intelligent reasoning satisfactorily due to the complexity of simultaneously learning to access memory and plan. To mitigate these challenges, we introduce the Memory Augmented Control Network (MACN). The proposed network architecture consists of three main parts. The first part uses convolutions to extract features and the second part uses a neural network-based planning module to pre-plan in the environment. The third part uses a network controller that learns to store those specific instances of past information that are necessary for planning. The performance of the network is evaluated in discrete grid world environments for path planning in the presence of simple and complex obstacles. We show that our network learns to plan and can generalize to new environments.

ROJul 24, 2017
End-to-End Navigation in Unknown Environments using Neural Networks

Arbaaz Khan, Clark Zhang, Nikolay Atanasov et al.

We investigate how a neural network can learn perception-action loops for navigation in unknown environments. Specifically, we consider how to learn to navigate in environments populated with cul-de-sacs that represent convex local minima that the robot could fall into instead of finding a set of feasible actions that take it to the goal. Traditional methods rely on maintaining a global map to solve the problem of overcoming a long cul-de-sac. However, due to errors induced from local and global drift, it is highly challenging to maintain such a map for long periods of time. One way to mitigate this problem is by using learning techniques that do not rely on hand-engineered map representations and instead output appropriate control policies directly from their sensory input. We first demonstrate that such a problem cannot be solved directly by deep reinforcement learning due to the sparse reward structure of the environment. Further, we demonstrate that deep supervised learning also cannot be used directly to solve this problem. We then investigate network models that offer a combination of reinforcement learning and supervised learning and highlight the significance of adding fully differentiable memory units to such networks. We evaluate our networks on their ability to generalize to new environments and show that adding memory to such networks offers huge jumps in performance.

ROMay 23, 2017
Neural Network Memory Architectures for Autonomous Robot Navigation

Steven W Chen, Nikolay Atanasov, Arbaaz Khan et al.

This paper highlights the significance of including memory structures in neural networks when the latter are used to learn perception-action loops for autonomous robot navigation. Traditional navigation approaches rely on global maps of the environment to overcome cul-de-sacs and plan feasible motions. Yet, maintaining an accurate global map may be challenging in real-world settings. A possible way to mitigate this limitation is to use learning techniques that forgo hand-engineered map representations and infer appropriate control responses directly from sensed information. An important but unexplored aspect of such approaches is the effect of memory on their performance. This work is a first thorough study of memory structures for deep-neural-network-based robot navigation, and offers novel tools to train such networks from supervision and quantify their ability to generalize to unseen scenarios. We analyze the separation and generalization abilities of feedforward, long short-term memory, and differentiable neural computer networks. We introduce a new method to evaluate the generalization ability by estimating the VC-dimension of networks with a final linear readout layer. We validate that the VC estimates are good predictors of actual test performance. The reported method can be applied to deep learning problems beyond robotics.

ROOct 5, 2012
Symbolic Planning and Control Using Game Theory and Grammatical Inference

Jie Fu, Herbert G. Tanner, Jeffrey Heinz et al.

This paper presents an approach that brings together game theory with grammatical inference and discrete abstractions in order to synthesize control strategies for hybrid dynamical systems performing tasks in partially unknown but rule-governed adversarial environments. The combined formulation guarantees that a system specification is met if (a) the true model of the environment is in the class of models inferable from a positive presentation, (b) a characteristic sample is observed, and (c) the task specification is satisfiable given the capabilities of the system (agent) and the environment.