CVMay 3, 2025
PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering ApproachNitin Rai, Arnold W. Schumann, Nathan Boyd
Collecting large-scale crop disease images in the field is labor-intensive and time-consuming. Generative models (GMs) offer an alternative by creating synthetic samples that resemble real-world images. However, existing research primarily relies on Generative Adversarial Networks (GANs)-based image-to-image translation and lack a comprehensive analysis of computational requirements in agriculture. Therefore, this research explores a multi-modal text-to-image approach for generating synthetic crop disease images and is the first to provide computational benchmarking in this context. We trained three Stable Diffusion (SD) variants-SDXL, SD3.5M (medium), and SD3.5L (large)-and fine-tuned them using Dreambooth and Low-Rank Adaptation (LoRA) fine-tuning techniques to enhance generalization. SD3.5M outperformed the others, with an average memory usage of 18 GB, power consumption of 180 W, and total energy use of 1.02 kWh/500 images (0.002 kWh per image) during inference task. Our results demonstrate SD3.5M's ability to generate 500 synthetic images from just 36 in-field samples in 1.5 hours. We recommend SD3.5M for efficient crop disease data generation.
RODec 13, 2021
Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via TransformerYunhai Han, Kelin Yu, Rahul Batra et al.
Reliable robotic grasping, especially with deformable objects such as fruits, remains a challenging task due to underactuated contact interactions with a gripper, unknown object dynamics and geometries. In this study, we propose a Transformer-based robotic grasping framework for rigid grippers that leverage tactile and visual information for safe object grasping. Specifically, the Transformer models learn physical feature embeddings with sensor feedback through performing two pre-defined explorative actions (pinching and sliding) and predict a grasping outcome through a multilayer perceptron (MLP) with a given grasping strength. Using these predictions, the gripper predicts a safe grasping strength via inference. Compared with convolutional recurrent networks, the Transformer models can capture the long-term dependencies across the image sequences and process spatial-temporal features simultaneously. We first benchmark the Transformer models on a public dataset for slip detection. Following that, we show that the Transformer models outperform a CNN+LSTM model in terms of grasping accuracy and computational efficiency. We also collect a new fruit grasping dataset and conduct online grasping experiments using the proposed framework for both seen and unseen fruits. {In addition, we extend our model to objects with different shapes and demonstrate the effectiveness of our pre-trained model trained on our large-scale fruit dataset. Our codes and dataset are public on GitHub.
ROOct 6, 2021
Reactive Locomotion Decision-Making and Robust Motion Planning for Real-Time Perturbation RecoveryZhaoyuan Gu, Nathan Boyd, Ye Zhao
In this paper, we examine the problem of push recovery for bipedal robot locomotion and present a reactive decision-making and robust planning framework for locomotion resilient to external perturbations. Rejecting perturbations is an essential capability of bipedal robots and has been widely studied in the locomotion literature. However, adversarial disturbances and aggressive turning can lead to negative lateral step width (i.e., crossed-leg scenarios) with unstable motions and self-collision risks. These motion planning problems are computationally difficult and have not been explored under a hierarchically integrated task and motion planning method. We explore a planning and decision-making framework that closely ties linear-temporal-logic-based reactive synthesis with trajectory optimization incorporating the robot's full-body dynamics, kinematics, and leg collision avoidance constraints. Between the high-level discrete symbolic decision-making and the low-level continuous motion planning, behavior trees serve as a reactive interface to handle perturbations occurring at any time of the locomotion process. Our experimental results show the efficacy of our method in generating resilient recovery behaviors in response to diverse perturbations from any direction with bounded magnitudes.
ROJul 26, 2021
Terrain-perception-free Quadrupedal Spinning Locomotion on Versatile Terrains: Modeling, Analysis, and Experimental ValidationHongwu Zhu, Dong Wang, Nathan Boyd et al.
Dynamic quadrupedal locomotion over rough terrains reveals remarkable progress over the last few decades. Small-scale quadruped robots are adequately flexible and adaptable to traverse uneven terrains along sagittal direction, such as slopes and stairs. To accomplish autonomous locomotion navigation in complex environments, spinning is a fundamental yet indispensable functionality for legged robots. However, spinning behaviors of quadruped robots on uneven terrain often exhibit position drifts. Motivated by this problem, this study presents an algorithmic method to enable accurate spinning motions over uneven terrain and constrain the spinning radius of the Center of Mass (CoM) to be bounded within a small range to minimize the drift risks. A modified spherical foot kinematics representation is proposed to improve the foot kinematic model and rolling dynamics of the quadruped during locomotion. A CoM planner is proposed to generate stable spinning motion based on projected stability margins. Accurate motion tracking is accomplished with Linear Quadratic Regulator (LQR) to bound the position drift during the spinning movement. Experiments are conducted on a small-scale quadruped robot and the effectiveness of the proposed method is verified on versatile terrains including flat ground, stairs and slopes.