RO · May 30
Coarse-to-Fine Compositional Diffusion for Long-Horizon Planning
Byoungwoo Park, Utkarsh A. Mishra, Jaemoo Choi et al.
Diffusion models provide strong priors for generating structured data, but many tasks require outputs beyond the scale on which these models are typically trained. Compositional generation addresses this by composing overlapping local plans from a pretrained short-horizon prior into a long-horizon output. However, standard composition primarily enforces agreement between neighboring local plans, yielding local consistency without directly specifying the global structure of the full composition. As a result, locally compatible plans may still form an implausible route, task sequence, or temporal evolution. Existing methods improve global coherence by repeatedly propagating local consistency signals or by adding inference-time optimization, but these procedures become expensive as the number or dimensionality of local plans increases. We propose Coarse-to-Fine Compositional Diffusion (CoFi), an inference-time sampler that separates global structure formation from local detail refinement. CoFi first aligns local denoised estimates around a shared coarse structure, producing a global scaffold that captures the long-range task-level arrangement. It then diffuses this scaffold to an intermediate noise level and denoises it with the same pretrained local prior, restoring local fine structure while preserving the scaffold-induced global coherence. Across long-horizon robotic planning, panoramic image generation, and long video generation, CoFi improves both global coherence and local sample quality over prior compositional baselines while requiring 2-8x fewer denoiser evaluations.
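The two stages in the abstract (build a global scaffold from overlapping local estimates, then re-noise it to an intermediate level) can be illustrated with a minimal sketch over 1-D sequences. This is not CoFi's algorithm: plain averaging of overlapping chunks stands in for CoFi's alignment around a shared coarse structure, which the abstract does not specify, and `renoise` uses the standard DDPM forward marginal. The pretrained local denoiser that would then refine the result is not shown. The names `fuse_overlapping`, `renoise`, and `alpha_bar` are hypothetical.

```python
import math
import random
from typing import List


def fuse_overlapping(chunks: List[List[float]], stride: int) -> List[float]:
    """Average equal-length local plans into one global sequence.

    Chunk k is assumed to cover global indices [k * stride, k * stride + H).
    Plain averaging is a stand-in for CoFi's coarse alignment step.
    """
    horizon = len(chunks[0])
    if any(len(c) != horizon for c in chunks) or not 0 < stride <= horizon:
        raise ValueError("chunks must share one length H and 0 < stride <= H")
    length = stride * (len(chunks) - 1) + horizon
    total = [0.0] * length
    count = [0] * length
    for k, chunk in enumerate(chunks):
        for i, value in enumerate(chunk):
            total[k * stride + i] += value
            count[k * stride + i] += 1
    return [t / c for t, c in zip(total, count)]


def renoise(x0: List[float], alpha_bar: float, rng: random.Random) -> List[float]:
    """Diffuse a clean scaffold to an intermediate noise level with the DDPM
    forward marginal x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]


# Two local plans of length 4 that overlap on two steps (stride 2).
chunks = [[0.0, 1.0, 2.0, 3.0], [2.5, 3.5, 4.0, 5.0]]
scaffold = fuse_overlapping(chunks, stride=2)
print(scaffold)  # [0.0, 1.0, 2.25, 3.25, 4.0, 5.0]
noisy = renoise(scaffold, alpha_bar=0.5, rng=random.Random(0))
# A pretrained short-horizon denoiser (not shown) would now denoise `noisy`.
```

An `alpha_bar` closer to 1 keeps more of the scaffold; closer to 0 gives the local prior more freedom to rewrite it.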
RO · Feb 28, 2023
ReorientDiff: Diffusion Model based Reorientation for Object Manipulation
Utkarsh A. Mishra, Yongxin Chen
The ability to manipulate objects into desired configurations is a fundamental requirement for robots in many practical applications. While certain goals can be achieved by picking and placing the objects of interest directly, most tasks require object reorientation for precise placement. In such scenarios, the object must be reoriented and repositioned into intermediate poses that facilitate accurate placement at the target pose. To this end, we propose ReorientDiff, a reorientation planning method based on a diffusion model. The proposed method uses both visual inputs from the scene and goal-specific language prompts to plan intermediate reorientation poses. Specifically, the scene and language-task information are mapped into a joint scene-task representation feature space, which is then used to condition the diffusion model. The diffusion model samples intermediate poses based on this representation using classifier-free guidance and then uses gradients of learned feasibility-score models for implicit iterative pose refinement. The proposed method is evaluated with a set of YCB objects and a suction gripper, achieving a success rate of 95.2% in simulation. Overall, our study presents a promising approach to the reorientation challenge in manipulation by learning a conditional distribution, an effective way to move toward more generalizable object manipulation. For more results, check out our website: https://utkarshmishra04.github.io/ReorientDiff.
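The abstract combines two generic ingredients: classifier-free guidance when sampling, and gradient steps on a feasibility score for refinement. A minimal sketch of both on plain lists follows. The guidance convention used here (w = 1 is purely conditional) varies between papers, and the abstract does not say which one ReorientDiff uses; `refine_pose`, the toy score, and all numbers are hypothetical stand-ins for the learned models.

```python
from typing import Callable, List

Vector = List[float]


def cfg_combine(eps_uncond: Vector, eps_cond: Vector, w: float) -> Vector:
    """Classifier-free guidance: eps_u + w * (eps_c - eps_u).

    w = 0 returns the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates further toward the condition.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def refine_pose(pose: Vector, score_grad: Callable[[Vector], Vector],
                step: float, iters: int) -> Vector:
    """Iterative refinement by gradient ascent on a feasibility score.

    `score_grad` is a hypothetical stand-in for the gradient of a learned
    feasibility-score model.
    """
    x = list(pose)
    for _ in range(iters):
        g = score_grad(x)
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x


# Toy feasibility score f(x) = -||x - target||^2, gradient -2 (x - target).
target = [1.0, 2.0]


def toy_grad(x: Vector) -> Vector:
    return [-2.0 * (xi - ti) for xi, ti in zip(x, target)]


print(cfg_combine([0.0, 1.0], [1.0, 3.0], w=2.0))  # [2.0, 5.0]
refined = refine_pose([0.0, 0.0], toy_grad, step=0.1, iters=50)
```

With this toy score each step shrinks the distance to `target` by a factor of 0.8, so after 50 steps the refined pose is essentially at the target.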
RO · Mar 7, 2025
Generative Trajectory Stitching through Diffusion Composition
Yunhao Luo, Utkarsh A. Mishra, Yilun Du et al.
Effective trajectory stitching for long-horizon planning is a significant challenge in robotic decision-making. While diffusion models have shown promise in planning, they are limited to solving tasks similar to those seen in their training data. We propose CompDiffuser, a novel generative approach that can solve new tasks by learning to compositionally stitch together shorter trajectory chunks from previously seen tasks. Our key insight is to model the trajectory distribution by subdividing it into overlapping chunks and learning their conditional relationships through a single bidirectional diffusion model. This allows information to propagate between segments during generation, ensuring physically consistent connections. We conduct experiments on benchmark tasks of varying difficulty, covering different environment sizes, agent state dimensions, trajectory types, and training data quality, and show that CompDiffuser significantly outperforms existing methods.
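The chunk structure described above can be made concrete with a small sketch: split a trajectory into overlapping chunks, and stitch chunks back together when neighbors agree on their shared states. In CompDiffuser that agreement comes from a learned bidirectional diffusion model; here it is only checked, and the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_overlapping(traj: Sequence[T], chunk_len: int, overlap: int) -> List[List[T]]:
    """Cut a trajectory into chunks of `chunk_len` states, where consecutive
    chunks share `overlap` states."""
    stride = chunk_len - overlap
    if stride <= 0 or chunk_len > len(traj) or (len(traj) - chunk_len) % stride:
        raise ValueError("trajectory length must equal chunk_len + k * stride")
    return [list(traj[s:s + chunk_len])
            for s in range(0, len(traj) - chunk_len + 1, stride)]


def stitch(chunks: List[List[T]], overlap: int) -> List[T]:
    """Inverse of split_overlapping when neighboring chunks agree on their
    shared states: keep the first chunk, then append each later chunk
    without its overlapping prefix."""
    for prev, nxt in zip(chunks, chunks[1:]):
        if prev[len(prev) - overlap:] != nxt[:overlap]:
            raise ValueError("neighboring chunks disagree on their overlap")
    out = list(chunks[0])
    for chunk in chunks[1:]:
        out.extend(chunk[overlap:])
    return out


chunks = split_overlapping(list(range(10)), chunk_len=4, overlap=2)
print(chunks)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
print(stitch(chunks, overlap=2) == list(range(10)))  # True
```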
LG · Nov 15, 2021
Learning Representations for Pixel-based Control: What Matters and Why?
Manan Tomar, Utkarsh A. Mishra, Amy Zhang et al.
Learning representations for pixel-based control has recently garnered significant attention in reinforcement learning. A wide range of methods have been proposed to enable efficient learning, leading to sample complexities similar to those in the full-state setting. However, moving beyond carefully curated pixel datasets (centered crop, appropriate lighting, clear background, etc.) remains challenging. In this paper, we adopt a more difficult setting, incorporating background distractors, as a first step toward addressing this challenge. We present a simple baseline approach that can learn meaningful representations with no metric-based learning, no data augmentations, no world-model learning, and no contrastive learning. We then analyze when and why previously proposed methods are likely to fail or reduce to the same performance as the baseline in this harder setting, and why we should think carefully about extending such methods beyond well-curated environments. Our results show that a finer categorization of benchmarks on the basis of characteristics such as reward density, planning horizon, and the presence of task-irrelevant components is crucial in evaluating algorithms. Based on these observations, we propose different metrics to consider when evaluating an algorithm on benchmark tasks. We hope such a data-centric view can motivate researchers to rethink representation learning when investigating how best to apply RL to real-world tasks.
RO · Nov 4, 2021
Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning
Utkarsh A. Mishra, Soumya R. Samineni, Prakhar Goel et al.
Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best from both: asymptotic performance of Mf-RL and high sample-efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for the Mb-trajectory optimization with off-policy methods for the Mf-RL. In particular, two loops are proposed, where the Dynamic Mirror Descent based Model Predictive Control (DMD-MPC) is used as the inner loop Mb-RL to obtain an optimal sequence of actions. These actions are in turn used to significantly accelerate the outer loop Mf-RL. We show that our formulation is generic for a broad class of MPC-based policies and objectives, and includes some of the well-known Mb-Mf approaches. We finally introduce a new algorithm: Mirror-Descent Model Predictive RL (M-DeMoRL), which uses Cross-Entropy Method (CEM) with elite fractions for the inner loop. Our experiments show faster convergence of the proposed hierarchical approach on benchmark MuJoCo tasks. We also demonstrate hardware training for trajectory tracking in a 2R leg and hardware transfer for robust walking in a quadruped. We show that the inner-loop Mb-RL significantly decreases the number of training iterations required in the real system, thereby validating the proposed approach.
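The abstract names the Cross-Entropy Method (CEM) with elite fractions as the inner-loop optimizer of M-DeMoRL. A generic CEM sketch over an open-loop action sequence is shown below. The toy quadratic cost, the default hyperparameters, and the exponential smoothing controlled by `alpha` are assumptions; the paper's inner loop optimizes through a learned dynamics model that is not reproduced here, and its exact mirror-descent update may differ.

```python
import random
import statistics
from typing import Callable, List, Sequence


def cem_plan(cost: Callable[[Sequence[float]], float], horizon: int,
             n_samples: int = 200, elite_frac: float = 0.1, iters: int = 50,
             init_std: float = 2.0, alpha: float = 0.7, seed: int = 0) -> List[float]:
    """Cross-Entropy Method over an open-loop action sequence.

    Each iteration samples action sequences from a diagonal Gaussian, keeps
    the lowest-cost `elite_frac` of them, and moves the Gaussian's mean and
    standard deviation toward the elite statistics with smoothing `alpha`.
    """
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [init_std] * horizon
    n_elite = max(2, int(elite_frac * n_samples))
    for _ in range(iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        elites = sorted(samples, key=cost)[:n_elite]
        for t in range(horizon):
            column = [e[t] for e in elites]
            mean[t] = (1 - alpha) * mean[t] + alpha * statistics.mean(column)
            # Small floor keeps the sampling distribution non-degenerate.
            std[t] = (1 - alpha) * std[t] + alpha * statistics.pstdev(column) + 1e-6
    return mean


# Toy quadratic cost standing in for a model-predicted trajectory cost.
target = [0.5, -1.0, 1.5]


def toy_cost(actions: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(actions, target))


plan = cem_plan(toy_cost, horizon=3)
```

In a receding-horizon (MPC) loop, only `plan[0]` would be executed before replanning from the next state.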
RO · Sep 26, 2021
Linear Policies are Sufficient to Realize Robust Bipedal Walking on Challenging Terrains
Lokesh Krishna, Guillermo A. Castillo, Utkarsh A. Mishra et al.
In this work, we demonstrate robust walking on uneven terrain with the bipedal robot Digit by learning just a single linear policy. In particular, we propose a new control pipeline in which a high-level trajectory modulator shapes the end-foot ellipsoidal trajectories and a low-level gait controller regulates the torso and ankle orientation. The foot-trajectory modulator uses a linear policy, and the regulator uses a linear PD control law. Unlike neural-network-based policies, the proposed linear policy has only 13 learnable parameters, which not only guarantees sample-efficient learning but also makes the policy simple and interpretable. This is achieved with no loss of performance on challenging terrains such as slopes, stairs, and outdoor landscapes. We first demonstrate robust walking in a custom simulation environment in MuJoCo and then transfer directly to hardware with no modification of the control pipeline. We subject the biped to a series of pushes and terrain height changes, both indoors and outdoors, thereby validating the presented work.
RO · May 15, 2021
Learning Control Policies for Imitating Human Gaits
Utkarsh A. Mishra
The work presented in this report introduces a framework for learning to imitate human gaits. Humans perform movements such as walking, running, and jumping in a highly efficient manner, which motivated this project. Skeletal and musculoskeletal human models were considered for motions in the sagittal plane, and the results from both were compared exhaustively. Skeletal models are driven by motor actuation, whereas musculoskeletal models are driven by muscle-tendon actuation. Model-free reinforcement learning algorithms were used to optimize inverse dynamics control actions to imitate a reference motion, with the secondary objectives of minimizing effort in terms of the power spent by the motors and the metabolic energy consumed by the muscles. For the motor-actuated model, the control actions are target joint angles, which a proportional-derivative (PD) controller converts into joint torques. For the muscle-tendon-actuated model, the control actions are muscle excitations, which are converted implicitly into muscle activations and then into muscle forces that apply moments on the joints. Muscle-tendon-actuated models were found to be superior to motor-actuated ones because muscle activation dynamics make them inherently smooth, so they need no external regularizers. Finally, the strategy used to obtain an optimal configuration of the framework's significant decision variables is discussed. All results and analyses are presented in an illustrative, qualitative, and quantitative manner. Supporting video links are provided in the Appendix.
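For the motor-actuated model, the policy outputs target joint angles and a PD controller turns them into torques. A minimal sketch of that conversion on a single frictionless, gravity-free joint is shown below; the gains, unit inertia, time step, and the zero-target-velocity form of the law are hypothetical choices (kd = 2 * sqrt(kp * inertia) makes this toy joint critically damped), and the musculoskeletal pathway is not modeled.

```python
from typing import Tuple


def pd_torque(q_target: float, q: float, qdot: float, kp: float, kd: float) -> float:
    """PD law with zero target velocity: tau = kp * (q_target - q) - kd * qdot."""
    return kp * (q_target - q) - kd * qdot


def simulate_joint(q_target: float, kp: float = 100.0, kd: float = 20.0,
                   inertia: float = 1.0, dt: float = 1e-3,
                   steps: int = 5000) -> Tuple[float, float]:
    """Drive one joint from rest at q = 0 toward q_target using
    semi-implicit Euler integration; returns the final (q, qdot)."""
    q, qdot = 0.0, 0.0
    for _ in range(steps):
        tau = pd_torque(q_target, q, qdot, kp, kd)
        qdot += (tau / inertia) * dt
        q += qdot * dt
    return q, qdot


q_final, qdot_final = simulate_joint(q_target=0.5)
print(round(q_final, 4))  # 0.5
```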
RO · Apr 4, 2021
Learning Linear Policies for Robust Bipedal Locomotion on Terrains with Varying Slopes
Lokesh Krishna, Utkarsh A. Mishra, Guillermo A. Castillo et al.
In this paper, with a view toward deploying lightweight control frameworks for bipedal walking robots, we realize end-foot trajectories that are shaped by a single linear feedback policy. We learn this policy with a model-free, gradient-free learning algorithm, Augmented Random Search (ARS), on two robot platforms, Rabbit and Digit. Our contributions are twofold: a) by using the torso and support-plane orientations as inputs, we achieve robust walking on slopes of up to 20 degrees in simulation; b) we demonstrate additional behaviors such as walking backwards, stepping in place, and recovering from external pushes of up to 120 N. The end result is a robust and fast feedback control law for bipedal walking on terrains with varying slopes. Finally, we provide preliminary results of hardware transfer to Digit.
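ARS learns the linear policy by perturbing its parameters in random directions and comparing the returns of the positive and negative perturbations. A minimal sketch of the basic update (the V1-t style: top-b directions and reward-standard-deviation scaling, without the state normalization of the V2 variants) is shown below. The toy quadratic "return" stands in for simulated episode returns on Rabbit or Digit, and the policy size and hyperparameters are hypothetical.

```python
import random
import statistics
from typing import Callable, List

Params = List[float]


def ars(reward: Callable[[Params], float], dim: int, n_dirs: int = 8,
        top_b: int = 4, step: float = 0.02, noise: float = 0.03,
        iters: int = 300, seed: int = 0) -> Params:
    """Basic Augmented Random Search over a flattened linear policy."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iters):
        results = []
        for _ in range(n_dirs):
            delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            r_plus = reward([t + noise * d for t, d in zip(theta, delta)])
            r_minus = reward([t - noise * d for t, d in zip(theta, delta)])
            results.append((max(r_plus, r_minus), r_plus, r_minus, delta))
        # Keep the b directions whose better rollout scored highest.
        results.sort(key=lambda item: item[0], reverse=True)
        best = results[:top_b]
        # Scale the step by the std of the rewards used in this update.
        sigma_r = statistics.pstdev([r for item in best for r in item[1:3]]) or 1.0
        for _, r_plus, r_minus, delta in best:
            scale = step / (top_b * sigma_r) * (r_plus - r_minus)
            theta = [t + scale * d for t, d in zip(theta, delta)]
    return theta


# Hypothetical 2x3 linear policy (flattened); a quadratic "return" replaces
# real episode returns from a simulator.
w_star = [0.5, -0.3, 0.2, 0.1, -0.4, 0.6]


def toy_return(w: Params) -> float:
    return -sum((a - b) ** 2 for a, b in zip(w, w_star))


w = ars(toy_return, dim=6)
```

Dividing by the reward standard deviation keeps each update bounded, which is one reason ARS tolerates poorly scaled returns without tuning the step size per task.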
RO · Dec 2, 2020
Planning Brachistochrone Hip Trajectory for a Toe-Foot Bipedal Robot going Downstairs
Gaurav Bhardwaj, Utkarsh A. Mishra, N. Sukavanam et al.
A novel, efficient downstairs trajectory is proposed for a 9-link biped robot model with toe-foot. The brachistochrone is the fastest-descent trajectory for a particle moving only under the influence of gravity. In most situations, when a person walks downstairs, the human hip also follows a brachistochrone trajectory for a more responsive motion. Here, an adaptive trajectory planning algorithm is developed so that biped robots with varying link lengths and masses can descend staircases of varying dimensions. We assume that the center of gravity (COG) of the biped lies at the hip. A Zero Moment Point (ZMP) based COG trajectory is considered and its stability is ensured. A cycloidal trajectory is used for the ankle of the swing leg. The parameters of both the cycloid and the brachistochrone depend on the dimensions of the staircase steps. The paper can be broadly divided into four steps: 1) developing a ZMP-based brachistochrone trajectory for the hip; 2) planning a cycloidal trajectory for the ankle subject to proper collision constraints; 3) solving inverse kinematics using an unsupervised artificial neural network (ANN); and 4) comparing the proposed hip trajectory with a circular-arc and a virtual-slope-based hip trajectory. The proposed algorithms have been implemented in MATLAB.
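The brachistochrone between two points is an arc of a cycloid, so a hip path that must advance `dx` and drop `dy` over one step can be computed by solving for the cycloid's end angle. A minimal sketch is shown below, in Python rather than the MATLAB the abstract mentions. Tying `dx` and `dy` to the tread and riser, the sample values, and the function name are assumptions; the paper's ZMP stability check and toe-foot kinematics are not modeled.

```python
import math
from typing import List, Tuple


def brachistochrone(dx: float, dy: float, n: int = 50) -> List[Tuple[float, float]]:
    """Sample the cycloid x = r(theta - sin theta), y = r(1 - cos theta) that
    starts at (0, 0) and ends at (dx, dy), with y measured downward.

    theta_end solves (theta - sin theta) / (1 - cos theta) = dx / dy; the
    left side increases on (0, 2*pi), so bisection finds it. If dx / dy is
    larger than pi / 2, the curve dips below the end point before rising to it.
    """
    if dx <= 0 or dy <= 0 or n < 2:
        raise ValueError("need dx > 0, dy > 0 and n >= 2")
    ratio = dx / dy
    lo, hi = 1e-9, 2 * math.pi - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (mid - math.sin(mid)) / (1 - math.cos(mid)) < ratio:
            lo = mid
        else:
            hi = mid
    theta_end = 0.5 * (lo + hi)
    r = dy / (1 - math.cos(theta_end))
    return [(r * (th - math.sin(th)), r * (1 - math.cos(th)))
            for th in (theta_end * k / (n - 1) for k in range(n))]


# Hypothetical step: hip advances 0.25 m and drops 0.2 m.
pts = brachistochrone(dx=0.25, dy=0.2)
```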
RO · Dec 2, 2020
Cycloidal Trajectory Realization on Staircase based on Neural Network Temporal Quantized Lagrange Dynamics (NNTQLD) with Ant Colony Optimization for a 9-Link Bipedal Robot
Gaurav Bhardwaj, Utkarsh A. Mishra, N. Sukavanam et al.
In this paper, a novel optimal technique for joint-angle trajectory tracking control with energy optimization is proposed for a biped robot with toe-foot. For the task of climbing stairs with a 9-link biped model, a cycloid trajectory for the swing phase is proposed such that the cycloid variables depend on the staircase dimensions. The Zero Moment Point (ZMP) criterion is used to satisfy the stability constraint. The paper can be divided into three steps: 1) planning a stable cycloid trajectory for the initial and subsequent steps of climbing upstairs, and solving inverse kinematics using an unsupervised artificial neural network with a knot-shifting procedure for jerk minimization; 2) modeling the dynamics of the toe-foot biped using Lagrange dynamics, with contact modeled as a spring-damper system, and then developing Neural Network Temporal Quantized Lagrange Dynamics, which takes the inverse kinematics output of the neural network as its input; 3) using Ant Colony Optimization to tune the PD (proportional-derivative) controller parameters and the torso angle, with the objective of minimizing joint-space trajectory errors and total energy consumed. Three cases with different staircase dimensions are considered, and a brief comparison is made to verify the effectiveness of the proposed work. The generated patterns have been simulated in MATLAB.
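The abstract says the swing-phase cycloid's variables depend on the staircase dimensions. One common cycloidal swing profile, parameterized by step length and riser height, is sketched below. The paper's exact parameterization is not given in the abstract, so the profile, the extra clearance term, the function names, and the numbers are all assumptions, and no collision check against the stair edge is performed.

```python
import math
from typing import Tuple


def cycloid_fraction(s: float) -> float:
    """Normalized cycloid c(s) = s - sin(2*pi*s) / (2*pi): c(0) = 0, c(1) = 1,
    and its slope is zero at both ends, so the foot starts and stops at rest."""
    return s - math.sin(2 * math.pi * s) / (2 * math.pi)


def swing_foot(t: float, period: float, step_len: float, riser: float,
               clearance: float) -> Tuple[float, float]:
    """Swing-foot (x, z) at time t in [0, period] for one stair-climbing step.

    x advances by step_len and z rises by riser along the cycloid profile;
    an extra (clearance / 2) * (1 - cos(2*pi*s)) term lifts the foot
    mid-swing. Collision with the stair edge is not checked.
    """
    s = min(max(t / period, 0.0), 1.0)
    c = cycloid_fraction(s)
    x = step_len * c
    z = riser * c + 0.5 * clearance * (1 - math.cos(2 * math.pi * s))
    return x, z


# Hypothetical stair: 0.3 m tread, 0.15 m riser, 0.05 m extra clearance, 0.8 s swing.
for k in range(5):
    print(swing_foot(0.8 * k / 4, period=0.8, step_len=0.3, riser=0.15, clearance=0.05))
```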