Shatadal Mishra

RO
h-index8
6papers
3citations
Novelty53%
AI Score40

6 Papers

ROMay 15, 2025Code
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning

Dechen Gao, Hang Wang, Hanchu Zhou et al.

Imitation learning (IL) and reinforcement learning (RL) each offer distinct advantages for robotics policy learning: IL provides stable learning from demonstrations, and RL promotes generalization through exploration. While existing robot learning approaches using IL-based pre-training followed by RL-based fine-tuning are promising, this two-step learning paradigm often suffers from instability and poor sample efficiency during the RL fine-tuning phase. In this work, we introduce IN-RIL, INterleaved Reinforcement learning and Imitation Learning, for policy fine-tuning, which periodically injects IL updates after multiple RL updates and hence can benefit from the stability of IL and the guidance of expert data for more efficient exploration throughout the entire fine-tuning process. Since IL and RL involve different optimization objectives, we develop gradient separation mechanisms to prevent destructive interference during \ABBR fine-tuning, by separating possibly conflicting gradient updates in orthogonal subspaces. Furthermore, we conduct rigorous analysis, and our findings shed light on why interleaving IL with RL stabilizes learning and improves sample-efficiency. Extensive experiments on 14 robot manipulation and locomotion tasks across 3 benchmarks, including FurnitureBench, OpenAI Gym, and Robomimic, demonstrate that \ABBR can significantly improve sample efficiency and mitigate performance collapse during online finetuning in both long- and short-horizon tasks with either sparse or dense rewards. IN-RIL, as a general plug-in compatible with various state-of-the-art RL algorithms, can significantly improve RL fine-tuning, e.g., from 12\% to 88\% with 6.3x improvement in the success rate on Robomimic Transport. Project page: https://github.com/ucd-dare/IN-RIL.

28.2ROMay 11
Network-Efficient World Model Token Streaming

Shatadal Mishra, Ahmadreza Moradipari, Nejib Ammar

Generative driving world models rely on compact latent state representations that must be efficiently transmitted and synchronized across distributed compute and connected vehicles. We study network-efficient streaming of a discrete world model state, where a stride-16 VQ-U-Net tokenizer (codebook size 8,192) maps each 288x512 frame to an 18x32 grid of token IDs (576 tokens/frame), equivalent to 936 bytes/frame under fixed-length coding. We consider a keyframe--delta protocol under strict per-message payload budgets and packet loss, and propose a fully online, label-free algorithm that prioritizes delta updates via cosine distance in codebook embedding space and triggers keyframes adaptively using a Hamming-drift threshold. The adaptive algorithm consistently improves the rate distortion frontier over periodic keyframes at matched bitrates: at 0.024 Mb/s (200-byte budget) dynamic-only embedding distortion drops from 0.0712 to 0.0661 (7.2\%), and at 0.036 Mb/s (400-byte budget) from 0.0427 to 0.0407 (4.8\%). Under 10\% delta packet loss at 200 bytes, dynamic-only distortion is 0.0757 versus 0.0789 for a matched periodic baseline. To connect state fidelity to world model usefulness, we train a lightweight next-token predictor and evaluate perplexity conditioned on streamed receiver states: at 0.024 Mb/s, dynamic-position perplexity improves from 206.0 to 193.1 (6.3\%), and at 0.036 Mb/s from 158.9 to 155.6 (2.1\%). These results support discrete token-state streaming as a practical systems layer for bandwidth-aware synchronization and improved downstream token-dynamics utility under vehicular networking constraints.

SYJun 24, 2024
Tolerance of Reinforcement Learning Controllers against Deviations in Cyber Physical Systems

Changjian Zhang, Parv Kapoor, Eunsuk Kang et al.

Cyber-physical systems (CPS) with reinforcement learning (RL)-based controllers are increasingly being deployed in complex physical environments such as autonomous vehicles, the Internet-of-Things(IoT), and smart cities. An important property of a CPS is tolerance; i.e., its ability to function safely under possible disturbances and uncertainties in the actual operation. In this paper, we introduce a new, expressive notion of tolerance that describes how well a controller is capable of satisfying a desired system requirement, specified using Signal Temporal Logic (STL), under possible deviations in the system. Based on this definition, we propose a novel analysis problem, called the tolerance falsification problem, which involves finding small deviations that result in a violation of the given requirement. We present a novel, two-layer simulation-based analysis framework and a novel search heuristic for finding small tolerance violations. To evaluate our approach, we construct a set of benchmark problems where system parameters can be configured to represent different types of uncertainties and disturbancesin the system. Our evaluation shows that our falsification approach and heuristic can effectively find small tolerance violations.

ROFeb 7, 2022
Probabilistic Consensus on Feature Distribution for Multi-robot Systems with Markovian Exploration Dynamics

Aniket Shirsat, Shatadal Mishra, Wenlong Zhang et al.

In this paper, we present a consensus-based decentralized multi-robot approach to reconstruct a discrete distribution of features, modeled as an occupancy grid map, that represent information contained in a bounded planar 2D environment, such as visual cues used for navigation or semantic labels associated with object detection. The robots explore the environment according to a random walk modeled by a discrete-time discrete-state (DTDS) Markov chain and estimate the feature distribution from their own measurements and the estimates communicated by neighboring robots, using a distributed Chernoff fusion protocol. We prove that under this decentralized fusion protocol, each robot's feature distribution converges to the ground truth distribution in an almost sure sense. We verify this result in numerical simulations that show that the Hellinger distance between the estimated and ground truth feature distributions converges to zero over time for each robot. We also validate our strategy through Software-In-The-Loop (SITL) simulations of quadrotors that search a bounded square grid for a set of visual features distributed on a discretized circle.

ROMay 26, 2021
Collision Recovery Control of a Foldable Quadrotor

Karishma Patnaik, Shatadal Mishra, Zachary Chase et al.

Autonomous missions of small unmanned aerial vehicles (UAVs) are prone to collisions owing to environmental disturbances and localization errors. Consequently, a UAV that can endure collisions and perform recovery control in critical aerial missions is desirable to prevent loss of the vehicle and/or payload. We address this problem by proposing a novel foldable quadrotor system which can sustain collisions and recover safely. The quadrotor is designed with integrated mechanical compliance using a torsional spring such that the impact time is increased and the net impact force on the main body is decreased. The post-collision dynamics is analysed and a recovery controller is proposed which stabilizes the system to a hovering location without additional collisions. Flight test results on the proposed and a conventional quadrotor demonstrate that for the former, integrated spring-damper characteristics reduce the rebound velocity and lead to simple recovery control algorithms in the event of unintended collisions as compared to a rigid quadrotor of the same dimension.

ROOct 24, 2020
Design and Control of SQUEEZE: A Spring-augmented QUadrotor for intEractions with the Environment to squeeZE-and-fly

Karishma Patnaik, Shatadal Mishra, Seyed Mostafa Rezayat Sorkhabadi et al.

This paper presents the design and control of a novel quadrotor with a variable geometry to physically interact with cluttered environments and fly through narrow gaps and passageways. This compliant quadrotor with passive morphing capabilities is designed using torsional springs at every arm hinge to allow for rotation driven by external forces. We derive the dynamic model of this variable geometry quadrotor (SQUEEZE), and develop an adaptive controller for trajectory tracking. The corresponding Lyapunov stability proof of attitude tracking is also presented. Further, an admittance controller is designed to account for changes in yaw due to physical interactions with the environment. Finally, the proposed design is validated in flight tests with two setups: a small gap and a passageway. The experimental results demonstrate the unique capability of the SQUEEZE in navigating through constrained narrow spaces.