Siddhartha S. Srinivasa

RO
h-index10
43papers
3,230citations
Novelty54%
AI Score36

43 Papers

ROMay 5, 2021Code
Benchmarking Structured Policies and Policy Optimization for Real-World Dexterous Object Manipulation

Niklas Funk, Charles Schaff, Rishabh Madan et al.

Dexterous manipulation is a challenging and important problem in robotics. While data-driven methods are a promising approach, current benchmarks require simulation or extensive engineering support due to the sample inefficiency of popular methods. We present benchmarks for the TriFinger system, an open-source robotic platform for dexterous manipulation and the focus of the 2020 Real Robot Challenge. The benchmarked methods, which were successful in the challenge, can be generally described as structured policies, as they combine elements of classical robotics and modern policy optimization. This inclusion of inductive biases facilitates sample efficiency, interpretability, reliability and high performance. The key aspects of this benchmarking is validation of the baselines across both simulation and the real system, thorough ablation study over the core features of each solution, and a retrospective analysis of the challenge as a manipulation benchmark. The code and demo videos for this work can be found on our website (https://sites.google.com/view/benchmark-rrc).

ROAug 21, 2019Code
MuSHR: A Low-Cost, Open-Source Robotic Racecar for Education and Research

Siddhartha S. Srinivasa, Patrick Lancaster, Johan Michalove et al.

We present MuSHR, the Multi-agent System for non-Holonomic Racing. MuSHR is a low-cost, open-source robotic racecar platform for education and research, developed by the Personal Robotics Lab in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. MuSHR aspires to contribute towards democratizing the field of robotics as a low-cost platform that can be built and deployed by following detailed, open documentation and do-it-yourself tutorials. A set of demos and lab assignments developed for the Mobile Robots course at the University of Washington provide guided, hands-on experience with the platform, and milestones for further development. MuSHR is a valuable asset for academic research labs, robotics instructors, and robotics enthusiasts.

LGFeb 4, 2025
Generative Data Mining with Longtail-Guided Diffusion

David S. Hayden, Mao Ye, Timur Garipov et al.

It is difficult to anticipate the myriad challenges that a predictive model will encounter once deployed. Common practice entails a reactive, cyclical approach: model deployment, data mining, and retraining. We instead develop a proactive longtail discovery process by imagining additional data during training. In particular, we develop general model-based longtail signals, including a differentiable, single forward pass formulation of epistemic uncertainty that does not impact model parameters or predictive performance but can flag rare or hard inputs. We leverage these signals as guidance to generate additional training data from a latent diffusion model in a process we call Longtail Guidance (LTG). Crucially, we can perform LTG without retraining the diffusion model or the predictive model, and we do not need to expose the predictive model to intermediate diffusion states. Data generated by LTG exhibit semantically meaningful variation, yield significant generalization improvements on numerous image classification benchmarks, and can be analyzed by a VLM to proactively discover, textually explain, and address conceptual gaps in a deployed predictive model.

ROFeb 14, 2022
Benchmarking Robot Manipulation with the Rubik's Cube

Boling Yang, Patrick E. Lancaster, Siddhartha S. Srinivasa et al.

Benchmarks for robot manipulation are crucial to measuring progress in the field, yet there are few benchmarks that demonstrate critical manipulation skills, possess standardized metrics, and can be attempted by a wide array of robot platforms. To address a lack of such benchmarks, we propose Rubik's cube manipulation as a benchmark to measure simultaneous performance of precise manipulation and sequential manipulation. The sub-structure of the Rubik's cube demands precise positioning of the robot's end effectors, while its highly reconfigurable nature enables tasks that require the robot to manage pose uncertainty throughout long sequences of actions. We present a protocol for quantitatively measuring both the accuracy and speed of Rubik's cube manipulation. This protocol can be attempted by any general-purpose manipulator, and only requires a standard 3x3 Rubik's cube and a flat surface upon which the Rubik's cube initially rests (e.g. a table). We demonstrate this protocol for two distinct baseline approaches on a PR2 robot. The first baseline provides a fundamental approach for pose-based Rubik's cube manipulation. The second baseline demonstrates the benchmark's ability to quantify improved performance by the system, particularly that resulting from the integration of pre-touch sensing. To demonstrate the benchmark's applicability to other robot platforms and algorithmic approaches, we present the functional blocks required to enable the HERB robot to manipulate the Rubik's cube via push-grasping.

RONov 4, 2021
Stein Variational Probabilistic Roadmaps

Alexander Lambert, Brian Hou, Rosario Scalise et al.

Efficient and reliable generation of global path plans are necessary for safe execution and deployment of autonomous systems. In order to generate planning graphs which adequately resolve the topology of a given environment, many sampling-based motion planners resort to coarse, heuristically-driven strategies which often fail to generalize to new and varied surroundings. Further, many of these approaches are not designed to contend with partial-observability. We posit that such uncertainty in environment geometry can, in fact, help drive the sampling process in generating feasible, and probabilistically-safe planning graphs. We propose a method for Probabilistic Roadmaps which relies on particle-based Variational Inference to efficiently cover the posterior distribution over feasible regions in configuration space. Our approach, Stein Variational Probabilistic Roadmap (SV-PRM), results in sample-efficient generation of planning-graphs and large improvements over traditional sampling approaches. We demonstrate the approach on a variety of challenging planning problems, including real-world probabilistic occupancy maps and high-dof manipulation problems common in robotics.

ROSep 22, 2021
Real Robot Challenge: A Robotics Competition in the Cloud

Stefan Bauer, Felix Widmaier, Manuel Wüthrich et al.

Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.

ROSep 15, 2021
Analyzing Multiagent Interactions in Traffic Scenes via Topological Braids

Christoforos Mavrogiannis, Jonathan DeCastro, Siddhartha S. Srinivasa

We focus on the problem of analyzing multiagent interactions in traffic domains. Understanding the space of behavior of real-world traffic may offer significant advantages for algorithmic design, data-driven methodologies, and benchmarking. However, the high dimensionality of the space and the stochasticity of human behavior may hinder the identification of important interaction patterns. Our key insight is that traffic environments feature significant geometric and temporal structure, leading to highly organized collective behaviors, often drawn from a small set of dominant modes. In this work, we propose a representation based on the formalism of topological braids that can summarize arbitrarily complex multiagent behavior into a compact object of dual geometric and symbolic nature, capturing critical events of interaction. This representation allows us to formally enumerate the space of outcomes in a traffic scene and characterize their complexity. We illustrate the value of the proposed representation in summarizing critical aspects of real-world traffic behavior through a case study on recent driving datasets. We show that despite the density of real-world traffic, observed behavior tends to follow highly organized patterns of low interaction. Our framework may be a valuable tool for evaluating the richness of driving datasets, but also for synthetically designing balanced training datasets or benchmarks.

ROSep 10, 2021
Winding Through: Crowd Navigation via Topological Invariance

Christoforos Mavrogiannis, Krishna Balasubramanian, Sriyash Poddar et al.

We focus on robot navigation in crowded environments. The challenge of predicting the motion of a crowd around a robot makes it hard to ensure human safety and comfort. Recent approaches often employ end-to-end techniques for robot control or deep architectures for high-fidelity human motion prediction. While these methods achieve important performance benchmarks in simulated domains, dataset limitations and high sample complexity tend to prevent them from transferring to real-world environments. Our key insight is that a low-dimensional representation that captures critical features of crowd-robot dynamics could be sufficient to enable a robot to wind through a crowd smoothly. To this end, we mathematically formalize the act of passing between two agents as a rotation, using a notion of topological invariance. Based on this formalism, we design a cost functional that favors robot trajectories contributing higher passing progress and penalizes switching between different sides of a human. We incorporate this functional into a model predictive controller that employs a simple constant-velocity model of human motion prediction. This results in robot motion that accomplishes statistically significantly higher clearances from the crowd compared to state-of-the-art baselines while maintaining competitive levels of efficiency, across extensive simulations and challenging real-world experiments on a self-balancing robot.

ROAug 3, 2021
Desk Organization: Effect of Multimodal Inputs on Spatial Relational Learning

Ryan Rowe, Shivam Singhal, Daqing Yi et al.

For robots to operate in a three dimensional world and interact with humans, learning spatial relationships among objects in the surrounding is necessary. Reasoning about the state of the world requires inputs from many different sensory modalities including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to organizational ''preference''. We model this problem by examining how humans position objects given multiple features received from vision and haptic modalities. However, organizational habits vary greatly between people both in structure and adherence. To deal with user organizational preferences, we add an additional modality, ''utility'' ($U$), which informs on a particular human's perceived usefulness of a given object. Models were trained as generalized (over many different people) or tailored (per person). We use two types of models: random forests, which focus on precise multi-task classification, and Markov logic networks, which provide an easily interpretable insight into organizational habits. The models were applied to both synthetic data, which proved to be learnable when using fixed organizational constraints, and human-study data, on which the random forest achieved over 90% accuracy. Over all combinations of $\{H, U, V\}$ modalities, $UV$ and $HUV$ were the most informative for organization. In a follow-up study, we gauged participants preference of desk organizations by a generalized random forest organization vs. by a random model. On average, participants rated the random forest models as 4.15 on a 5-point Likert scale compared to 1.84 for the random model

ROApr 11, 2021
Guided Incremental Local Densification for Accelerated Sampling-based Motion Planning

Aditya Mandalika, Rosario Scalise, Brian Hou et al.

Sampling-based motion planners rely on incremental densification to discover progressively shorter paths. After computing feasible path $ξ$ between start $x_s$ and goal $x_t$, the Informed Set (IS) prunes the configuration space $\mathcal{C}$ by conservatively eliminating points that cannot yield shorter paths. Densification via sampling from this Informed Set retains asymptotic optimality of sampling from the entire configuration space. For path length $c(ξ)$ and Euclidean heuristic $h$, $IS = \{ x | x \in \mathcal{C}, h(x_s, x) + h(x, x_t) \leq c(ξ) \}$. Relying on the heuristic can render the IS especially conservative in high dimensions or complex environments. Furthermore, the IS only shrinks when shorter paths are discovered. Thus, the computational effort from each iteration of densification and planning is wasted if it fails to yield a shorter path, despite improving the cost-to-come for vertices in the search tree. Our key insight is that even in such a failure, shorter paths to vertices in the search tree (rather than just the goal) can immediately improve the planner's sampling strategy. Guided Incremental Local Densification (GuILD) leverages this information to sample from Local Subsets of the IS. We show that GuILD significantly outperforms uniform sampling of the Informed Set in simulated $\mathbb{R}^2$, $SE(2)$ environments and manipulation tasks in $\mathbb{R}^7$.

ROApr 2, 2021
Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics

Matthew Schmittle, Sanjiban Choudhury, Siddhartha S. Srinivasa

A key challenge in Imitation Learning (IL) is that optimal state actions demonstrations are difficult for the teacher to provide. For example in robotics, providing kinesthetic demonstrations on a robotic manipulator requires the teacher to control multiple degrees of freedom at once. The difficulty of requiring optimal state action demonstrations limits the space of problems where the teacher can provide quality feedback. As an alternative to state action demonstrations, the teacher can provide corrective feedback such as their preferences or rewards. Prior work has created algorithms designed to learn from specific types of noisy feedback, but across teachers and tasks different forms of feedback may be required. Instead we propose that in order to learn from a diversity of scenarios we need to learn from a variety of feedback. To learn from a variety of feedback we make the following insight: the teacher's cost function is latent and we can model a stream of feedback as a stream of loss functions. We then use any online learning algorithm to minimize the sum of these losses. With this insight we can learn from a diversity of feedback that is weakly correlated with the teacher's true cost function. We unify prior work into a general corrective feedback meta-algorithm and show that regardless of feedback we can obtain the same regret bounds. We demonstrate our approach by learning to perform a household navigation task on a robotic racecar platform. Our results show that our approach can learn quickly from a variety of noisy feedback.

RONov 8, 2020
Multimodal Trajectory Prediction via Topological Invariance for Navigation at Uncontrolled Intersections

Junha Roh, Christoforos Mavrogiannis, Rishabh Madan et al.

We focus on decentralized navigation among multiple non-communicating rational agents at \emph{uncontrolled} intersections, i.e., street intersections without traffic signs or signals. Avoiding collisions in such domains relies on the ability of agents to predict each others' intentions reliably, and react quickly. Multiagent trajectory prediction is NP-hard whereas the sample complexity of existing data-driven approaches limits their applicability. Our key insight is that the geometric structure of the intersection and the incentive of agents to move efficiently and avoid collisions (rationality) reduces the space of likely behaviors, effectively relaxing the problem of trajectory prediction. In this paper, we collapse the space of multiagent trajectories at an intersection into a set of modes representing different classes of multiagent behavior, formalized using a notion of topological invariance. Based on this formalism, we design Multiple Topologies Prediction (MTP), a data-driven trajectory-prediction mechanism that reconstructs trajectory representations of high-likelihood modes in multiagent intersection scenes. We show that MTP outperforms a state-of-the-art multimodal trajectory prediction baseline (MFP) in terms of prediction accuracy by 78.24% on a challenging simulated dataset. Finally, we show that MTP enables our optimization-based planner, MTPnav, to achieve collision-free and time-efficient navigation across a variety of challenging intersection scenarios on the CARLA simulator.

RONov 5, 2020
Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding

Ethan K. Gordon, Sumegh Roychowdhury, Tapomayukh Bhattacharjee et al.

Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items. However, it is impossible for such a system to be trained on all types of food in existence. Therefore, a key challenge is choosing a manipulation strategy for a previously unseen food item. Previous work showed that the problem can be represented as a linear bandit with visual context. However, food has a wide variety of multi-modal properties relevant to manipulation that can be hard to distinguish visually. Our key insight is that we can leverage the haptic context we collect during and after manipulation (i.e., "post hoc") to learn some of these properties and more quickly adapt our visual model to previously unseen food. In general, we propose a modified linear contextual bandit framework augmented with post hoc context observed after action selection to empirically increase learning speed and reduce cumulative regret. Experiments on synthetic data demonstrate that this effect is more pronounced when the dimensionality of the context is large relative to the post hoc context or when the post hoc context model is particularly easy to learn. Finally, we apply this framework to the bite acquisition problem and demonstrate the acquisition of 8 previously unseen types of food with 21% fewer failures across 64 attempts.

ROJul 31, 2020
Telemanipulation with Chopsticks: Analyzing Human Factors in User Demonstrations

Liyiming Ke, Ajinkya Kamat, Jingqiang Wang et al.

Chopsticks constitute a simple yet versatile tool that humans have used for thousands of years to perform a variety of challenging tasks ranging from food manipulation to surgery. Applying such a simple tool in a diverse repertoire of scenarios requires significant adaptability. Towards developing autonomous manipulators with comparable adaptability to humans, we study chopsticks-based manipulation to gain insights into human manipulation strategies. We conduct a within-subjects user study with 25 participants, evaluating three different data-collection methods: normal chopsticks, motion-captured chopsticks, and a novel chopstick telemanipulation interface. We analyze factors governing human performance across a variety of challenging chopstick-based grasping tasks. Although participants rated teleoperation as the least comfortable and most difficult-to-use method, teleoperation enabled users to achieve the highest success rates on three out of five objects considered. Further, we notice that subjects quickly learned and adapted to the teleoperation interface. Finally, while motion-captured chopsticks could provide a better reflection of how humans use chopsticks, the teleoperation interface can produce quality on-hardware demonstrations from which the robot can directly learn.

ROApr 10, 2020
Implicit Multiagent Coordination at Unsignalized Intersections via Multimodal Inference Enabled by Topological Braids

Christoforos Mavrogiannis, Jonathan A. DeCastro, Siddhartha S. Srinivasa

We focus on navigation among rational, non-communicating agents at unsignalized street intersections. Following collision-free motion under such settings demands nuanced implicit coordination among agents. Often, the structure of these domains constrains multiagent trajectories to belong to a finite set of modes. Our key insight is that empowering agents with a model of these modes can enable effective coordination, realized implicitly via intent signals encoded in agents' actions. In this paper, we represent modes of joint behavior in a compact and interpretable fashion using the formalism of topological braids. We design a decentralized planning algorithm that generates actions aimed at reducing the uncertainty over the mode of the emerging multiagent behavior. This mechanism enables agents that individually run our algorithm to collectively reject unsafe intersection crossings. We validate our approach in a simulated case study featuring challenging multiagent scenarios at a four-way unsignalized intersection. Our model is shown to reduce frequency of collisions by >65% over a set of baselines explicitly reasoning over trajectories, while maintaining comparable time efficiency.

ROFeb 27, 2020
Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges

Brian Hou, Sanjiban Choudhury, Gilwoo Lee et al.

Collision checking is a computational bottleneck in motion planning, requiring lazy algorithms that explicitly reason about when to perform this computation. Optimism in the face of collision uncertainty minimizes the number of checks before finding the shortest path. However, this may take a prohibitively long time to compute, with no other feasible paths discovered during this period. For many real-time applications, we instead demand strong anytime performance, defined as minimizing the cumulative lengths of the feasible paths yielded over time. We introduce Posterior Sampling for Motion Planning (PSMP), an anytime lazy motion planning algorithm that leverages learned posteriors on edge collisions to quickly discover an initial feasible path and progressively yield shorter paths. PSMP obtains an expected regret bound of $\tilde{O}(\sqrt{\mathcal{S} \mathcal{A} T})$ and outperforms comparative baselines on a set of 2D and 7D planning problems.

ROFeb 7, 2020
Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts

Gilwoo Lee, Brian Hou, Sanjiban Choudhury et al.

Informed and robust decision making in the face of uncertainty is critical for robots that perform physical tasks alongside people. We formulate this as Bayesian Reinforcement Learning over latent Markov Decision Processes (MDPs). While Bayes-optimality is theoretically the gold standard, existing algorithms do not scale well to continuous state and action spaces. Our proposal builds on the following insight: in the absence of uncertainty, each latent MDP is easier to solve. We first obtain an ensemble of experts, one for each latent MDP, and fuse their advice to compute a baseline policy. Next, we train a Bayesian residual policy to improve upon the ensemble's recommendation and learn to reduce uncertainty. Our algorithm, Bayesian Residual Policy Optimization (BRPO), imports the scalability of policy gradient methods and task-specific expert skills. BRPO significantly improves the ensemble of experts and drastically outperforms existing adaptive RL methods.

ROSep 14, 2019
Towards Effective Human-AI Teams: The Case of Collaborative Packing

Gilwoo Lee, Christoforos Mavrogiannis, Siddhartha S. Srinivasa

We focus on the problem of designing an artificial agent (AI), capable of assisting a human user to complete a task. Our goal is to guide human users towards optimal task performance while keeping their cognitive load as low as possible. Our insight is that doing so requires an understanding of human decision making for the task domain at hand. In this work, we consider the domain of collaborative packing, in which an AI agent provides placement recommendations to a human user. As a first step, we explore the mechanisms underlying human packing strategies. We conducted a user study in which 100 human participants completed a series of packing tasks in a virtual environment. We analyzed their packing strategies and discovered spatial and temporal patterns, such as that humans tend to place larger items at corners first. We expect that imbuing an artificial agent with an understanding of this spatiotemporal structure will enable improved assistance, which will be reflected in the task performance and the human perception of the AI. Ongoing work involves the development of a framework that incorporates the extracted insights to predict and manipulate human decision making towards an efficient trajectory of low cognitive load and high efficiency. A follow-up study will evaluate our framework against a set of baselines featuring alternative strategies of assistance. Our eventual goal is the deployment and evaluation of our framework on an autonomous robotic manipulator, actively assisting users on a packing task.

ROAug 19, 2019
Adaptive Robot-Assisted Feeding: An Online Learning Framework for Acquiring Previously Unseen Food Items

Ethan K. Gordon, Xiang Meng, Matt Barnes et al.

A successful robot-assisted feeding system requires bite acquisition of a wide variety of food items. It must adapt to changing user food preferences under uncertain visual and physical environments. Different food items in different environmental conditions require different manipulation strategies for successful bite acquisition. Therefore, a key challenge is how to handle previously unseen food items with very different success rate distributions over strategy. Combining low-level controllers and planners into discrete action trajectories, we show that the problem can be represented using a linear contextual bandit setting. We construct a simulated environment using a doubly robust loss estimate from previously seen food items, which we use to tune the parameters of off-the-shelf contextual bandit algorithms. Finally, we demonstrate empirically on a robot-assisted feeding system that, even starting with a model trained on thousands of skewering attempts on dissimilar previously seen food items, $ε$-greedy and LinUCB algorithms can quickly converge to the most successful manipulation strategy.

ROJul 22, 2019
LEGO: Leveraging Experience in Roadmap Generation for Sampling-Based Planning

Rahul Kumar, Aditya Mandalika, Sanjiban Choudhury et al.

We consider the problem of leveraging prior experience to generate roadmaps in sampling-based motion planning. A desirable roadmap is one that is sparse, allowing for fast search, with nodes spread out at key locations such that a low-cost feasible path exists. An increasingly popular approach is to learn a distribution of nodes that would produce such a roadmap. State-of-the-art is to train a conditional variational auto-encoder (CVAE) on the prior dataset with the shortest paths as target input. While this is quite effective on many problems, we show it can fail in the face of complex obstacle configurations or mismatch between training and testing. We present an algorithm LEGO that addresses these issues by training the CVAE with target samples that satisfy two important criteria. Firstly, these samples belong only to bottleneck regions along near-optimal paths that are otherwise difficult-to-sample with a uniform sampler. Secondly, these samples are spread out across diverse regions to maximize the likelihood of a feasible path existing. We formally define these properties and prove performance guarantees for LEGO. We extensively evaluate LEGO on a range of planning problems, including robot arm planning, and report significant gains over heuristics as well as learned baselines.

ROJun 5, 2019
Robot-Assisted Feeding: Generalizing Skewering Strategies across Food Items on a Realistic Plate

Ryan Feng, Youngsun Kim, Gilwoo Lee et al.

A robot-assisted feeding system must successfully acquire many different food items. A key challenge is the wide variation in the physical properties of food, demanding diverse acquisition strategies that are also capable of adapting to previously unseen items. Our key insight is that items with similar physical properties will exhibit similar success rates across an action space, allowing the robot to generalize its actions to previously unseen items. To better understand which skewering strategy works best for each food item, we collected a dataset of 2450 robot bite acquisition trials for 16 food items with varying properties. Analyzing the dataset provided insights into how the food items' surrounding environment, fork pitch, and fork roll angles affect bite acquisition success. We then developed a bite acquisition framework that takes the image of a full plate as an input, segments it into food items, and then applies our Skewering-Position-Action network (SPANet) to choose a target food item and a corresponding action so that the bite acquisition success rate is maximized. SPANet also uses the surrounding environment features of food items to predict action success rates. We used this framework to perform multiple experiments on uncluttered and cluttered plates. Results indicate that our integrated system can successfully generalize skewering strategies to many previously unseen food items.

LGOct 6, 2018
Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes

Gilwoo Lee, Sanjiban Choudhury, Brian Hou et al.

We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating Bayesian belief updates into long-term expected return. However, computing an exact optimal Bayesian policy is intractable. Our key insight is to compute a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function. We prove the near-optimality of our algorithm and analyze a number of schemes that boost the algorithm's efficiency. Finally, we empirically validate our approach on a number of discrete and continuous BAMDPs and show that the learned policy has consistently competitive performance against baseline approaches.

ROOct 1, 2018
Bayesian Policy Optimization for Model Uncertainty

Gilwoo Lee, Brian Hou, Aditya Mandalika et al.

Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a continuous Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a posterior distribution over latent model parameters given a history of observations and maximizes its expected long-term reward with respect to this belief distribution. Our algorithm, Bayesian Policy Optimization, builds on recent policy optimization algorithms to learn a universal policy that navigates the exploration-exploitation trade-off to maximize the Bayesian value function. To address challenges from discretizing the continuous latent parameter space, we propose a new policy network architecture that encodes the belief distribution independently from the observable state. Our method significantly outperforms algorithms that address model uncertainty without explicitly reasoning about belief distributions and is competitive with state-of-the-art Partially Observable Markov Decision Process solvers.

ROSep 30, 2018
Improved Proximity, Contact, and Force Sensing via Optimization of Elastomer-Air Interface Geometry

Patrick E. Lancaster, Joshua R. Smith, Siddhartha S. Srinivasa

We describe a single fingertip-mounted sensing system for robot manipulation that provides proximity (pre-touch), contact detection (touch), and force sensing (post-touch). The sensor system consists of optical time-of-flight range measurement modules covered in a clear elastomer. Because the elastomer is clear, the sensor can detect and range nearby objects, as well as measure deformations caused by objects that are in contact with the sensor and thereby estimate the applied force. We examine how this sensor design can be improved with respect to invariance to object reflectivity, signal-to-noise ratio, and continuous operation when switching between the distance and force measurement regimes. By harnessing time-of-flight technology and optimizing the elastomer-air boundary to control the emitted light's path, we develop a sensor that is able to seamlessly transition between measuring distances of up to 50mm and contact forces of up to 10 newtons. Furthermore, we provide all hardware design files and software sources, and offer thorough instructions on how to manufacture the sensor from inexpensive, commercially available components.

ROApr 23, 2018
Towards Robotic Feeding: Role of Haptics in Fork-based Food Manipulation

Tapomayukh Bhattacharjee, Gilwoo Lee, Hanjun Song et al.

Autonomous feeding is challenging because it requires manipulation of food items with various compliance, sizes, and shapes. To understand how humans manipulate food items during feeding and to explore ways to adapt their strategies to robots, we collected a rich dataset of human trajectories by asking them to pick up food and feed it to a mannequin. From the analysis of the collected haptic and motion signals, we demonstrate that humans adapt their control policies to accommodate to the compliance and shape of the food item being acquired. We propose a taxonomy of manipulation strategies for feeding to highlight such policies. As a first step to generate compliance-dependent policies, we propose a set of classifiers for compliance-based food categorization from haptic and motion signals. We compare these human manipulation strategies with fixed position-control policies via a robot. Our analysis of success and failure cases of human and robot policies further highlights the importance of adapting the policy to the compliance of a food item.

RONov 10, 2017
Anytime Motion Planning on Large Dense Roadmaps with Expensive Edge Evaluations

Shushman Choudhury, Oren Salzman, Sanjiban Choudhury et al.

We propose an algorithmic framework for efficient anytime motion planning on large dense geometric roadmaps, in domains where collision checks and therefore edge evaluations are computationally expensive. A large dense roadmap (graph) can typically ensure the existence of high quality solutions for most motion-planning problems, but the size of the roadmap, particularly in high-dimensional spaces, makes existing search-based planning algorithms computationally expensive. We deal with the challenges of expensive search and collision checking in two ways. First, we frame the problem of anytime motion planning on roadmaps as searching for the shortest path over a sequence of subgraphs of the entire roadmap graph, generated by some densification strategy. This lets us achieve bounded sub-optimality with bounded worst-case planning effort. Second, for searching each subgraph, we develop an anytime planning algorithm which uses a belief model to compute the collision probability of unknown configurations and searches for paths that are Pareto-optimal in path length and collision probability. This algorithm is efficient with respect to collision checks as it searches for successively shorter paths. We theoretically analyze both our ideas and evaluate them individually on high-dimensional motion-planning problems. Finally, we apply both of these ideas together in our algorithmic framework for anytime motion planning, and show that it outperforms BIT* on high-dimensional hypercube problems.

ROOct 14, 2017
Hybrid DDP in Clutter (CHDDP): Trajectory Optimization for Hybrid Dynamical System in Cluttered Environments

Shushman Choudhury, Yifan Hou, Gilwoo Lee et al.

We present an algorithm for obtaining an optimal control policy for hybrid dynamical systems in cluttered environments. To the best of our knowledge, this is the first attempt to have a locally optimal solution for this specific problem setting. Our approach extends an optimal control algorithm for hybrid dynamical systems in the obstacle-free case to environments with obstacles. Our method does not require any preset mode sequence or heuristics to prune the exponential search of mode sequences. By first solving the relaxed problem of getting an obstacle-free, dynamically feasible trajectory and then solving for both obstacle-avoidance and optimality, we can generate smooth, locally optimal control policies. We demonstrate the performance of our algorithm on a box-pushing example in a number of environments against the baseline of randomly sampling modes and actions with a Kinodynamic RRT.

ROOct 11, 2017
The Provable Virtue of Laziness in Motion Planning

Nika Haghtalab, Simon Mackenzie, Ariel D. Procaccia et al.

The Lazy Shortest Path (LazySP) class consists of motion-planning algorithms that only evaluate edges along shortest paths between the source and target. These algorithms were designed to minimize the number of edge evaluations in settings where edge evaluation dominates the running time of the algorithm; but how close to optimal are LazySP algorithms in terms of this objective? Our main result is an analytical upper bound, in a probabilistic model, on the number of edge evaluations required by LazySP algorithms; a matching lower bound shows that these algorithms are asymptotically optimal in the worst case.

ROOct 2, 2017
Unsupervised Learning for Nonlinear PieceWise Smooth Hybrid Systems

Gilwoo Lee, Zita Marinho, Aaron M. Johnson et al.

This paper introduces a novel system identification and tracking method for PieceWise Smooth (PWS) nonlinear stochastic hybrid systems. We are able to correctly identify and track challenging problems with diverse dynamics and low dimensional transitions. We exploit the composite structure system to learn a simpler model on each component/mode. We use Gaussian Process Regression techniques to learn smooth, nonlinear manifolds across mode transitions, guard-regions, and make multi-step ahead predictions on each mode dynamics. We combine a PWS non-linear model with a particle filter to effectively track multi-modal transitions. We further use synthetic oversampling techniques to address the challenge of detecting mode transition which is sparse compared to mode dynamics. This work provides an effective form of model learning in a complex hybrid system, which can be useful for future integration in a reinforcement learning setting. We compare multi-step prediction and tracking performance against traditional dynamical system tracking methods, such as EKF and Switching Gaussian Processes, and show that this framework performs significantly better, being able to correctly track complex dynamics with sparse transitions.

ROJul 6, 2017
Batch Informed Trees (BIT*): Informed Asymptotically Optimal Anytime Search

Jonathan D. Gammell, Timothy D. Barfoot, Siddhartha S. Srinivasa

Path planning in robotics often requires finding high-quality solutions to continuously valued and/or high-dimensional problems. These problems are challenging and most planning algorithms instead solve simplified approximations. Popular approximations include graphs and random samples, as respectively used by informed graph-based searches and anytime sampling-based planners. Informed graph-based searches, such as A*, traditionally use heuristics to search a priori graphs in order of potential solution quality. This makes their search efficient but leaves their performance dependent on the chosen approximation. If its resolution is too low then they may not find a (suitable) solution but if it is too high then they may take a prohibitively long time to do so. Anytime sampling-based planners, such as RRT*, traditionally use random sampling to approximate the problem domain incrementally. This allows them to increase resolution until a suitable solution is found but makes their search dependent on the order of approximation. Arbitrary sequences of random samples approximate the problem domain in every direction simultaneously and but may be prohibitively inefficient at containing a solution. This paper unifies and extends these two approaches to develop Batch Informed Trees (BIT*), an informed, anytime sampling-based planner. BIT* solves continuous path planning problems efficiently by using sampling and heuristics to alternately approximate and search the problem domain. Its search is ordered by potential solution quality, as in A*, and its approximation improves indefinitely with additional computational time, as in RRT*. It is shown analytically to be almost-surely asymptotically optimal and experimentally to outperform existing sampling-based planners, especially on high-dimensional planning problems.

ROJun 1, 2017
Shared Autonomy via Hindsight Optimization for Teleoperation and Teaming

Shervin Javdani, Henny Admoni, Stefania Pellegrinelli et al.

In shared autonomy, a user and autonomous system work together to achieve shared goals. To collaborate effectively, the autonomous system must know the user's goal. As such, most prior works follow a predict-then-act model, first predicting the user's goal with high confidence, then assisting given that goal. Unfortunately, confidently predicting the user's goal may not be possible until they have nearly achieved it, causing predict-then-act methods to provide little assistance. However, the system can often provide useful assistance even when confidence for any single goal is low (e.g. move towards multiple goals). In this work, we formalize this insight by modelling shared autonomy as a Partially Observable Markov Decision Process (POMDP), providing assistance that minimizes the expected cost-to-go with an unknown goal. As solving this POMDP optimally is intractable, we use hindsight optimization to approximate. We apply our framework to both shared-control teleoperation and human-robot teaming. Compared to predict-then-act methods, our method achieves goals faster, requires less user input, decreases user idling time, and results in fewer user-robot collisions.

ROMay 15, 2017
GP-ILQG: Data-driven Robust Optimal Control for Uncertain Nonlinear Dynamical Systems

Gilwoo Lee, Siddhartha S. Srinivasa, Matthew T. Mason

As we aim to control complex systems, use of a simulator in model-based reinforcement learning is becoming more common. However, it has been challenging to overcome the Reality Gap, which comes from nonlinear model bias and susceptibility to disturbance. To address these problems, we propose a novel algorithm that combines data-driven system identification approach (Gaussian Process) with a Differential-Dynamic-Programming-based robust optimal control method (Iterative Linear Quadratic Control). Our algorithm uses the simulator's model as the mean function for a Gaussian Process and learns only the difference between the simulator's prediction and actual observations, making it a natural hybrid of simulation and real-world observation. We show that our approach quickly corrects incorrect models, comes up with robust optimal controllers, and transfers its acquired model knowledge to new tasks efficiently.

ROMar 25, 2017
A New Paradigm for Robotic Dust Collection: Theorems, User Studies, and a Field Study

Rachel M. Holladay, Siddhartha S. Srinivasa

We pioneer a new future in robotic dust collection by introducing passive dust-collecting robots that, unlike their predecessors, do not require locomotion to collect dust. While previous research has exclusively focused on active dust-collecting robots, we show that these robots fail with respect to practical and theoretical aspects, as well as human factors. By contrast, passive robots, through their unconstrained versatility, shine brilliantly in all three metrics. We present a mathematical formalism of both paradigms followed by a user study and field study.

RONov 1, 2016
Densification Strategies for Anytime Motion Planning over Large Dense Roadmaps

Shushman Choudhury, Oren Salzman, Sanjiban Choudhury et al.

We consider the problem of computing shortest paths in a dense motion-planning roadmap $\mathcal{G}$. We assume that~$n$, the number of vertices of $\mathcal{G}$, is very large. Thus, using any path-planning algorithm that directly searches $\mathcal{G}$, running in $O(V\textrm{log}V + E) \approx O(n^2)$ time, becomes unacceptably expensive. We are therefore interested in anytime search to obtain successively shorter feasible paths and converge to the shortest path in $\mathcal{G}$. Our key insight is to provide existing path-planning algorithms with a sequence of increasingly dense subgraphs of $\mathcal{G}$. We study the space of all ($r$-disk) subgraphs of $\mathcal{G}$. We then formulate and present two densification strategies for traversing this space which exhibit complementary properties with respect to problem difficulty. This inspires a third, hybrid strategy which has favourable properties regardless of problem difficulty. This general approach is then demonstrated and analyzed using the specific case where a low-dispersion deterministic sequence is used to generate the samples used for $\mathcal{G}$. Finally we empirically evaluate the performance of our strategies for random scenarios in $\mathbb{R}^{2}$ and $\mathbb{R}^{4}$ and on manipulation planning problems for a 7 DOF robot arm, and validate our analysis.

ROSep 9, 2016
A Linear-Time Variational Integrator for Multibody Systems

Jeongseok Lee, C. Karen Liu, Frank C. Park et al.

We present an efficient variational integrator for multibody systems. Variational integrators reformulate the equations of motion for multibody systems as discrete Euler-Lagrange (DEL) equations, transforming forward integration into a root-finding problem for the DEL equations. Variational integrators have been shown to be more robust and accurate in preserving fundamental properties of systems, such as momentum and energy, than many frequently used numerical integrators. However, state-of-the-art algorithms suffer from $O(n^3)$ complexity, which is prohibitive for articulated multibody systems with a large number of degrees of freedom, $n$, in generalized coordinates. Our key contribution is to derive a recursive algorithm that evaluates DEL equations in $O(n)$, which scales up well for complex multibody systems such as humanoid robots. Inspired by recursive Newton-Euler algorithm, our key insight is to formulate DEL equation individually for each body rather than for the entire system. Furthermore, we introduce a new quasi-Newton method that exploits the impulse-based dynamics algorithm, which is also $O(n)$, to avoid the expensive Jacobian inversion in solving DEL equations. We demonstrate scalability and efficiency, as well as extensibility to holonomic constraints through several case studies.

ROApr 30, 2016
Configuration Lattices for Planar Contact Manipulation Under Uncertainty

Michael C. Koval, David Hsu, Nancy S. Pollard et al.

This work addresses the challenge of a robot using real-time feedback from contact sensors to reliably manipulate a movable object on a cluttered tabletop. We formulate contact manipulation as a partially observable Markov decision process (POMDP) in the joint space of robot configurations and object poses. The POMDP formulation enables the robot to actively gather information and reduce the uncertainty on the object pose. Further, it incorporates all major constraints for robot manipulation: kinematic reachability, self-collision, and collision with obstacles. To solve the POMDP, we apply DESPOT, a state-of-the-art online POMDP algorithm. Our approach leverages two key ideas for computational efficiency. First, it performs lazy construction of a configuration-space lattice by interleaving construction of the lattice and online POMDP planning. Second, it combines online and offline POMDP planning by solving relaxed POMDP offline and using the solution to guide the online search algorithm. We evaluated the proposed approach on a seven degree-of-freedom robot arm in simulation environments. It significantly outperforms several existing algorithms, including some commonly used heuristics for contact manipulation under uncertainty.

ROApr 25, 2016
The Manifold Particle Filter for State Estimation on High-dimensional Implicit Manifolds

Matthew Klingensmith, Michael C. Koval, Siddhartha S. Srinivasa et al.

We estimate the state a noisy robot arm and underactuated hand using an Implicit Manifold Particle Filter (MPF) informed by touch sensors. As the robot touches the world, its state space collapses to a contact manifold that we represent implicitly using a signed distance field. This allows us to extend the MPF to higher (six or more) dimensional state spaces. Earlier work (which explicitly represents the contact manifold) only shows the MPF in two or three dimensions. Through a series of experiments, we show that the implicit MPF converges faster and is more accurate than a conventional particle filter during periods of persistent contact. We present three methods of sampling the implicit contact manifold, and compare them in experiments.

ROMar 29, 2016
Rearrangement Planning via Heuristic Search

Jennifer E. King, Siddhartha S. Srinivasa

We present a method to apply heuristic search algorithms to solve rearrangement planning by pushing problems. In these problems, a robot must push an object through clutter to achieve a goal. To do this, we exploit the fact that contact with objects in the environment is critical to goal achievement. We dynamically generate goal-directed primitives that create and maintain contact between robot and object at each state expansion during the search. These primitives focus exploration toward critical areas of state-space, providing tractability to the high-dimensional planning problem. We demonstrate that the use of these primitives, combined with an informative yet simple to compute heuristic, improves success rate when compared to a planner that uses only primitives formed from discretizing the robot's action space. In addition, we show our planner outperforms RRT-based approaches by producing shorter paths faster. We demonstrate our algorithm both in simulation and on a 7-DOF arm pushing objects on a table.

DSMar 10, 2016
A Unifying Formalism for Shortest Path Problems with Expensive Edge Evaluations via Lazy Best-First Search over Paths with Edge Selectors

Christopher M. Dellin, Siddhartha S. Srinivasa

While the shortest path problem has myriad applications, the computational efficiency of suitable algorithms depends intimately on the underlying problem domain. In this paper, we focus on domains where evaluating the edge weight function dominates algorithm running time. Inspired by approaches in robotic motion planning, we define and investigate the Lazy Shortest Path class of algorithms which is differentiated by the choice of an edge selector function. We show that several algorithms in the literature are equivalent to this lazy algorithm for appropriate choice of this selector. Further, we propose various novel selectors inspired by sampling and statistical mechanics, and find that these selectors outperform existing algorithms on a set of example problems.

ROMar 26, 2015
Shared Autonomy via Hindsight Optimization

Shervin Javdani, Siddhartha S. Srinivasa, J. Andrew Bagnell

In shared autonomy, user input and robot autonomy are combined to control a robot to achieve a goal. Often, the robot does not know a priori which goal the user wants to achieve, and must both predict the user's intended goal, and assist in achieving that goal. We formulate the problem of shared autonomy as a Partially Observable Markov Decision Process with uncertainty over the user's goal. We utilize maximum entropy inverse optimal control to estimate a distribution over the user's goal based on the history of inputs. Ideally, the robot assists the user by solving for an action which minimizes the expected cost-to-go for the (unknown) goal. As solving the POMDP to select the optimal action is intractable, we use hindsight optimization to approximate the solution. In a user study, we compare our method to a standard predict-then-blend approach. We find that our method enables users to accomplish tasks more quickly while utilizing less input. However, when asked to rate each system, users were mixed in their assessment, citing a tradeoff between maintaining control authority and accomplishing tasks quickly.

ROMay 22, 2014
Batch Informed Trees (BIT*): Sampling-based Optimal Planning via the Heuristically Guided Search of Implicit Random Geometric Graphs

Jonathan D. Gammell, Siddhartha S. Srinivasa, Timothy D. Barfoot

In this paper, we present Batch Informed Trees (BIT*), a planning algorithm based on unifying graph- and sampling-based planning techniques. By recognizing that a set of samples describes an implicit random geometric graph (RGG), we are able to combine the efficient ordered nature of graph-based techniques, such as A*, with the anytime scalability of sampling-based algorithms, such as Rapidly-exploring Random Trees (RRT). BIT* uses a heuristic to efficiently search a series of increasingly dense implicit RGGs while reusing previous information. It can be viewed as an extension of incremental graph-search techniques, such as Lifelong Planning A* (LPA*), to continuous problem domains as well as a generalization of existing sampling-based optimal planners. It is shown that it is probabilistically complete and asymptotically optimal. We demonstrate the utility of BIT* on simulated random worlds in $\mathbb{R}^2$ and $\mathbb{R}^8$ and manipulation problems on CMU's HERB, a 14-DOF two-armed robot. On these problems, BIT* finds better solutions faster than RRT, RRT*, Informed RRT*, and Fast Marching Trees (FMT*) with faster anytime convergence towards the optimum, especially in high dimensions.

ROApr 8, 2014
Informed RRT*: Optimal Sampling-based Path Planning Focused via Direct Sampling of an Admissible Ellipsoidal Heuristic

Jonathan D. Gammell, Siddhartha S. Srinivasa, Timothy D. Barfoot

Rapidly-exploring random trees (RRTs) are popular in motion planning because they find solutions efficiently to single-query problems. Optimal RRTs (RRT*s) extend RRTs to the problem of finding the optimal solution, but in doing so asymptotically find the optimal path from the initial state to every state in the planning domain. This behaviour is not only inefficient but also inconsistent with their single-query nature. For problems seeking to minimize path length, the subset of states that can improve a solution can be described by a prolate hyperspheroid. We show that unless this subset is sampled directly, the probability of improving a solution becomes arbitrarily small in large worlds or high state dimensions. In this paper, we present an exact method to focus the search by directly sampling this subset. The advantages of the presented sampling technique are demonstrated with a new algorithm, Informed RRT*. This method retains the same probabilistic guarantees on completeness and optimality as RRT* while improving the convergence rate and final solution quality. We present the algorithm as a simple modification to RRT* that could be further extended by more advanced path-planning algorithms. We show experimentally that it outperforms RRT* in rate of convergence, final solution cost, and ability to find difficult passages while demonstrating less dependence on the state dimension and range of the planning problem.

ROAug 30, 2012
Efficient Touch Based Localization through Submodularity

Shervin Javdani, Matthew Klingensmith, J. Andrew Bagnell et al.

Many robotic systems deal with uncertainty by performing a sequence of information gathering actions. In this work, we focus on the problem of efficiently constructing such a sequence by drawing an explicit connection to submodularity. Ideally, we would like a method that finds the optimal sequence, taking the minimum amount of time while providing sufficient information. Finding this sequence, however, is generally intractable. As a result, many well-established methods select actions greedily. Surprisingly, this often performs well. Our work first explains this high performance -- we note a commonly used metric, reduction of Shannon entropy, is submodular under certain assumptions, rendering the greedy solution comparable to the optimal plan in the offline setting. However, reacting online to observations can increase performance. Recently developed notions of adaptive submodularity provide guarantees for a greedy algorithm in this online setting. In this work, we develop new methods based on adaptive submodularity for selecting a sequence of information gathering actions online. In addition to providing guarantees, we can capitalize on submodularity to attain additional computational speedups. We demonstrate the effectiveness of these methods in simulation and on a robot.