Paloma Sodhi

RO
9papers
262citations
Novelty48%
AI Score29

9 Papers

ROJul 19, 2022Code
Theseus: A Library for Differentiable Nonlinear Optimization

Luis Pineda, Taosha Fan, Maurizio Monge et al.

We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnostic, as we illustrate with several example applications that are built using the same underlying differentiable components, such as second-order optimizers, standard costs functions, and Lie groups. For efficiency, Theseus incorporates support for sparse solvers, automatic vectorization, batching, GPU acceleration, and gradient computation with implicit differentiation and direct loss minimization. We do extensive performance evaluation in a set of applications, demonstrating significant efficiency gains and better scalability when these features are incorporated. Project page: https://sites.google.com/view/theseus-ai

CLJul 23, 2023
On the Effectiveness of Offline RL for Dialogue Response Generation

Paloma Sodhi, Felix Wu, Ethan R. Elenberg et al.

A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though identical meanings can be expressed in different ways. This motivates use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing while not inducing training instability or sacrificing practical training budgets.

LGOct 5, 2023
SteP: Stacked LLM Policies for Web Actions

Paloma Sodhi, S. R. K. Branavan, Yoav Artzi et al.

Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying a large prompt to handle all possible behaviors and states is extremely complex, and results in behavior leaks between unrelated behaviors. Decomposition to distinct policies can address this challenge, but requires carefully handing off control between policies. We propose Stacked LLM Policies for Web Actions (SteP), an approach to dynamically compose policies to solve a diverse set of web tasks. SteP defines a Markov Decision Process where the state is a stack of policies representing the control state, i.e., the chain of policy calls. Unlike traditional methods that are restricted to static hierarchies, SteP enables dynamic control that adapts to the complexity of the task. We evaluate SteP against multiple baselines and web environments including WebArena, MiniWoB++, and a CRM. On WebArena, SteP improves (14.9\% to 33.5\%) over SOTA that use GPT-4 policies, while on MiniWob++, SteP is competitive with prior works while using significantly less data. Our code and data are available at https://asappresearch.github.io/webagents-step.

CLNov 14, 2023
Workflow-Guided Response Generation for Task-Oriented Dialogue

Do June Min, Paloma Sodhi, Ramya Ramakrishnan

Task-oriented dialogue (TOD) systems aim to achieve specific goals through interactive dialogue. Such tasks usually involve following specific workflows, i.e. executing a sequence of actions in a particular order. While prior work has focused on supervised learning methods to condition on past actions, they do not explicitly optimize for compliance to a desired workflow. In this paper, we propose a novel framework based on reinforcement learning (RL) to generate dialogue responses that are aligned with a given workflow. Our framework consists of ComplianceScorer, a metric designed to evaluate how well a generated response executes the specified action, combined with an RL opimization process that utilizes an interactive sampling technique. We evaluate our approach on two TOD datasets, Action-Based Conversations Dataset (ABCD) (Chen et al., 2021a) and MultiWOZ 2.2 (Zang et al., 2020) on a range of automated and human evaluation metrics. Our findings indicate that our RL-based framework outperforms baselines and is effective at enerating responses that both comply with the intended workflows while being expressed in a natural and fluent manner.

ROJul 15, 2021Code
CMU-GPR Dataset: Ground Penetrating Radar Dataset for Robot Localization and Mapping

Alexander Baikovitz, Paloma Sodhi, Michael Dille et al.

There has been exciting recent progress in using radar as a sensor for robot navigation due to its increased robustness to varying environmental conditions. However, within these different radar perception systems, ground penetrating radar (GPR) remains under-explored. By measuring structures beneath the ground, GPR can provide stable features that are less variant to ambient weather, scene, and lighting changes, making it a compelling choice for long-term spatio-temporal mapping. In this work, we present the CMU-GPR dataset--an open-source ground penetrating radar dataset for research in subsurface-aided perception for robot navigation. In total, the dataset contains 15 distinct trajectory sequences in 3 GPS-denied, indoor environments. Measurements from a GPR, wheel encoder, RGB camera, and inertial measurement unit were collected with ground truth positions from a robotic total station. In addition to the dataset, we also provide utility code to convert raw GPR data into processed images. This paper describes our recording platform, the data format, utility scripts, and proposed methods for using this data.

RONov 15, 2021
PatchGraph: In-hand tactile tracking with learned surface normals

Paloma Sodhi, Michael Kaess, Mustafa Mukadam et al.

We address the problem of tracking 3D object poses from touch during in-hand manipulations. Specifically, we look at tracking small objects using vision-based tactile sensors that provide high-dimensional tactile image measurements at the point of contact. While prior work has relied on a-priori information about the object being localized, we remove this requirement. Our key insight is that an object is composed of several local surface patches, each informative enough to achieve reliable object tracking. Moreover, we can recover the geometry of this local patch online by extracting local surface normal information embedded in each tactile image. We propose a novel two-stage approach. First, we learn a mapping from tactile images to surface normals using an image translation network. Second, we use these surface normals within a factor graph to both reconstruct a local patch map and use it to infer 3D object poses. We demonstrate reliable object tracking for over $100$ contact sequences across unique shapes with four objects in simulation and two objects in the real-world. Supplementary video: https://youtu.be/FHks--haOGY

ROAug 4, 2021
LEO: Learning Energy-based Models in Factor Graph Optimization

Paloma Sodhi, Eric Dexheimer, Mustafa Mukadam et al.

We address the problem of learning observation models end-to-end for estimation. Robots operating in partially observable environments must infer latent states from multiple sensory inputs using observation models that capture the joint distribution between latent states and observations. This inference problem can be formulated as an objective over a graph that optimizes for the most likely sequence of states using all previous measurements. Prior work uses observation models that are either known a-priori or trained on surrogate losses independent of the graph optimizer. In this paper, we propose a method to directly optimize end-to-end tracking performance by learning observation models with the graph optimizer in the loop. This direct approach may appear, however, to require the inference algorithm to be fully differentiable, which many state-of-the-art graph optimizers are not. Our key insight is to instead formulate the problem as that of energy-based learning. We propose a novel approach, LEO, for learning observation models end-to-end with graph optimizers that may be non-differentiable. LEO alternates between sampling trajectories from the graph posterior and updating the model to match these samples to ground truth trajectories. We propose a way to generate such samples efficiently using incremental Gauss-Newton solvers. We compare LEO against baselines on datasets drawn from two distinct tasks: navigation and real-world planar pushing. We show that LEO is able to learn complex observation models with lower errors and fewer samples. Supplementary video: https://youtu.be/YqzlUPudfkA

ROMar 29, 2021
Ground Encoding: Learned Factor Graph-based Models for Localizing Ground Penetrating Radar

Alexander Baikovitz, Paloma Sodhi, Michael Dille et al.

We address the problem of robot localization using ground penetrating radar (GPR) sensors. Current approaches for localization with GPR sensors require a priori maps of the system's environment as well as access to approximate global positioning (GPS) during operation. In this paper, we propose a novel, real-time GPR-based localization system for unknown and GPS-denied environments. We model the localization problem as an inference over a factor graph. Our approach combines 1D single-channel GPR measurements to form 2D image submaps. To use these GPR images in the graph, we need sensor models that can map noisy, high-dimensional image measurements into the state space. These are challenging to obtain a priori since image generation has a complex dependency on subsurface composition and radar physics, which itself varies with sensors and variations in subsurface electromagnetic properties. Our key idea is to instead learn relative sensor models directly from GPR data that map non-sequential GPR image pairs to relative robot motion. These models are incorporated as factors within the factor graph with relative motion predictions correcting for accumulated drift in the position estimates. We demonstrate our approach over datasets collected across multiple locations using a custom designed experimental rig. We show reliable, real-time localization using only GPR and odometry measurements for varying trajectories in three distinct GPS-denied environments. For our supplementary video, see https://youtu.be/HXXgdTJzqyw.

RODec 7, 2020
Learning Tactile Models for Factor Graph-based Estimation

Paloma Sodhi, Michael Kaess, Mustafa Mukadam et al.

We're interested in the problem of estimating object states from touch during manipulation under occlusions. In this work, we address the problem of estimating object poses from touch during planar pushing. Vision-based tactile sensors provide rich, local image measurements at the point of contact. A single such measurement, however, contains limited information and multiple measurements are needed to infer latent object state. We solve this inference problem using a factor graph. In order to incorporate tactile measurements in the graph, we need local observation models that can map high-dimensional tactile images onto a low-dimensional state space. Prior work has used low-dimensional force measurements or engineered functions to interpret tactile measurements. These methods, however, can be brittle and difficult to scale across objects and sensors. Our key insight is to directly learn tactile observation models that predict the relative pose of the sensor given a pair of tactile images. These relative poses can then be incorporated as factors within a factor graph. We propose a two-stage approach: first we learn local tactile observation models supervised with ground truth data, and then integrate these models along with physics and geometric factors within a factor graph optimizer. We demonstrate reliable object tracking using only tactile feedback for 150 real-world planar pushing sequences with varying trajectories across three object shapes. Supplementary video: https://youtu.be/y1kBfSmi8w0