Enna Sachdeva

RO
h-index: 23
13 papers
182 citations
Novelty: 52%
AI Score: 37

13 Papers

RO · Mar 28, 2022
Domain Knowledge Driven Pseudo Labels for Interpretable Goal-Conditioned Interactive Trajectory Prediction

Lingfeng Sun, Chen Tang, Yaru Niu et al. · berkeley

Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure the safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have gained increasing attention due to their advantage in performance and their ability to capture the multimodality in trajectory distribution. In this work, we study the joint trajectory prediction problem with the goal-conditioned framework. In particular, we introduce a conditional-variational-autoencoder-based (CVAE) model to explicitly encode different interaction modes into the latent space. However, we discover that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address these issues, we propose a novel approach to avoid KL vanishing and induce an interpretable interactive latent space with pseudo labels. The proposed pseudo labels allow us to incorporate domain knowledge on interaction in a flexible manner. We motivate the proposed method using an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.

CV · Sep 12, 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning

Enna Sachdeva, Nakul Agarwal, Suhas Chundi et al.

The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial. In general, this task is challenging because modern autonomous systems software relies heavily on black-box artificial intelligence models. Towards this goal, this paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance. Using various closed- and open-ended visual question answering, the dataset provides dense annotations of semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios. The dense annotations and unique attributes of the dataset make it a valuable resource for researchers working on visual scene understanding and related fields. Furthermore, we introduce a joint model for importance-level ranking and natural-language caption generation to benchmark our dataset and demonstrate performance with quantitative evaluations.

RO · Aug 22, 2022
DIDER: Discovering Interpretable Dynamically Evolving Relations

Enna Sachdeva, Chiho Choi

Effective understanding of dynamically evolving multiagent interactions is crucial to capturing the underlying behavior of agents in social systems. It is usually challenging to observe these interactions directly, and therefore modeling the latent interactions is essential for realizing the complex behaviors. Recent work on Dynamic Neural Relational Inference (DNRI) captures explicit inter-agent interactions at every step. However, prediction at every step results in noisy interactions and lacks intrinsic interpretability without post-hoc inspection. Moreover, it requires access to ground truth annotations to analyze the predicted interactions, which are hard to obtain. This paper introduces DIDER, Discovering Interpretable Dynamically Evolving Relations, a generic end-to-end interaction modeling framework with intrinsic interpretability. DIDER discovers an interpretable sequence of inter-agent interactions by disentangling the task of latent interaction prediction into sub-interaction prediction and duration estimation. By imposing the consistency of a sub-interaction type over an extended time duration, the proposed framework achieves intrinsic interpretability without requiring any post-hoc inspection. We evaluate DIDER on both synthetic and real-world datasets. The experimental results demonstrate that modeling disentangled and interpretable dynamic relations improves performance on trajectory forecasting tasks.

RO · Jan 27, 2025
Generalized Mission Planning for Heterogeneous Multi-Robot Teams via LLM-constructed Hierarchical Trees

Piyush Gupta, David Isele, Enna Sachdeva et al.

We present a novel mission-planning strategy for heterogeneous multi-robot teams, taking into account the specific constraints and capabilities of each robot. Our approach employs hierarchical trees to systematically break down complex missions into manageable sub-tasks. We develop specialized APIs and tools, which are utilized by Large Language Models (LLMs) to efficiently construct these hierarchical trees. Once the hierarchical tree is generated, it is further decomposed to create optimized schedules for each robot, ensuring adherence to their individual constraints and capabilities. We demonstrate the effectiveness of our framework through detailed examples covering a wide range of missions, showcasing its flexibility and scalability.

RO · Jan 7, 2024
Disentangled Neural Relational Inference for Interpretable Motion Prediction

Victoria M. Dax, Jiachen Li, Enna Sachdeva et al.

Effective interaction modeling and behavior prediction of dynamic agents play a significant role in interactive motion planning for autonomous robots. Although existing methods have improved prediction accuracy, few research efforts have been devoted to enhancing prediction model interpretability and out-of-distribution (OOD) generalizability. This work addresses these two challenging aspects by designing a variational auto-encoder framework that integrates graph-based representations and time-sequence models to efficiently capture spatio-temporal relations between interactive agents and predict their dynamics. Our model infers dynamic interaction graphs in a latent space augmented with interpretable edge features that characterize the interactions. Moreover, we aim to enhance model interpretability and performance in OOD scenarios by disentangling the latent space of edge features, thereby strengthening model versatility and robustness. We validate our approach through extensive experiments on both simulated and real-world datasets. The results show superior performance compared to existing methods in modeling spatio-temporal relations, motion prediction, and identifying time-invariant latent features.

CL · Mar 9, 2025
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks

Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.

Vision-Language Models (VLMs) have recently shown promising advancements in sequential decision-making tasks through task-specific fine-tuning. However, common fine-tuning methods, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) techniques like Proximal Policy Optimization (PPO), present notable limitations: SFT assumes Independent and Identically Distributed (IID) data, while PPO focuses on maximizing cumulative rewards. These limitations often restrict solution diversity and hinder generalization in multi-step reasoning tasks. To address these challenges, we introduce GFlowVLM, a novel framework that fine-tunes VLMs using Generative Flow Networks (GFlowNets) to promote the generation of diverse solutions for complex reasoning tasks. GFlowVLM models the environment as a non-Markovian decision process, allowing it to capture long-term dependencies essential for real-world applications. It takes observations and task descriptions as inputs to prompt chain-of-thought (CoT) reasoning, which subsequently guides action selection. We use task-based rewards to fine-tune the VLM with GFlowNets. This approach enables VLMs to outperform prior fine-tuning methods, including SFT and RL. Empirical results demonstrate the effectiveness of GFlowVLM on complex tasks such as card games (NumberLine, BlackJack) and embodied planning tasks (ALFWorld), showing enhanced training efficiency, solution diversity, and stronger generalization capabilities across both in-distribution and out-of-distribution scenarios.

RO · Nov 9, 2024
Optimal Driver Warning Generation in Dynamic Driving Environment

Chenran Li, Aolin Xu, Enna Sachdeva et al.

The driver warning system that alerts the human driver about potential risks during driving is a key feature of an advanced driver assistance system. Existing driver warning technologies, mainly the forward collision warning and unsafe lane change warning, can reduce the risk of collision caused by human errors. However, the current design methods have several major limitations. Firstly, the warnings are mainly generated in a one-shot manner without modeling the ego driver's reactions and surrounding objects, which reduces the flexibility and generality of the system over different scenarios. Additionally, the triggering conditions of warnings are mostly rule-based threshold checks on the current state, which lack prediction of the potential risk over a sufficiently long future horizon. In this work, we study the problem of optimally generating driver warnings by considering the interactions among the generated warning, the driver behavior, and the states of ego and surrounding vehicles over a long horizon. The warning generation problem is formulated as a partially observed Markov decision process (POMDP). An optimal warning generation framework is proposed as a solution to this POMDP. The simulation experiments demonstrate the superiority of the proposed solution over existing warning generation methods.

CV · Nov 5, 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

We study the problem of estimating the body movements of a camera wearer from egocentric videos. Current methods for ego-body pose estimation rely on temporally dense sensor data, such as IMU measurements from spatially sparse body parts like the head and hands. However, we propose that even temporally sparse observations, such as hand poses captured intermittently from egocentric videos during natural or periodic hand movements, can effectively constrain overall body motion. Naively applying diffusion models to generate full-body pose from head pose and sparse hand pose leads to suboptimal results. To overcome this, we develop a two-stage approach that decomposes the problem into temporal completion and spatial completion. First, our method employs masked autoencoders to impute hand trajectories by leveraging the spatiotemporal correlations between the head pose sequence and intermittent hand poses, providing uncertainty estimates. Subsequently, we employ conditional diffusion models to generate plausible full-body motions based on these temporally dense trajectories of the head and hands, guided by the uncertainty estimates from the imputation. The effectiveness of our method was tested and validated through comprehensive experiments conducted on various HMD setups with the AMASS and Ego-Exo4D datasets.

CV · Aug 1, 2025
Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting

Seunggeun Chi, Enna Sachdeva, Pin-Hao Huang et al.

Amodal completion, which is the process of inferring the full appearance of objects despite partial occlusions, is crucial for understanding complex human-object interactions (HOI) in computer vision and robotics. Existing methods, such as those that use pre-trained diffusion models, often struggle to generate plausible completions in dynamic scenarios because they have a limited understanding of HOI. To solve this problem, we've developed a new approach that uses physical prior knowledge along with a specialized multi-regional inpainting technique designed for HOI. By incorporating physical constraints from human topology and contact information, we define two distinct regions: the primary region, where occluded object parts are most likely to be, and the secondary region, where occlusions are less probable. Our multi-regional inpainting method uses customized denoising strategies across these regions within a diffusion model. This improves the accuracy and realism of the generated completions in both their shape and visual detail. Our experimental results show that our approach significantly outperforms existing methods in HOI scenarios, moving machine perception closer to a more human-like understanding of dynamic environments. We also show that our pipeline is robust even without ground-truth contact annotations, which broadens its applicability to tasks like 3D reconstruction and novel view/pose synthesis.

RO · Jun 3, 2018
Novel Compliant omnicrawler-wheel transforming module

Akash Singh, Vinay Rodrigues, Enna Sachdeva et al.

This paper presents a novel design of a crawler robot that can transform its chassis from an Omni crawler mode to a large-sized wheel mode using a novel mechanism. The transformation occurs without any additional actuators. The robot can transform into a wheel of large diameter and small width, which enhances its maneuverability, for example through a small turning radius and fast, efficient locomotion. This paper improves the locomotion modes of the previously developed hybrid compliant omnicrawler robot CObRaSO. In addition to the legged and tracked mechanisms, CObRaSO can now display a large wheel mode, which adds to its locomotion capabilities. The mechanical design of the robot is explained in detail, and the transformation experiment and torque analysis are presented.

RO · Sep 29, 2017
CObRaSO: Compliant Omni-Direction Bendable Hybrid Rigid and Soft OmniCrawler Module

Enna Sachdeva, Akash Singh, Vinay Rodrigues et al.

This paper presents a novel design of an Omnidirectional bendable OmniCrawler module, CObRaSO. Along with the longitudinal crawling and sideways rolling motion, the performance of the OmniCrawler is further enhanced by the introduction of Omnidirectional bending within the module, which is the key contribution of this paper. The Omnidirectional bending is achieved by an arrangement of two independent 1-DOF joints aligned at 90° with respect to each other. The unique characteristic of this module is its ability to crawl in an Omnidirectionally bent configuration, which is achieved by a novel design of a 2-DOF roller chain and a backbone with a hybrid soft-rigid structure. This hybrid structure provides compliant pathways for the lug-chain assembly to passively conform to the orientation of the module and crawl in an Omnidirectionally bent configuration, which makes this module one of a kind. Furthermore, we show that the modular design of CObRaSO is versatile: it achieves active compliance on an uneven surface, applies to different robotic platforms (an in-pipeline robot, a quadruped, and a snake robot), and exhibits hybrid locomotion modes in various configurations of the robots. The mechanism and mobility characteristics of the proposed module have been verified with the aid of simulations and experiments on a real robot prototype.

RO · Jun 19, 2017
Design and optimal springs stiffness estimation of a Modular OmniCrawler in-pipe climbing Robot

Akash Singh, Enna Sachdeva, Abhishek Sarkar et al.

This paper discusses the design of a novel compliant in-pipe climbing modular robot for small diameter pipes. The robot consists of a kinematic chain of 3 OmniCrawler modules, with a link connecting each pair of adjacent modules via compliant joints. While the tank-like crawler mechanism provides good traction on low friction surfaces, its circular cross-section makes it holonomic. The holonomic motion helps it re-align to avoid obstacles during motion and overcome turns with a minimal energy posture. Additionally, the modularity enables it to negotiate T-junctions without motion singularity. The compliance is realized using 4 torsion springs incorporated in the joints joining the 3 modules with 2 links. For a desirable pipe diameter (Ø 75 mm), the springs' stiffness values are obtained by formulating a constrained optimization problem, which has been simulated in ADAMS MSC and further validated on a real robot prototype. In order to negotiate smooth vertical bends and friction coefficient variations in pipes, the design was later modified by replacing springs with series elastic actuators (SEA) at 2 of the 4 joints.

RO · Apr 22, 2017
COCrIP: Compliant OmniCrawler In-pipeline Robot

Akash Singh, Enna Sachdeva, Abhishek Sarkar et al.

This paper presents a modular in-pipeline climbing robot with a novel compliant foldable OmniCrawler mechanism. The circular cross-section of the OmniCrawler module enables holonomic motion, which facilitates the alignment of the robot in the direction of bends. Additionally, the crawler mechanism provides a fair amount of traction, even on slippery surfaces. These advantages of crawler modules have been further supplemented by incorporating active compliance in the module itself, which helps to negotiate sharp bends in small diameter pipes. The robot has a series of 3 such compliant foldable modules interconnected by links via passive joints. For the desirable pipe diameter and curvature of the bends, the spring stiffness value for each passive joint is determined by formulating a constrained optimization problem using the quasi-static model of the robot. Moreover, the minimum friction coefficient between the module and the pipe surface at which the robot can climb vertically without slipping is estimated. The numerical simulation results have further been validated by experiments on a real robot prototype.