RO · Aug 27, 2022
Object Goal Navigation using Data Regularized Q-Learning
Nandiraju Gireesh, D. A. Sasi Kiran, Snehasis Banerjee et al.
Object Goal Navigation requires a robot to find and navigate to an instance of a target object class in a previously unseen environment. Our framework incrementally builds a semantic map of the environment over time, and then repeatedly selects a long-term goal ('where to go') based on the semantic map to locate the target object instance. Long-term goal selection is formulated as a vision-based deep reinforcement learning problem. Specifically, an Encoder Network is trained to extract high-level features from a semantic map and select a long-term goal. In addition, we incorporate data augmentation and Q-function regularization to make the long-term goal selection more effective. We report experimental results using the photo-realistic Gibson benchmark dataset in the AI Habitat 3D simulation environment to demonstrate substantial performance improvement on standard measures in comparison with a state-of-the-art data-driven baseline.
RO · Aug 27, 2022
Spatial Relation Graph and Graph Convolutional Network for Object Goal Navigation
D. A. Sasi Kiran, Kritika Anand, Chaitanya Kharyal et al.
This paper describes a framework for the object-goal navigation task, which requires a robot to find and move to the closest instance of a target object class from a random starting position. The framework uses a history of robot trajectories to learn a Spatial Relational Graph (SRG) and Graph Convolutional Network (GCN)-based embeddings for the likelihood of proximity of different semantically-labeled regions and the occurrence of different object classes in these regions. To locate a target object instance during evaluation, the robot uses Bayesian inference and the SRG to estimate the visible regions, and uses the learned GCN embeddings to rank visible regions and select the region to explore next.
RO · Oct 20, 2022
Object Goal Navigation Based on Semantics and RGB Ego View
Snehasis Banerjee, Brojeshwar Bhowmick, Ruddra Dev Roychoudhury
This paper presents an architecture and methodology that enable a service robot to navigate an indoor environment with semantic decision making, given an RGB ego view. The method leverages knowledge of the robot's actuation capability and of scenes, objects, and their relations, represented in semantic form. The robot navigates using a GeoSem map, a relational combination of geometric and semantic maps. The goal given to the robot is to find an object in an unknown environment with no navigational map and only egocentric RGB camera perception. The approach is tested both in a simulation environment and in real-life indoor settings. The presented approach was found to outperform human users in gamified evaluations with respect to average completion time.
RO · Feb 4, 2025
Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments
Raghav Arora, Shivam Singh, Karthik Swaminathan et al. · mit
Assistive agents performing household tasks such as making the bed or cooking breakfast often compute and execute actions that accomplish one task at a time. However, efficiency can be improved by anticipating upcoming tasks and computing an action sequence that jointly achieves these tasks. State-of-the-art methods for task anticipation use data-driven deep networks and Large Language Models (LLMs), but they do so at the level of high-level tasks and/or require many training examples. Our framework leverages the generic knowledge of LLMs through a small number of prompts to perform high-level task anticipation, using the anticipated tasks as goals in a classical planning system to compute a sequence of finer-granularity actions that jointly achieve these goals. We ground and evaluate our framework's abilities in realistic scenarios in the VirtualHome environment and demonstrate a 31% reduction in execution time compared with a system that does not consider upcoming tasks.
RO · Feb 4, 2025
AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement
Shivam Singh, Karthik Swaminathan, Nabanita Dash et al.
An embodied agent assisting humans is often asked to complete new tasks, and there may not be sufficient time or labeled examples to train the agent to perform these new tasks. Large Language Models (LLMs) trained on considerable knowledge across many domains can be used to predict a sequence of abstract actions for completing such tasks, although the agent may not be able to execute this sequence due to task-, agent-, or domain-specific constraints. Our framework addresses these challenges by leveraging the generic predictions provided by an LLM and the prior domain knowledge encoded in a Knowledge Graph (KG), enabling an agent to quickly adapt to new tasks. The robot also solicits and uses human input as needed to refine its existing knowledge. Based on experimental evaluation in the context of cooking and cleaning tasks in simulation domains, we demonstrate that the interplay between the LLM, the KG, and human input leads to substantial performance gains compared with just using the LLM. Project website: https://sssshivvvv.github.io/adaptbot/
RO · Apr 4, 2024
Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration
Shivam Singh, Karthik Swaminathan, Raghav Arora et al.
An agent assisting humans in daily living activities can collaborate more effectively by anticipating upcoming tasks. Data-driven methods represent the state of the art in task anticipation, planning, and related problems, but these methods are resource-hungry and opaque. Our prior work introduced a proof-of-concept framework that used an LLM to anticipate 3 high-level tasks that served as goals for a classical planning system that computed a sequence of low-level actions for the agent to achieve these goals. This paper describes DaTAPlan, our framework that significantly extends our prior work toward human-robot collaboration. Specifically, the DaTAPlan planner computes actions for an agent and a human to collaboratively and jointly achieve the tasks anticipated by the LLM, and the agent automatically adapts to unexpected changes in human action outcomes and preferences. We evaluate DaTAPlan's capabilities in a realistic simulation environment, demonstrating accurate task anticipation, effective human-robot collaboration, and the ability to adapt to unexpected changes. Project website: https://dataplan-hrc.github.io
RO · May 10, 2023
Sequence-Agnostic Multi-Object Navigation
Nandiraju Gireesh, Ayush Agrawal, Ahana Datta et al.
The Multi-Object Navigation (MultiON) task requires a robot to localize one instance of each of multiple object classes. It is a fundamental task for an assistive robot in a home or a factory. Existing methods for MultiON have viewed this as a direct extension of Object Navigation (ON), the task of localizing an instance of one object class, and are pre-sequenced, i.e., the sequence in which the object classes are to be explored is provided in advance. This is a strong limitation in practical applications characterized by dynamic changes. This paper describes a deep reinforcement learning framework for sequence-agnostic MultiON based on an actor-critic architecture and a suitable reward specification. Our framework leverages past experiences and seeks to reward progress toward individual as well as multiple target object classes. We use photo-realistic scenes from the Gibson benchmark dataset in the AI Habitat 3D simulation environment to experimentally show that our method performs better than a pre-sequenced approach and a state-of-the-art ON method extended to MultiON.
RO · Nov 22, 2021
Talk-to-Resolve: Combining scene understanding and spatial dialogue to resolve granular task ambiguity for a collocated robot
Pradip Pramanick, Chayan Sarkar, Snehasis Banerjee et al.
The utility of collocated robots largely depends on an easy and intuitive mechanism for interacting with humans. If a robot accepts task instructions in natural language, it first has to understand the user's intention by decoding the instruction. However, while executing the task, the robot may face unforeseeable circumstances due to variations in the observed scene and may therefore require further user intervention. In this article, we present a system called Talk-to-Resolve (TTR) that enables a robot to initiate a coherent dialogue exchange with the instructor by observing the scene visually to resolve the impasse. Through dialogue, it finds either a cue to move forward with the original plan, an acceptable alternative to the original plan, or affirmation to abort the task altogether. To recognize a possible stalemate, we use the dense captions of the observed scene and the given instruction jointly to compute the robot's next action. We evaluate our system on a data set of initial instruction and situational scene pairs. Our system can identify stalemates and resolve them through appropriate dialogue exchange with 82% accuracy. Additionally, a user study reveals that the questions from our system are more natural (4.02 on average on a scale of 1 to 5) than those from a state-of-the-art baseline (3.08 on average).
ML · Nov 6, 2017
Interpretable Feature Recommendation for Signal Analytics
Snehasis Banerjee, Tanushyam Chattopadhyay, Ayan Mukherjee
This paper presents an automated approach for interpretable feature recommendation for solving signal data analytics problems. The method has been tested by performing experiments on datasets in the domain of prognostics, where interpretation of features is considered very important. The proposed approach is based on the Wide Learning architecture and provides means for interpreting the recommended features. Such an interpretation is not available with feature learning approaches like Deep Learning (such as Convolutional Neural Networks) or feature transformation approaches like Principal Component Analysis. Results show that the feature recommendation and interpretation techniques are quite effective for the problems at hand in terms of performance and a drastic reduction in the time needed to develop a solution. An example further shows how this human-in-the-loop interpretation system can be used as a prescriptive system.
ML · Jul 13, 2017
Automation of Feature Engineering for IoT Analytics
Snehasis Banerjee, Tanushyam Chattopadhyay, Arpan Pal et al.
This paper presents an approach for automating interpretable feature selection for Internet of Things Analytics (IoTA) using machine learning (ML) techniques. The authors conducted a survey of people involved in different IoTA-based application development tasks. The survey reveals that feature selection is the most time-consuming part of the entire workflow and the one that most demands niche skills. This paper shows how feature selection is successfully automated without sacrificing decision-making accuracy, thereby reducing project completion time and the cost of hiring expensive resources. Several pattern recognition principles and state-of-the-art (SoA) ML techniques are followed to design the overall approach for the proposed automation. Three data sets are considered to establish the proof of concept. Experimental results show that the proposed automation reduces the time for feature selection to 2 days, instead of the 4-6 months that would have been required in the absence of the automation. This reduction in time is achieved without any sacrifice in the accuracy of the decision-making process. The proposed method is also compared against a Multi Layer Perceptron (MLP) model, as most state-of-the-art works on IoTA use MLP-based Deep Learning. Moreover, the feature selection method is compared against an SoA feature reduction technique, namely Principal Component Analysis (PCA), and its variants. The results obtained show that the proposed method is effective.
ML · Dec 17, 2016
Towards Wide Learning: Experiments in Healthcare
Snehasis Banerjee, Tanushyam Chattopadhyay, Swagata Biswas et al.
In this paper, a Wide Learning architecture is proposed that attempts to automate the feature engineering portion of the machine learning (ML) pipeline. Feature engineering is widely considered the most time-consuming and expert-knowledge-demanding portion of any ML task. The proposed feature recommendation approach is tested on 3 healthcare datasets: a) the PhysioNet Challenge 2016 dataset of phonocardiogram (PCG) signals, b) the MIMIC II blood pressure classification dataset of photoplethysmogram (PPG) signals, and c) an emotion classification dataset of PPG signals. While the proposed method beats the state-of-the-art techniques for the second and third datasets, it reaches 94.38% of the accuracy level of the winner of PhysioNet Challenge 2016. In all cases, the effort to reach a satisfactory performance was drastically less (a few days) than manual feature engineering.