Benedict Quartey

RO
h-index: 49
8 papers
38 citations
Novelty: 39%
AI Score: 41

8 Papers

AI Mar 9, 2023
Exploiting Contextual Structure to Generate Useful Auxiliary Tasks

Benedict Quartey, Ankit Shah, George Konidaris

Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint necessitates approaches that work with limited environmental interaction by maximizing the reuse of previous experiences. We propose an approach that maximizes experience reuse while learning to solve a given task by generating and simultaneously learning useful auxiliary tasks. To generate these tasks, we construct an abstract temporal logic representation of the given task and leverage large language models to generate context-aware object embeddings that facilitate object replacements. Counterfactual reasoning and off-policy methods allow us to simultaneously learn these auxiliary tasks while solving the given target task. We combine these insights into a novel framework for multitask reinforcement learning and experimentally show that our generated auxiliary tasks share underlying exploration requirements similar to those of the given task, thereby maximizing the utility of directed exploration. Our approach allows agents to automatically learn additional useful policies without extra environment interaction.
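The idea of learning auxiliary tasks off-policy from the same experience can be illustrated with a minimal sketch. It is not the authors' framework: it assumes a hypothetical five-state corridor, two hand-written reward functions (named `target` and `auxiliary` purely for illustration), and plain tabular Q-learning. Each transition is collected once and then relabeled with every task's reward, so the auxiliary policy is learned without extra environment steps.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor used only for illustration: states 0..4.
N_STATES = 5
ACTIONS = (-1, 1)  # move left or right


def step(state, action):
    """Deterministic corridor dynamics, clipped at both ends."""
    return max(0, min(N_STATES - 1, state + action))


# Each task is a reward function over (state, action, next_state).
# The task names are assumptions for this sketch, not from the paper.
TASKS = {
    "target": lambda s, a, s2: 1.0 if s2 == N_STATES - 1 else 0.0,
    "auxiliary": lambda s, a, s2: 1.0 if s2 == 0 else 0.0,
}


def train(episodes=200, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    """Learn one Q-table per task from a single shared stream of experience."""
    rng = random.Random(seed)
    q = {name: defaultdict(float) for name in TASKS}
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(horizon):
            a = rng.choice(ACTIONS)  # random behavior policy
            s2 = step(s, a)          # the only environment interaction
            # Relabel the same transition with every task's reward and
            # apply an off-policy Q-learning update for each task.
            for name, reward_fn in TASKS.items():
                r = reward_fn(s, a, s2)
                best_next = max(q[name][(s2, b)] for b in ACTIONS)
                td_error = r + gamma * best_next - q[name][(s, a)]
                q[name][(s, a)] += alpha * td_error
            s = s2
    return q


def greedy_action(q_task, state):
    """Action with the highest learned value in the given state."""
    return max(ACTIONS, key=lambda a: q_task[(state, a)])


if __name__ == "__main__":
    q_tables = train()
    for task_name in TASKS:
        print(task_name, [greedy_action(q_tables[task_name], s) for s in range(N_STATES)])
```

Because Q-learning is off-policy, both tables can be updated from transitions gathered by a behavior policy that optimizes neither task. That property is what makes reward relabeling possible in this sketch.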

CV Oct 21, 2022
An Exploration of Neural Radiance Field Scene Reconstruction: Synthetic, Real-world and Dynamic Scenes

Benedict Quartey, Tuluhan Akbulut, Wasiwasi Mgonzo et al.

This project presents an exploration into 3D scene reconstruction of synthetic and real-world scenes using Neural Radiance Field (NeRF) approaches. We primarily take advantage of the reduced training and rendering time afforded by the multi-resolution hash encoding of neural graphics primitives to reconstruct static video game scenes and real-world scenes, comparing and observing reconstruction detail and limitations. Additionally, we explore dynamic scene reconstruction using Neural Radiance Fields for Dynamic Scenes (D-NeRF). Finally, we extend the implementation of D-NeRF, originally constrained to handle synthetic scenes, to also handle real-world dynamic scenes.
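For readers unfamiliar with multi-resolution hash encoding, the lookup step can be sketched as follows. The abstract gives no implementation details, so this is an assumption-laden illustration of the general technique: grid resolutions grow geometrically across levels, and each integer grid corner is mapped to a slot in a fixed-size table by XOR-ing the coordinates multiplied by large primes. The function names, default level count, and resolutions are hypothetical. The learned feature tables and trilinear interpolation are omitted.

```python
import math

# Per-axis multipliers for the spatial hash (the first is 1, the rest are large primes).
PRIMES = (1, 2654435761, 805459861)
UINT32_MASK = 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around


def level_resolutions(n_levels=16, n_min=16, n_max=512):
    """Grid resolution per level, growing geometrically from n_min toward n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return [math.floor(n_min * growth ** level) for level in range(n_levels)]


def hash_index(ix, iy, iz, table_size):
    """Map an integer grid corner to a slot in a table of table_size entries."""
    h = 0
    for coord, prime in zip((ix, iy, iz), PRIMES):
        h ^= (coord * prime) & UINT32_MASK
    return h % table_size


def corner_indices(point, resolution, table_size):
    """Table slots for the 8 voxel corners around a point in [0, 1]^3."""
    base = [math.floor(c * resolution) for c in point]
    return [
        hash_index(base[0] + dx, base[1] + dy, base[2] + dz, table_size)
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]


if __name__ == "__main__":
    table_size = 2 ** 14  # hypothetical number of feature vectors per level
    for res in level_resolutions(n_levels=4, n_min=16, n_max=128):
        print(res, corner_indices((0.3, 0.6, 0.9), res, table_size))
```

In a full encoder, each slot would hold a small learned feature vector. The eight corner features would be interpolated at every level and concatenated before being passed to the network.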

RO May 20
Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition

Benedict Quartey, Sebastian Castro, Eric Rosen et al.

Learning from Demonstration (LfD) enables robots to learn complex behaviors from expert examples, yet existing approaches often fail to generalize to new compositions of known skills without retraining. Modern generative policies model distributions over action trajectories alone and are thus unable to reason about the symbolic outcomes required for robust composition. We propose that skills should jointly model action trajectories and the symbolic outcomes they induce. To address this gap, we introduce Predicate Action Skills (PACTS), a class of closed-loop visuomotor policies that model skills as a joint generative process over action and predicate belief trajectories, producing coherent action-outcome rollouts within a single model. Jointly generating actions and predicates enables PACTS to learn internal representations that improve both action generation and predicate classification. Furthermore, we demonstrate zero-shot composition of learned skills via planning by leveraging online predicate predictions from PACTS as a symbolic interface for sequencing and monitoring execution. Project website: https://planpacts.github.io/

RO Oct 21, 2022
Sample Efficient Robot Learning with Structured World Models

Tuluhan Akbulut, Max Merlin, Shane Parr et al.

Reinforcement learning has been demonstrated as a flexible and effective approach for learning a range of continuous control tasks, such as those used by robots to manipulate objects in their environment. But in robotics particularly, real-world rollouts are costly, and sample efficiency can be a major limiting factor when learning a new skill. In game environments, the use of world models has been shown to improve sample efficiency while still achieving good performance, especially when images or other rich observations are provided. In this project, we explore the use of a world model in a deformable robotic manipulation task, evaluating its effect on sample efficiency when learning to fold a cloth in simulation. We compare the use of RGB image observations with a feature space leveraging built-in structure (keypoints representing the cloth configuration), a common approach in robot skill learning, and compare the impact on task performance and learning efficiency with and without the world model. Our experiments showed that the use of keypoints increased the performance of the best model on the task by 50%, and in general, the use of a learned or constructed reduced feature space improved task performance and sample efficiency. The use of a state transition predictor (MDN-RNN) in our world models did not have a notable effect on task performance.

RO Feb 18, 2024
Verifiably Following Complex Robot Instructions with Foundation Models

Benedict Quartey, Eric Rosen, Stefanie Tellex et al.

When instructing robots, users want to flexibly express constraints, refer to arbitrary landmarks, and verify robot behavior, while robots must disambiguate instructions into specifications and ground instruction referents in the real world. To address this problem, we propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow complex, open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of correct-by-construction robot behaviors. We conduct a large-scale evaluation of LIMP on 150 instructions across five real-world environments, demonstrating its versatility and ease of deployment in diverse, unstructured domains. LIMP performs comparably to state-of-the-art baselines on standard open-vocabulary tasks and additionally achieves a 79% success rate on complex spatiotemporal instructions, significantly outperforming baselines that only reach 38%. See supplementary materials and demo videos at https://robotlimp.github.io

RO Nov 28, 2024
λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics

Ahmed Jaafar, Shreyas Sundara Raman, Sudarshan Harithas et al.

Learning to execute long-horizon mobile manipulation tasks is crucial for advancing robotics in household and workplace settings. However, current approaches are typically data-inefficient, underscoring the need for improved models that require realistically sized benchmarks to evaluate their efficiency. To address this, we introduce the LAMBDA (λ) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size, more feasible for collection. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and replay-verifiability, ensuring robust learning and evaluation. We leverage λ to benchmark current end-to-end learning methods and a modular neuro-symbolic approach that combines foundation models with task and motion planning. We find that learning methods, even when pretrained, yield lower success rates, while a neuro-symbolic method performs significantly better and requires less data.

RO Jun 1, 2025
Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody

David Sasu, Kweku Andoh Yamoah, Benedict Quartey et al.

Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.

RO Jun 20, 2020
Affordable Modular Autonomous Vehicle Development Platform

Benedict Quartey, G. Ayorkor Korsah

Road accidents are estimated to be the ninth leading cause of death across all age groups globally. Each year, 1.25 million people die from road accidents, and Africa has the highest rate of road fatalities [1]. Research shows that three out of five road accidents are caused by driver-related behavioral factors [2]. Self-driving technology has the potential to save lives lost to these preventable road accidents. Africa accounts for the majority of road fatalities and as such would benefit immensely from this technology. However, financial constraints prevent viable experimentation and research into self-driving technology in Africa. This paper describes the design of RollE, an affordable modular autonomous vehicle development platform. It can be driven by remote control for data collection and can also drive autonomously using a convolutional neural network. This system is aimed at providing students and researchers with an affordable autonomous vehicle to develop and test self-driving car technology.