RO · AI · Jan 25, 2021

droidlet: modular, heterogenous, multi-modal agents

arXiv:2101.10384v1 · 3 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of building multi-modal agents for robotics and AI. It appears incremental, since it integrates existing components rather than proposing a fundamentally new approach.

The paper tackles the problem of building integrated agents that combine perception, language, and action. It introduces droidlet, a modular and heterogeneous architecture that can exploit both large-scale static datasets and robotics heuristics, with the aim of supporting learning from real-world interactions.

In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale. But most of these systems are: (a) isolated (perception, speech, or language only); (b) trained on static datasets. On the other hand, in the field of robotics, large-scale learning has always been difficult. Supervision is hard to gather and real world physical interactions are expensive. In this work we introduce and open-source droidlet, a modular, heterogeneous agent architecture and platform. It allows us to exploit both large-scale static datasets in perception and language and sophisticated heuristics often used in robotics; and provides tools for interactive annotation. Furthermore, it brings together perception, language and action onto one platform, providing a path towards agents that learn from the richness of real world interactions.

Code Implementations: 1 repo
