CLROMar 9, 2023

Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification

Berkeley
arXiv:2303.08006v20.3957 citationsh-index: 40
AI Analysis55

This enables more accessible robot programming by reducing data requirements, though it is incremental in leveraging existing LLMs and synthetic data generation.

The paper tackles the problem of translating natural language commands to linear temporal logic (LTL) specifications for robot task specification with limited human-labeled data, achieving 75% accuracy using only ≤12 annotations and up to 95% accuracy with full datasets.

To make robots accessible to a broad audience, it is critical to endow them with the ability to take universal modes of communication, like commands given in natural language, and extract a concrete desired task specification, defined using a formal language like linear temporal logic (LTL). In this paper, we present a learning-based approach for translating from natural language commands to LTL specifications with very limited human-labeled training data. This is in stark contrast to existing natural-language to LTL translators, which require large human-labeled datasets, often in the form of labeled pairs of LTL formulas and natural language commands, to train the translator. To reduce reliance on human data, our approach generates a large synthetic training dataset through algorithmic generation of LTL formulas, conversion to structured English, and then exploiting the paraphrasing capabilities of modern large language models (LLMs) to synthesize a diverse corpus of natural language commands corresponding to the LTL formulas. We use this generated data to finetune an LLM and apply a constrained decoding procedure at inference time to ensure the returned LTL formula is syntactically correct. We evaluate our approach on three existing LTL/natural language datasets and show that we can translate natural language commands at 75\% accuracy with far less human data ($\le$12 annotations). Moreover, when training on large human-annotated datasets, our method achieves higher test accuracy (95\% on average) than prior work. Finally, we show the translated formulas can be used to plan long-horizon, multi-stage tasks on a 12D quadrotor.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes