CL · Aug 28, 2018

Mapping Natural Language Commands to Web Elements

arXiv:1808.09132v21105 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of enabling more intuitive human-computer interaction on the web. It is incremental, however: it targets a specific domain and has no broad state-of-the-art (SOTA) impact.

The paper tackles the problem of grounding natural language commands to web elements, that is, identifying which element (such as a specific link or text box) a command refers to. It introduces a new task and a dataset of over 50,000 commands that capture phenomena such as functional references, relational reasoning, and visual reasoning.

The web provides a rich, open-domain environment with textual, structural, and spatial properties. We propose a new task for grounding language in this environment: given a natural language command (e.g., "click on the second article"), choose the correct element on the web page (e.g., a hyperlink or text box). We collected a dataset of over 50,000 commands that capture various phenomena such as functional references (e.g. "find who made this site"), relational reasoning (e.g. "article by john"), and visual reasoning (e.g. "top-most article"). We also implemented and analyzed three baseline models that capture different phenomena present in the dataset.
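The task framing above (a command in, one element out) can be sketched as a score-then-argmax loop over the page's elements. The snippet below is a minimal, hypothetical lexical-overlap baseline, not one of the paper's three models. The `WebElement` fields (`tag`, `text`, `x`, `y`), the Jaccard scorer, and the "prefer the top-most element on ties" rule are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class WebElement:
    """Hypothetical element record: tag, visible text, and top-left position in pixels."""
    tag: str
    text: str
    x: float
    y: float


def tokenize(s: str) -> set[str]:
    """Lowercase and split on non-alphanumerics ("top-most" -> {"top", "most"})."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def score(command: str, element: WebElement) -> float:
    """Jaccard overlap between command tokens and element-text tokens (assumed scorer)."""
    c, e = tokenize(command), tokenize(element.text)
    if not c or not e:
        return 0.0
    return len(c & e) / len(c | e)


def ground(command: str, elements: list[WebElement]) -> int:
    """Return the index of the best-scoring element.

    Ties are broken by preferring the element closest to the top of the
    page (smallest y), a crude stand-in for visual reasoning.
    """
    if not elements:
        raise ValueError("page has no candidate elements")
    return max(
        range(len(elements)),
        key=lambda i: (score(command, elements[i]), -elements[i].y),
    )


# Toy page (invented for illustration only).
page = [
    WebElement("a", "Home", 10, 5),
    WebElement("a", "Article by John", 10, 120),
    WebElement("a", "Article by Mary", 10, 80),
    WebElement("input", "Search", 300, 5),
]

print(ground("click the article by john", page))  # lexical match on "john"
print(ground("top-most article", page))           # tie on "article", broken by position
```

A pure lexical scorer like this can handle only simple relational cues ("article by john"). Functional references such as "find who made this site" share no words with the target element, which is why the dataset's phenomena call for richer models than string overlap.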

Code Implementations: 2 repos
