HC, AI, MA | Oct 11, 2025

ALLOY: Generating Reusable Agent Workflows from User Demonstration

arXiv:2510.10049v1 | 2 citations | h-index: 7
Originality: Incremental advance
AI Analysis

ALLOY addresses the difficulty end-users face in delegating complex, preference-based tasks to AI agents, offering a more intuitive and reusable interaction method.

The paper tackles the problem that users struggle to specify procedural requirements for LLM-based agents through prompts, especially for preference-driven tasks such as social media posting or trip planning. It introduces ALLOY, a system that uses user demonstrations to generate reusable workflows. In a study with 12 participants, ALLOY outperformed prompt-based agents in capturing user intent.

Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a "successful" prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration-based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration-based interaction complements the traditional prompt-based approach.

