Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration
This work addresses the problem of navigating complex mobile applications on behalf of users or developers. It represents an incremental improvement: a novel method applied to a known bottleneck.
The paper tackles the challenge of developing intelligent agents for autonomous mobile app interaction by proposing EBC-LLMAgent, which combines large language models with behavior cloning to achieve high success rates in task completion and efficient generalization to unseen scenarios.
Autonomous mobile app interaction has become increasingly important with the growing complexity of mobile applications. Developing intelligent agents that can effectively navigate and interact with mobile apps remains a significant challenge. In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning from demonstrations to create intelligent and explainable agents for autonomous mobile app interaction. EBC-LLMAgent consists of three core modules: Demonstration Encoding, Code Generation, and UI Mapping, which work synergistically to capture user demonstrations, generate executable code, and establish accurate correspondence between code and UI elements. We introduce the Behavior Cloning Chain Fusion technique to enhance the generalization capabilities of the agent. Extensive experiments on five popular mobile applications from diverse domains demonstrate the superior performance of EBC-LLMAgent, achieving high success rates in task completion, efficient generalization to unseen scenarios, and the generation of meaningful explanations.
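To make the three-module structure concrete, the following is a minimal Python sketch of how Demonstration Encoding, Code Generation, and UI Mapping might be chained. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify data formats or interfaces, so the `DemoStep` schema, the function names, the prompt wording, the locator strings, and the `stub_llm` stand-in are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DemoStep:
    """One recorded user action (hypothetical schema, not from the paper)."""
    action: str        # e.g. "tap" or "type"
    element_id: str    # identifier of the UI element in the recording
    text: str = ""     # typed text, if any


def encode_demonstration(steps: List[DemoStep]) -> str:
    """Demonstration Encoding: serialize recorded steps into numbered text lines."""
    lines = [f"{i + 1}. {s.action} {s.element_id} {s.text}".rstrip()
             for i, s in enumerate(steps)]
    return "\n".join(lines)


def generate_code(encoded: str, llm: Callable[[str], str]) -> str:
    """Code Generation: pass the encoded demonstration to any str -> str model."""
    prompt = "Write code that reproduces these UI steps:\n" + encoded
    return llm(prompt)


def map_ui(code: str, ui_index: Dict[str, str]) -> Dict[str, str]:
    """UI Mapping: keep locators only for element ids that appear in the code."""
    return {eid: loc for eid, loc in ui_index.items() if eid in code}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a fixed snippet (assumption)."""
    return "tap('search_box')\ntype_text('search_box', 'coffee')"


demo = [DemoStep("tap", "search_box"), DemoStep("type", "search_box", "coffee")]
encoded = encode_demonstration(demo)
code = generate_code(encoded, stub_llm)
mapping = map_ui(code, {"search_box": "//input[@id='q']",
                        "cart_btn": "//button[@id='cart']"})
```

In this sketch the model is passed in as a plain callable, so a real LLM client could replace `stub_llm` without changing the other two stages. The Behavior Cloning Chain Fusion technique is not sketched here because the abstract does not describe how it works.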