CL · AI · May 1

A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction

arXiv:2605.00551 · 76.1 · h-index: 4
Predicted impact: top 80% in CL · last 90 days
Originality: Incremental advance
AI Analysis

For developers of GUI-based AI agents, this framework addresses the inefficiency and lack of structure in accessibility tree observations, offering a practical improvement in both token efficiency and task performance.

A11y-Compressor transforms linearized accessibility trees into compact, structured representations for GUI agents, reducing input tokens to 22% of the original while improving task success rates by 5.1 percentage points on average on the OSWorld benchmark.
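The two headline figures are easy to misread: "to 22% of the original" means roughly a 78% reduction in input tokens, and "5.1 percentage points" is an absolute gain in success rate, not a relative one. A small sketch of the arithmetic follows. The function names, the 10,000-token observation, and the 30.0% baseline are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def tokens_after_compression(original_tokens: int, remaining_fraction: float = 0.22) -> int:
    """Tokens left when the observation shrinks TO a fraction of its original size."""
    return round(original_tokens * remaining_fraction)


def success_after_gain(baseline_pct: float, gain_pp: float = 5.1) -> float:
    """Success rate after an absolute gain expressed in percentage points."""
    return baseline_pct + gain_pp


# Hypothetical inputs, for illustration only.
remaining = tokens_after_compression(10_000)  # tokens kept out of 10,000
saved = 10_000 - remaining                    # tokens removed
new_rate = success_after_gain(30.0)           # assumed 30.0% baseline
```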

AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but it suffers from redundancy and lacks structural information such as spatial relationships among elements. We propose A11y-Compressor, a framework that transforms linearized accessibility trees into compact and structured representations. Our implementation, Compressed-a11y, applies a lightweight and structured transformation pipeline with modal detection, redundancy reduction, and semantic structuring. Experiments on the OSWorld benchmark show that Compressed-a11y reduces input tokens to 22% of the original while improving task success rates by 5.1 percentage points on average.
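To make the three named stages concrete, here is a minimal toy sketch of a pipeline with modal detection, redundancy reduction, and semantic structuring. It is not the authors' Compressed-a11y implementation. The `Node` fields, the `modal` flag, the filtering rules, and the row-grouping heuristic are all assumptions made for illustration, since the abstract does not specify how each stage works.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Node:
    """One row of a linearized accessibility tree (hypothetical, simplified schema)."""
    tag: str
    name: str
    x: int
    y: int
    w: int
    h: int
    modal: bool = False  # assumed flag; real trees may signal modality differently


def detect_modal(nodes):
    """Toy modal detection: if a modal dialog exists, keep only elements inside it."""
    dialogs = [n for n in nodes if n.modal]
    if not dialogs:
        return list(nodes)
    d = dialogs[-1]  # assumption: the last modal listed is the top-most one

    def inside(n):
        return (d.x <= n.x and d.y <= n.y
                and n.x + n.w <= d.x + d.w and n.y + n.h <= d.y + d.h)

    return [n for n in nodes if inside(n)]


def reduce_redundancy(nodes):
    """Toy redundancy reduction: drop zero-size, unnamed, and duplicate elements."""
    seen, kept = set(), []
    for n in nodes:
        if n.w <= 0 or n.h <= 0 or not n.name.strip():
            continue
        key = (n.tag, n.name, n.x, n.y)
        if key not in seen:
            seen.add(key)
            kept.append(n)
    return kept


def structure(nodes, row_height=40):
    """Toy structuring: group elements into coarse rows, ordered left to right."""
    ordered = sorted(nodes, key=lambda n: (n.y // row_height, n.x))
    lines = []
    for row, items in groupby(ordered, key=lambda n: n.y // row_height):
        cells = ", ".join(
            f"{n.tag} '{n.name}' @({n.x + n.w // 2},{n.y + n.h // 2})" for n in items
        )
        lines.append(f"row {row}: {cells}")
    return "\n".join(lines)


def compress(nodes):
    """Run the three stages in sequence."""
    return structure(reduce_redundancy(detect_modal(nodes)))


# Hypothetical screen: a menu, a duplicate, an unnamed panel, a zero-size button.
SAMPLE = [
    Node("menu", "File", 0, 0, 40, 20),
    Node("menu", "File", 0, 0, 40, 20),
    Node("panel", "", 0, 20, 800, 600),
    Node("button", "Hidden", 10, 30, 0, 0),
    Node("button", "Save", 100, 50, 60, 24),
    Node("button", "Cancel", 200, 50, 60, 24),
]
```

Calling `compress(SAMPLE)` collapses six input rows into two output lines, each carrying a click target (the element's center) alongside its row position. Adding a node with `modal=True` restricts the output to elements inside that dialog, which is one plausible reading of why modal detection helps grounding.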

