AI · Jun 4, 2025

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

arXiv:2506.04135v4 · 17 citations · h-index: 13 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for domain-specific adaptation and safety evaluation of GUI agents on macOS, with particular attention to multilingual accessibility. It is incremental, however, in that it extends existing benchmarking approaches to a new OS.

The authors address the lack of multilingual interactive benchmarks for GUI agents on macOS by creating macOSWorld, which features 202 tasks across 30 applications in 5 languages. They found that proprietary agents achieved success rates above 30% while open-source models lagged below 5%, and that Arabic tasks showed a 28.8% average degradation compared to English.

Graphical User Interface (GUI) agents show promising capabilities for automating computer-use tasks and facilitating accessibility, but existing interactive benchmarks are mostly English-only and cover web-use or Windows, Linux, and Android environments, not macOS. macOS is a major OS with distinctive GUI patterns and exclusive applications. To bridge the gaps, we present macOSWorld, the first comprehensive benchmark for evaluating GUI agents on macOS. macOSWorld features 202 multilingual interactive tasks across 30 applications (28 macOS-exclusive), with task instructions and OS interfaces offered in 5 languages (English, Chinese, Arabic, Japanese, and Russian). As GUI agents are shown to be vulnerable to deception attacks, macOSWorld also includes a dedicated safety benchmarking subset. Our evaluation on six GUI agents reveals a dramatic gap: proprietary computer-use agents lead at above 30% success rate, while open-source lightweight research models lag at below 5%, highlighting the need for macOS domain adaptation. Multilingual benchmarks also expose common weaknesses, especially in Arabic, with a 28.8% average degradation compared to English. Results from safety benchmarking also highlight that deception attacks are more general and demand immediate attention. Project page: https://macos-world.github.io.
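To make the reported metrics concrete, the following is a minimal sketch of how a per-language success rate and a degradation figure relative to English could be computed. It is not the authors' evaluation code. The function names, the language codes, and the sample results are all hypothetical, and the sketch assumes the degradation is measured relative to the English success rate; the abstract does not say whether the 28.8% figure is relative or in absolute percentage points.

```python
from collections import defaultdict


def success_rates(results):
    """Return {language: fraction of tasks that succeeded}.

    `results` is an iterable of (language, succeeded) pairs.
    This record format is a hypothetical one chosen for illustration.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for language, succeeded in results:
        totals[language] += 1
        passed[language] += int(succeeded)
    return {lang: passed[lang] / totals[lang] for lang in totals}


def relative_degradation(rates, baseline="en"):
    """Return each language's drop in success rate relative to the baseline.

    Assumption: degradation = (baseline - rate) / baseline.
    """
    base = rates[baseline]
    if base == 0:
        raise ValueError("baseline success rate is zero; ratio is undefined")
    return {
        lang: (base - rate) / base
        for lang, rate in rates.items()
        if lang != baseline
    }


# Made-up outcomes for illustration only; these are not results from the paper.
results = [
    ("en", True), ("en", True), ("en", False), ("en", True),
    ("ar", True), ("ar", False), ("ar", False), ("ar", True),
]

rates = success_rates(results)
degradation = relative_degradation(rates)
print(rates)
print(degradation)
```

Averaging such per-agent degradation values across all evaluated agents would give one plausible reading of an "average degradation" figure, though the paper's exact aggregation may differ.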

Code Implementations: 1 repo
