AI · LG · Mar 7, 2025

FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data

Wenhao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Siheng Chen, Yanfeng Wang

arXiv:2503.05143v1 · 3 citations · h-index: 18 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a foundational benchmark for researchers in federated learning and mobile agents and addresses a key bottleneck in the field, the lack of standardized benchmarks. It is, however, incremental, since it builds on existing federated learning methods.

The paper tackles the lack of standardized benchmarks for mobile agents trained with federated learning on decentralized, heterogeneous user data. It introduces FedMABench, a benchmark with 6 datasets, 8 federated algorithms, and over 800 apps. Its experiments show that federated algorithms outperform local training, and they highlight the roles of app distribution and of correlations between apps from different categories.
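To make "app distribution as a source of heterogeneity" concrete, here is a minimal sketch of one way to split agent episodes across simulated clients so that each client sees only a subset of apps. The record format `(episode_id, app_name)`, the function `partition_by_app`, and the round-robin assignment rule are all hypothetical illustrations chosen for this example. They are not FedMABench's actual partitioning scheme or API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format (not FedMABench's): (episode_id, app_name).
Episode = Tuple[str, str]


def partition_by_app(episodes: List[Episode], num_clients: int) -> Dict[int, List[Episode]]:
    """Assign every episode of a given app to a single client.

    Apps are sorted by name and dealt out round-robin, so each client holds a
    disjoint set of apps. This produces app-level heterogeneity across clients.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    # Group episodes by app, preserving their original order within each app.
    by_app: Dict[str, List[Episode]] = defaultdict(list)
    for episode in episodes:
        by_app[episode[1]].append(episode)
    clients: Dict[int, List[Episode]] = {c: [] for c in range(num_clients)}
    # Deterministic round-robin over sorted app names.
    for index, app in enumerate(sorted(by_app)):
        clients[index % num_clients].extend(by_app[app])
    return clients


if __name__ == "__main__":
    data = [("e1", "Gmail"), ("e2", "Maps"), ("e3", "Gmail"), ("e4", "Clock")]
    for client, eps in partition_by_app(data, 2).items():
        print(client, eps)
```

With the toy data above, client 0 receives the Clock and Maps episodes and client 1 receives both Gmail episodes. A real benchmark may instead skew clients toward apps or categories with overlap; this disjoint split is only the simplest extreme.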

Mobile agents have attracted tremendous research attention recently. Traditional approaches to mobile agent training rely on centralized data collection, leading to high cost and limited scalability. Distributed training utilizing federated learning offers an alternative by harnessing real-world user data, providing scalability and reducing costs. However, pivotal challenges, including the absence of standardized benchmarks, hinder progress in this field. To tackle these challenges, we introduce FedMABench, the first benchmark for federated training and evaluation of mobile agents, specifically designed for heterogeneous scenarios. FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments. Through extensive experiments, we uncover several key insights: federated algorithms consistently outperform local training; the distribution of specific apps plays a crucial role in heterogeneity; and even apps from distinct categories can exhibit correlations during training. FedMABench is publicly available at https://github.com/wwh0411/FedMABench, with the datasets at https://huggingface.co/datasets/wwh0411/FedMABench.
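The abstract contrasts federated training with purely local training. As a hedged illustration of the aggregation step that distinguishes the two, here is a minimal sketch of FedAvg-style weighted averaging (McMahan et al.), a standard federated algorithm. Nothing here assumes which of FedMABench's 8 algorithms it corresponds to. The representation of a model as a dict of flat float lists and the name `fedavg` are illustrative assumptions, not the benchmark's code.

```python
from typing import Dict, List

# Illustrative model representation (an assumption): parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """FedAvg-style aggregation: average client parameters, weighting each
    client by its number of local training samples."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    global_params: Params = {}
    for name in client_params[0]:
        averaged = [0.0] * len(client_params[0][name])
        for params, n in zip(client_params, client_sizes):
            weight = n / total  # share of all training samples held by this client
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params


if __name__ == "__main__":
    # Two hypothetical clients after local training; client B holds 3x more data.
    client_a = {"w": [0.0, 4.0]}
    client_b = {"w": [4.0, 0.0]}
    print(fedavg([client_a, client_b], [1, 3]))
```

Running the example prints `{'w': [3.0, 1.0]}`: the server model leans toward the data-rich client, but it still incorporates what the smaller client learned. Local training, by contrast, would leave each client with only its own parameters.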
