LG | AR | Jun 2

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

Saptarshi Mitra, Yifan Zhang, Rachid Karami, Phyo Pyae Moe Aung, Nazmul Takbir, Sreetama Sarkar, Souvik Kundu, Sitao Huang

arXiv:2606.03014

AI Analysis

For practitioners deploying MoA systems with limited GPU resources, MOSAIC addresses load imbalances caused by skewed expert demand and variable generation lengths, significantly improving throughput.

MOSAIC accelerates Mixture-of-Agents (MoA) workloads on limited GPUs by jointly optimizing expert placement and prompt assignment via an ILP scheduler, and using confidence-aware adaptive aggregation to skip the aggregator LLM for consensus queries. On a 4-GPU system, it achieves 1.7–2.3x end-to-end speedups over the baseline while matching accuracy within 0.1 percentage points.
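To make the joint placement and assignment idea concrete, here is a minimal sketch. It is not the paper's ILP: it swaps in an exhaustive search over a toy instance, and the expert names, per-prompt costs, prompt counts and the `schedule` and `splits` helpers are all hypothetical. It assumes one profiled cost per prompt for each expert, lets "replicated" experts split their prompts across workers, pins the others to a single worker, and minimizes the load of the busiest worker.

```python
from itertools import product


def splits(n_prompts, n_workers, replicate):
    """Yield every way to distribute one expert's prompts over the workers."""
    if not replicate:
        # Pinned expert: all of its prompts run on exactly one worker.
        for w in range(n_workers):
            yield tuple(n_prompts if i == w else 0 for i in range(n_workers))
        return

    # Replicated expert: any non-negative integer split across workers.
    def rec(remaining, slots):
        if slots == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for rest in rec(remaining - k, slots - 1):
                yield (k,) + rest

    yield from rec(n_prompts, n_workers)


def schedule(experts, n_workers):
    """Brute-force stand-in for an ILP solver (toy sizes only).

    experts maps name -> (cost_per_prompt, n_prompts, replicate).
    Returns (makespan, plan), where plan[name][w] is the number of that
    expert's prompts assigned to worker w.
    """
    names = list(experts)
    options = [list(splits(experts[n][1], n_workers, experts[n][2])) for n in names]
    best_makespan, best_plan = float("inf"), None
    for combo in product(*options):
        # Load of each worker = sum over experts of cost * prompts placed there.
        loads = [
            sum(experts[n][0] * combo[i][w] for i, n in enumerate(names))
            for w in range(n_workers)
        ]
        makespan = max(loads)
        if makespan < best_makespan:
            best_makespan, best_plan = makespan, dict(zip(names, combo))
    return best_makespan, best_plan


# Hypothetical profile: one slow reasoning expert, two lightweight ones.
toy_experts = {
    "reasoner": (10, 4, True),   # replicated across workers
    "coder": (2, 5, False),      # pinned to one worker
    "chat": (1, 6, False),       # pinned to one worker
}
makespan, plan = schedule(toy_experts, n_workers=2)
print(makespan, plan)
```

On this toy instance the search reaches a makespan of 30 cost units, versus 40 when the reasoning expert is also pinned, which illustrates why replicating only the heavy expert can help. A real scheduler would encode the same variables and the makespan objective as an ILP and hand them to a solver instead of enumerating.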

Mixture-of-Agents (MoA) systems improve reasoning accuracy by routing each query to multiple expert LLMs and aggregating their outputs. Executing this workload efficiently on limited GPU resources faces two bottlenecks: skill-based routing creates skewed expert demand, and combining instruction-tuned LLMs with long-reasoning models produces extreme variability in generation lengths. Consequently, traditional scheduling strategies suffer from significant GPU idling and throughput collapse due to load imbalance. We present MOSAIC, a scheduling framework to accelerate MoA workloads. First, we formulate an Integer Linear Program (ILP) based scheduler that jointly optimizes expert placement and per-worker prompt assignment from offline-profiled costs, replicating reasoning experts across workers while pinning lightweight ones. Second, MOSAIC uses confidence-aware adaptive aggregation, leveraging inter-expert agreement to bypass the heavy final aggregator LLM for consensus queries. In our 4-GPU system, MOSAIC achieves up to 2.5x expert-stage, up to 4.23x aggregator-stage, and 1.7–2.3x end-to-end speedups over the baseline scheduler, while matching accuracy within 0.1 percentage points.
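The aggregator-bypass idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: it measures inter-expert agreement as the share of experts that returned the most common answer, and the `adaptive_aggregate` function, the exact-match comparison and the 0.75 threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def adaptive_aggregate(
    expert_answers: Sequence[str],
    aggregator: Callable[[Sequence[str]], str],
    threshold: float = 0.75,  # hypothetical agreement cutoff, not from the paper
) -> Tuple[str, bool]:
    """Return (final_answer, aggregator_was_called)."""
    if not expert_answers:
        raise ValueError("need at least one expert answer")

    # Agreement = fraction of experts that gave the most common answer.
    top_answer, votes = Counter(expert_answers).most_common(1)[0]
    agreement = votes / len(expert_answers)

    if agreement >= threshold:
        # Consensus query: skip the expensive aggregator LLM.
        return top_answer, False

    # Experts disagree: fall back to the aggregator.
    return aggregator(expert_answers), True


def toy_aggregator(answers: Sequence[str]) -> str:
    """Stand-in for an aggregator LLM call (assumption for this example)."""
    return "aggregated:" + "|".join(answers)


print(adaptive_aggregate(["42", "42", "42", "41"], toy_aggregator))
print(adaptive_aggregate(["1", "2", "3", "1"], toy_aggregator))
```

In the first call three of four experts agree, so the majority answer is returned without touching the aggregator; in the second, agreement is only one half and the aggregator runs. In practice answers would need normalization before comparison, and the threshold would be tuned against the accuracy target.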
