AIJun 13, 2025

Efficient LLM Collaboration via Planning

Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Kyungjoon Park, Dongjun Lee, Jinwoo Shin

arXiv:2506.11578v214.77 citationsh-index: 13Has Code

Originality Incremental advance

AI Analysis

This addresses the problem of high API costs for large LLMs, enabling more accessible and efficient AI applications, though it is incremental as it builds on existing collaboration ideas.

The paper tackles the cost-performance trade-off between large proprietary and small open-source LLMs by proposing COPE, a test-time collaboration framework where models alternate as planner and executor using generated plans, achieving performance comparable to large models while drastically reducing inference API costs.

Recently, large language models (LLMs) have demonstrated strong performance, ranging from simple to complex tasks. However, while large proprietary models (e.g., models with over 100B parameters) achieve remarkable results across diverse tasks, they are often accessible through costly APIs, making frequent use too costly for many applications. In contrast, small open-source models (e.g., models with fewer than 3B parameters) are freely available and easy to deploy locally, but their performance on complex tasks remains limited. This trade-off raises a natural question: how can small and large models efficiently collaborate to combine their complementary strengths? To bridge this trade-off, we propose COPE, a test-time collaboration framework. A planner model first generates a plan, a high-level abstraction of the task, and this plan serves as a lightweight intermediate that guides a downstream executor model. Small and large models take turns acting as planner and executor, exchanging plans in a multi-stage cascade to collaboratively solve tasks. Through comprehensive experiments on benchmarks spanning mathematical reasoning, code generation, open-ended tasks, and agent tasks, we demonstrate that COPE achieves performance comparable to large proprietary models, while drastically reducing the inference API cost. These results highlight planning as an effective prior for cost-efficient inference.

View on arXiv PDF

Similar