LG | AI | PL | Feb 2

COLT: Lightweight Multi-LLM Collaboration through Shared MCTS Reasoning for Model Compilation

arXiv:2602.01935v1 | h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses scalable, cost-effective AI deployment for developers and organizations. The advance is incremental because it builds on existing LLM-based compiler optimization methods.

The paper tackles the high cost of model serving in AI systems by proposing COLT, a lightweight multi-LLM collaborative framework for compiler optimization that uses a shared MCTS tree to coordinate reasoning across models, achieving performance comparable to a single large model while reducing costs.

Model serving costs dominate AI systems, making compiler optimization essential for scalable deployment. Recent works show that a large language model (LLM) can guide compiler search by reasoning over program structure and optimization history. However, using a single large model throughout the search is expensive, while smaller models are less reliable when used alone. This paper therefore asks whether multi-LLM collaborative reasoning that relies primarily on small LLMs can match or exceed the performance of a single large model. We propose a lightweight collaborative multi-LLM framework, dubbed COLT, for compiler optimization that enables coordinated reasoning across multiple models within a single Monte Carlo tree search (MCTS) process. A key contribution is the use of a single shared MCTS tree as the collaboration substrate across LLMs, enabling the reuse of transformation prefixes and cross-model value propagation. We thereby circumvent both heavy internal reasoning mechanisms and conventional agentic machinery that relies on external planners, multiple concurrent LLMs, databases, external memory/versioning of intermediate results, and controllers, by endogenizing model selection within the lightweight MCTS optimization loop. At every iteration, the acting LLM proposes a joint action: (compiler transformation, model to be queried next). We also introduce a model-aware tree policy that biases search toward smaller models while preserving exploration, and a course-alteration mechanism that escalates to the largest model when the search exhibits persistent regressions attributable to smaller models.
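
The loop described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the model names and costs, the transformation names, the linear cost penalty in `score`, the definition of a "regression" (reward below the best seen so far), and the `propose` and `evaluate` stand-ins are all assumptions made for illustration. It shows only the shape of the idea: one shared tree, joint actions of the form (transformation, next model), a tree policy that penalizes expensive models, and escalation to the largest model after repeated regressions.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model pool: name -> relative query cost (assumed values).
MODEL_COST = {"small-a": 1.0, "small-b": 1.5, "large": 10.0}
LARGEST = "large"
TRANSFORMS = ["tile", "unroll", "vectorize", "fuse"]  # placeholder names


@dataclass(eq=False)
class Node:
    transform: Optional[str]   # transformation on the edge into this node
    model: str                 # model to query when this node is expanded
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def prefix(self):
        """Transformation sequence from the root to this node."""
        node, seq = self, []
        while node.parent is not None:
            seq.append(node.transform)
            node = node.parent
        return seq[::-1]


def score(child, parent_visits, c=1.4, lam=0.05):
    """UCT plus an assumed linear penalty on the child's model cost."""
    if child.visits == 0:
        return math.inf
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore - lam * MODEL_COST[child.model]


def propose(model, prefix, rng):
    """Stand-in for an LLM call; returns (transformation, next model)."""
    return rng.choice(TRANSFORMS), rng.choice(list(MODEL_COST))


def evaluate(prefix):
    """Toy reward in [0, 1]; a real system would measure the compiled program."""
    return len(set(prefix)) / len(TRANSFORMS)


def search(iterations=50, width=3, patience=3, seed=0):
    rng = random.Random(seed)
    root = Node(transform=None, model="small-a")
    best, regressions, escalate = 0.0, 0, False
    queries = {m: 0 for m in MODEL_COST}
    for _ in range(iterations):
        # Selection: descend through fully expanded nodes of the shared tree.
        node = root
        while len(node.children) >= width:
            parent = node
            node = max(parent.children, key=lambda ch: score(ch, parent.visits))
        # Expansion: the acting model proposes a joint action.
        acting = LARGEST if escalate else node.model
        escalate = False
        queries[acting] += 1
        transform, next_model = propose(acting, node.prefix(), rng)
        child = Node(transform, next_model, parent=node)
        node.children.append(child)
        # Evaluation and course alteration (assumed regression rule).
        reward = evaluate(child.prefix())
        regressions = regressions + 1 if acting != LARGEST and reward < best else 0
        best = max(best, reward)
        if regressions >= patience:
            escalate, regressions = True, 0
        # Backpropagation: statistics are shared whichever model acted.
        while child is not None:
            child.visits += 1
            child.value_sum += reward
            child = child.parent
    return root, best, queries


if __name__ == "__main__":
    root, best, queries = search()
    print("best reward:", best)
    print("queries per model:", queries)
```

Because every model reads and writes the same `visits` and `value_sum` statistics, a transformation prefix discovered by one model can be extended by another, which is the sense in which the tree acts as the collaboration substrate. The penalty weight `lam` and the `patience` threshold are illustrative knobs, not values from the paper.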

