SELG | Feb 5, 2025

COSMosFL: Ensemble of Small Language Models for Fault Localisation

arXiv:2502.02908v1 | 3 citations | h-index: 11 | Has Code | 2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)
Originality: Incremental advance
AI Analysis

This work addresses the cost and deployment barriers of using large models for software engineering tasks and offers a more accessible alternative. It is an incremental advance, however, because it builds on existing ensemble and fault localization methods.

The paper tackles the performance gap between small and large language models for fault localization by introducing COSMos, an ensemble technique that uses voting to combine small models, achieving Pareto-optimal trade-offs between accuracy and inference costs on the Defects4J benchmark.
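The voting idea can be illustrated with a minimal sketch. The summary above does not specify the exact voting scheme, so this assumes a plain weighted vote: each model assigns scores to suspect methods, and the ensemble averages those scores using per-model weights. The function name `ensemble_vote`, the model names, and the scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def ensemble_vote(model_votes, weights=None):
    """Combine per-model suspiciousness scores by weighted voting.

    model_votes: dict mapping a model name to a dict of {method: score}.
    weights: optional dict mapping a model name to a vote weight
             (assumption: equal weights when omitted).
    Returns a list of (method, combined_score), most suspicious first.
    """
    if weights is None:
        weights = {name: 1.0 for name in model_votes}
    total = sum(weights[name] for name in model_votes)

    combined = defaultdict(float)
    for name, scores in model_votes.items():
        for method, score in scores.items():
            # Each model contributes its score scaled by its normalised weight.
            combined[method] += weights[name] * score / total

    # Sort by descending score; break ties by method name for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical votes from three small models for one failing test.
votes = {
    "slm_a": {"Foo.bar": 0.6, "Foo.baz": 0.4},
    "slm_b": {"Foo.bar": 1.0},
    "slm_c": {"Foo.baz": 0.5, "Qux.run": 0.5},
}
ranking = ensemble_vote(votes)
print([method for method, _ in ranking])
```

In this sketch a method that several models agree on rises to the top even when no single model is fully confident, which is the intuition behind combining weaker models.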

LLMs are rapidly being adopted to build powerful tools and agents for software engineering, but most of these rely heavily on extremely large closed-source models. This, in turn, can hinder wider adoption due to security issues as well as financial cost and environmental impact. Recently, a number of open-source Small Language Models (SLMs) have been released and are gaining traction. While SLMs are smaller, more energy-efficient, and therefore easier to deploy locally, they tend to perform worse than larger closed LLMs. We present COSMos, a task-level LLM ensemble technique that uses a voting mechanism to provide a broader range of choices between SLMs and LLMs. We instantiate COSMos with an LLM-based Fault Localisation technique, AutoFL, and report the cost-benefit trade-off between LLM accuracy and various costs such as energy consumption, inference time, and the number of tokens used. An empirical evaluation using Defects4J shows that COSMos can build effective ensembles that achieve Pareto-optimality in terms of FL accuracy and inference cost when compared to individual models.
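To make the Pareto-optimality claim concrete, here is a minimal sketch of how a Pareto front over accuracy and cost can be computed. A configuration is Pareto-optimal if no other configuration is at least as cheap and at least as accurate while being strictly better on one of the two. The function name `pareto_front` and every number below are made up for illustration; they are not results from the paper.

```python
def pareto_front(configs):
    """Return the names of configurations that are not dominated.

    configs: list of (name, cost, accuracy) tuples, where lower cost
             and higher accuracy are better.
    """
    front = []
    for name, cost, acc in configs:
        dominated = any(
            # The other config is no worse on both axes...
            (c2 <= cost and a2 >= acc)
            # ...and strictly better on at least one.
            and (c2 < cost or a2 > acc)
            for _, c2, a2 in configs
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative (invented) cost/accuracy pairs for single models and an ensemble.
candidates = [
    ("slm_a", 1.0, 0.40),
    ("slm_b", 2.0, 0.35),        # dominated by slm_a: costlier and less accurate
    ("ensemble_ab", 3.0, 0.55),
    ("large_llm", 10.0, 0.60),
]
print(pareto_front(candidates))
```

Under these invented numbers the ensemble sits on the front between the cheapest small model and the large model, which is the kind of intermediate choice the abstract describes.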

