SE · AI · CL · LG · Mar 31, 2024

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi

arXiv:2404.00725v2 · 22.751 citations · h-index: 33 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the efficiency and cost challenges of deploying large language models for code generation and offers a practical approach for resource-constrained scenarios, though the contribution is incremental, since it builds on existing model-comparison and output-selection methods.

The study asks whether larger language models are always better under a fixed compute budget. Comparing code generation across model sizes, it finds that sampling a smaller model repeatedly and selecting among the outputs with unit tests can improve performance by up to 15% across five tasks, whereas ranking-based selection without unit tests underperforms a single output from a larger model.
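The two selection regimes contrasted above can be sketched in Python. This is a minimal illustration, not the paper's code: it assumes each candidate is a standalone Python source string and each unit test is a hypothetical callable that inspects the namespace the candidate defines. `select_with_unit_tests` keeps the first candidate that passes every test, while `select_by_ranking` returns the top candidate under a placeholder `score` function; the paper's actual ranking methods are not reproduced here.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: a unit test inspects the namespace produced by
# executing a candidate program and returns True if the behavior is correct.
UnitTest = Callable[[dict], bool]


def passes_all(code: str, tests: Sequence[UnitTest]) -> bool:
    """Run one candidate program and check it against every unit test."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # NOT sandboxed: acceptable for a toy example only
        return all(test(namespace) for test in tests)
    except Exception:
        # A crash while defining the program or inside a test counts as a failure.
        return False


def select_with_unit_tests(
    candidates: Sequence[str], tests: Sequence[UnitTest]
) -> Optional[str]:
    """Return the first candidate that passes all tests, or None if none do."""
    for code in candidates:
        if passes_all(code, tests):
            return code
    return None


def select_by_ranking(
    candidates: Sequence[str], score: Callable[[str], float]
) -> str:
    """Without tests, return the candidate with the highest score."""
    return max(candidates, key=score)


# Toy usage: two sampled candidates, one of them buggy.
candidates = [
    "def add(a, b):\n    return a - b\n",  # buggy
    "def add(a, b):\n    return a + b\n",  # correct
]
tests = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](-1, 1) == 0,
]
# Made-up scores standing in for a ranker's confidence (hypothetical values).
toy_scores = {candidates[0]: 0.9, candidates[1]: 0.4}

print(select_with_unit_tests(candidates, tests) == candidates[1])  # True
print(select_by_ranking(candidates, toy_scores.__getitem__) == candidates[0])  # True
```

The toy scores are chosen so that the ranker prefers the buggy candidate. This illustrates, rather than measures, why ranking without tests can fall short: with unit tests available, a wrong sample is simply filtered out.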

It is a common belief that large language models (LLMs) are better than smaller ones. However, larger models also require significantly more time and compute during inference. This raises the question: what happens when both models operate under the same budget (e.g., compute or run-time)? To address this question, we analyze code-generation LLMs of various sizes and make comparisons such as running a 70B model once vs. generating five outputs from a 13B model. We consider a standard unit-test setup, which can be used to select the correct output from the smaller model. Our findings reveal that repeated use of smaller models can yield consistent improvements, with gains of up to 15% across five tasks. On the other hand, in scenarios where unit tests are unavailable, ranking-based selection of candidates from the smaller model falls short of the performance of a single output from the larger model. Our results highlight the potential of using smaller models instead of larger ones, and the importance of studying approaches for ranking LLM outputs.
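The 70B-vs-13B comparison rests on simple budget arithmetic. Below is a minimal sketch, assuming that the cost of one generation scales linearly with parameter count (roughly the per-token compute of a dense decoder) and that prompt and output lengths are the same for both models; a run-time budget would instead need measured latencies. `budget_matched_samples` is a hypothetical helper name, not something from the paper.

```python
import math


def budget_matched_samples(
    large_params_b: float, small_params_b: float, large_calls: int = 1
) -> int:
    """How many small-model generations fit in the budget of `large_calls`
    large-model generations.

    Assumes per-generation cost is proportional to parameter count (in
    billions), with identical prompt and output lengths for both models.
    """
    if small_params_b <= 0 or large_params_b <= 0:
        raise ValueError("parameter counts must be positive")
    # Cost of the large-model calls, measured in parameter-count units.
    budget = large_calls * large_params_b
    # Round down so the small model never exceeds the budget.
    return math.floor(budget / small_params_b)


print(budget_matched_samples(70, 13))  # 5  (70 / 13 is about 5.4)
print(budget_matched_samples(70, 7))   # 10
```

Rounding down keeps the smaller model within budget: under this assumption, five 13B generations cost about 65 units against 70 for a single 70B generation.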
