CL · AI · Feb 23, 2024

Executing Natural Language-Described Algorithms with Large Language Models: An Investigation

arXiv:2403.00795v2 · 82 citations · h-index: 29 · LREC
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of evaluating LLMs' code execution capabilities, which is relevant to researchers and practitioners, though its contribution is incremental: it assesses existing models on a new benchmark.

The paper investigates the ability of large language models (LLMs) to execute algorithms described in natural language. Testing 30 algorithms on 300 instances, it finds that models such as GPT-4 can do so effectively as long as heavy numeric computation is not involved.

Executing computer programs described in natural language has long been a pursuit of computer science. With the advent of enhanced natural language understanding capabilities exhibited by large language models (LLMs), the path toward this goal has been illuminated. In this paper, we examine the capacity of present-day LLMs to comprehend and execute algorithms outlined in natural language. We established an algorithm test set sourced from Introduction to Algorithms, a well-known textbook that contains many representative, widely used algorithms. To systematically assess LLMs' code execution abilities, we selected 30 algorithms, generated 300 randomly sampled instances in total, and evaluated whether popular LLMs can understand and execute these algorithms. Our findings reveal that LLMs, notably GPT-4, can effectively execute programs described in natural language, as long as no heavy numeric computation is involved. We believe our findings contribute to evaluating LLMs' code execution abilities and will encourage further investigation and application of the computational power of LLMs.
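The evaluation described in the abstract (30 algorithms, 300 random instances, so an average of 10 instances per algorithm) can be pictured as a small harness: a reference implementation produces the ground-truth result for each random instance, and the model's reply is graded against it. The sketch below is illustrative only. It assumes exact-match grading on the final output and comma-separated replies, and it uses insertion sort (an algorithm from Introduction to Algorithms) purely as an example; the helpers `make_instances`, `parse_answer`, and `accuracy` are hypothetical and do not describe the paper's actual prompts, instance sizes, or scoring.

```python
import random
from typing import Callable, List

# Hypothetical harness: the paper's real prompts, instance sizes and grading
# rules are not given on this page; exact-match grading is an assumption.


def insertion_sort(a: List[int]) -> List[int]:
    """Reference implementation of INSERTION-SORT, used as ground truth."""
    a = list(a)  # copy so the generated instance is not mutated
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift larger elements one slot right until key's position is found.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a


def make_instances(n_instances: int, length: int, seed: int) -> List[List[int]]:
    """Generate reproducible random integer arrays as test instances."""
    rng = random.Random(seed)
    return [[rng.randint(0, 99) for _ in range(length)] for _ in range(n_instances)]


def parse_answer(text: str) -> List[int]:
    """Parse a reply such as '[3, 5, 9]' or '3, 5, 9' into integers (assumed format)."""
    cleaned = text.replace("[", "").replace("]", "")
    return [int(tok) for tok in cleaned.split(",") if tok.strip()]


def accuracy(reference: Callable[[List[int]], List[int]],
             instances: List[List[int]], replies: List[str]) -> float:
    """Fraction of replies whose parsed output exactly matches the reference result."""
    correct = 0
    for inst, reply in zip(instances, replies):
        try:
            if parse_answer(reply) == reference(inst):
                correct += 1
        except ValueError:
            pass  # an unparseable reply counts as incorrect
    return correct / len(instances)


if __name__ == "__main__":
    instances = make_instances(n_instances=10, length=8, seed=0)
    # Stand-in for LLM replies: correct for all but the last instance.
    replies = [", ".join(map(str, insertion_sort(x))) for x in instances[:-1]]
    replies.append("not a list")
    print(accuracy(insertion_sort, instances, replies))  # 0.9
```

In a real run, the `replies` list would come from prompting a model with the natural-language description of the algorithm plus each instance; keeping the reference implementation separate from the grader makes it easy to swap in any of the other algorithms.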

Code Implementations: 1 repo