SE · May 2

Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?

arXiv:2412.04590 · 65.38 citations · h-index: 6
AI Analysis

This work clarifies the limited benefit of natural-language (NL) specifications for code translation. It is an incremental finding for researchers and practitioners who use LLMs.

The paper investigates using a natural-language specification as an intermediate representation for code translation with LLMs. It finds that the specification does not improve performance overall, although combining it with the source code helps for some language pairs (e.g., with Python or C++ as the source language).

Large Language Models (LLMs) are increasingly being applied across various domains, including code-related tasks such as code translation. Previous studies have explored using LLMs for translating code between different programming languages. Since LLMs are more effective with natural language, using natural language as an intermediate representation in code translation tasks is an intuitively appealing approach. However, whether this benefit is general or highly context-dependent remains unclear. In this work, we investigate using NL-specification as an intermediate representation for code translation. We evaluate our method using three datasets, five popular programming languages, and 29 language pair permutations. Our results show that using NL-specification alone does not lead to performance improvements. However, when combined with source code, it provides gains in certain language pairs (notably with Python and C++ as source languages), while offering no consistent improvement overall. Besides analyzing the performance of code translation, we also investigate the quality of the translated code and provide insights into the issues present in the translated code.
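The abstract contrasts two settings: translating from the NL specification alone, and translating from the specification combined with the source code. The following is a minimal sketch of such a two-stage pipeline (source code, then NL specification, then target code). The paper's actual prompts and tooling are not given here, so the prompt wording and the names `spec_prompt`, `translation_prompt`, `translate_via_spec`, and the `LLM` callable are all hypothetical. The model is abstracted as any function that maps a prompt string to a completion string.

```python
from typing import Callable, Optional

# Hypothetical LLM interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def spec_prompt(source_code: str, src_lang: str) -> str:
    """Stage 1 (hypothetical wording): ask the model for an NL specification."""
    return (
        f"Describe, in natural language, the behavior of the following {src_lang} "
        f"program: its inputs, outputs, and algorithm.\n\n{source_code}"
    )


def translation_prompt(spec: str, tgt_lang: str,
                       source_code: Optional[str] = None,
                       src_lang: Optional[str] = None) -> str:
    """Stage 2: request target code from the spec, optionally with the source."""
    prompt = f"Write a {tgt_lang} program that satisfies this specification:\n\n{spec}"
    if source_code is not None:
        # Combined variant: the model sees both the spec and the original code.
        prompt += (f"\n\nFor reference, here is the original {src_lang} program:"
                   f"\n\n{source_code}")
    return prompt


def translate_via_spec(llm: LLM, source_code: str, src_lang: str, tgt_lang: str,
                       include_source: bool = False) -> str:
    """Two-stage translation: source code -> NL specification -> target code."""
    spec = llm(spec_prompt(source_code, src_lang))
    prompt = translation_prompt(
        spec,
        tgt_lang,
        source_code if include_source else None,
        src_lang if include_source else None,
    )
    return llm(prompt)
```

Calling `translate_via_spec(..., include_source=False)` corresponds to the specification-only setting, which the abstract reports brings no improvement. Calling it with `include_source=True` corresponds to the combined setting, which helps for some source languages such as Python and C++.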
