SEAI | Jan 23

Revisiting the Role of Natural Language Code Comments in Code Translation

arXiv:2601.16661v1 | h-index: 22
Originality: Incremental advance
AI Analysis

This paper addresses the problem of improving code translation accuracy for developers and researchers. The advance is incremental because it builds on existing LLM methods by incorporating comments.

The paper investigates how natural language code comments affect the quality of automated code translation across programming languages. It finds that comments, especially those describing the overall purpose of the code, significantly enhance accuracy and can potentially double LLM-based translation performance.

The advent of large language models (LLMs) has ushered in a new era in automated code translation across programming languages. Since most code-specific LLMs are pretrained on well-commented code from large repositories like GitHub, it is reasonable to hypothesize that natural language code comments could aid in improving translation quality. Despite their potential relevance, comments are largely absent from existing code translation benchmarks, rendering their impact on translation quality inadequately characterised. In this paper, we present a large-scale empirical study evaluating the impact of comments on translation performance. Our analysis involves more than 80,000 translations, with and without comments, of 1100+ code samples from two distinct benchmarks covering pairwise translations between five different programming languages: C, C++, Go, Java, and Python. Our results provide strong evidence that code comments, particularly those that describe the overall purpose of the code rather than line-by-line functionality, significantly enhance translation accuracy. Based on these findings, we propose COMMENTRA, a code translation approach, and demonstrate that it can potentially double the performance of LLM-based code translation. To the best of our knowledge, our study is the first in terms of its comprehensiveness, scale, and language coverage on how to improve code translation accuracy using code comments.
