SE, AI, IR, LG · Oct 10, 2025

LLM Based Long Code Translation using Identifier Replacement

arXiv:2510.09045v2 · h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in automated software development tools for developers working with large codebases, though the advance appears incremental.

The paper tackles the problem of LLMs struggling with long code translation due to context window limitations by proposing a zero-shot method using identifier replacement. Their approach reduces token count and memory usage while preserving syntactical and hierarchical information in translations.

In the domain of software development, LLMs have been utilized to automate tasks such as code translation, where source code in one programming language is translated to another while preserving its functionality. However, LLMs often struggle with long source code that does not fit into the context window, which leads to inaccurate translations. To address this, we propose a novel zero-shot code translation method that incorporates identifier replacement. By substituting user-given long identifiers with generalized placeholders during translation, our method reduces token count and memory usage and lets the LLM focus on the logical structure of the code, which improves the efficiency and cost-effectiveness of long code translation. Our empirical results demonstrate that our approach preserves syntactical and hierarchical information and produces translation results with fewer tokens.
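The identifier replacement idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `shorten_identifiers` and `restore_identifiers`, the `id0`, `id1`, ... placeholder scheme, the `min_length` threshold, and the caller-supplied `reserved` keyword set are all assumptions made for illustration. The sketch uses a plain regular expression rather than a real parser, so it also rewrites matching words inside string literals and comments.

```python
import re

# Matches C-style identifiers; an assumption that fits many languages, not all.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def shorten_identifiers(source, reserved, min_length=8):
    """Replace long identifiers with short placeholders.

    Returns the rewritten code and a placeholder -> original mapping.
    `reserved` holds keywords and library names that must stay untouched.
    """
    existing = set(IDENTIFIER.findall(source))  # used to avoid name collisions
    mapping = {}   # placeholder -> original identifier
    reverse = {}   # original identifier -> placeholder
    counter = 0

    def substitute(match):
        nonlocal counter
        name = match.group(0)
        if name in reserved or len(name) < min_length:
            return name
        if name not in reverse:
            placeholder = f"id{counter}"
            counter += 1
            # Skip placeholder names that already occur in the source.
            while placeholder in existing:
                placeholder = f"id{counter}"
                counter += 1
            reverse[name] = placeholder
            mapping[placeholder] = name
        return reverse[name]

    return IDENTIFIER.sub(substitute, source), mapping


def restore_identifiers(translated, mapping):
    """Put the original identifiers back into the translated code."""
    return IDENTIFIER.sub(
        lambda m: mapping.get(m.group(0), m.group(0)), translated
    )


source = (
    "int computeTotalInvoiceAmount(int invoiceLineCount) "
    "{ return invoiceLineCount * 2; }"
)
shortened, mapping = shorten_identifiers(source, reserved={"int", "return"})
# `shortened` is what would be sent to the LLM; the LLM call itself is omitted.
restored = restore_identifiers(shortened, mapping)
```

In this sketch the shortened text would be passed to the model in place of the original, and `restore_identifiers` would be applied to the model's output. It assumes the model leaves the placeholders unchanged in the translation, which a real pipeline would need to verify.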

