SE · AI · Jan 7

Bootstrapping Code Translation with Weighted Multilanguage Exploration

arXiv:2601.03512v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses code translation for developers and researchers. It proposes a new method, although its gains over existing approaches are incremental.

The paper tackles code translation across multiple programming languages by addressing data scarcity and optimization imbalance, and it reports substantial improvements over baseline LLMs on the HumanEval-X and TransCoder-Test benchmarks.

Code translation across multiple programming languages is essential yet challenging due to two vital obstacles: scarcity of parallel data paired with executable test oracles, and optimization imbalance when handling diverse language pairs. We propose BootTrans, a bootstrapping method that resolves both obstacles. Its key idea is to leverage the functional invariance and cross-lingual portability of test suites, adapting abundant pivot-language unit tests to serve as universal verification oracles for multilingual RL training. Our method introduces a dual-pool architecture with seed and exploration pools to progressively expand training data via execution-guided experience collection. Furthermore, we design a language-aware weighting mechanism that dynamically prioritizes harder translation directions based on relative performance across sibling languages, mitigating optimization imbalance. Extensive experiments on the HumanEval-X and TransCoder-Test benchmarks demonstrate substantial improvements over baseline LLMs across all translation directions, with ablations validating the effectiveness of both bootstrapping and weighting components.
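To make the two components in the abstract more concrete, here is a toy Python sketch. It is not the authors' implementation. The function names (`language_weights`, `explore_step`), the `Sample`/`Pools` records, and the weighting formula are all hypothetical. The formula used here scales each target direction by its failure rate relative to the mean failure rate of its sibling directions. `translate` and `passes_tests` are stand-ins for the model and for execution against the adapted pivot-language tests.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical record: one program for a task in a given language.
@dataclass
class Sample:
    task_id: str
    lang: str
    code: str


# Hypothetical dual-pool container: seed pool plus exploration pool.
@dataclass
class Pools:
    seed: List[Sample]  # verified starting programs (e.g. pivot language)
    exploration: List[Sample] = field(default_factory=list)  # verified translations found later


def language_weights(pass_rate: Dict[str, float]) -> Dict[str, float]:
    """Assumed weighting rule: for one source language, weight each sibling
    target direction by its failure rate divided by the sibling mean, so
    harder directions get weight > 1 and easier ones < 1 (mean weight is 1)."""
    fail = {t: 1.0 - r for t, r in pass_rate.items()}
    mean_fail = sum(fail.values()) / len(fail)
    if mean_fail == 0.0:
        return {t: 1.0 for t in fail}  # every direction solved: fall back to uniform
    return {t: f / mean_fail for t, f in fail.items()}


def explore_step(
    pools: Pools,
    targets: List[str],
    translate: Callable[[Sample, str], str],
    passes_tests: Callable[[str, str, str], bool],
) -> int:
    """One execution-guided collection round: translate every pooled sample
    into each missing target language and keep only candidates that pass the
    (adapted) tests. Returns how many new samples were added."""
    known = {(s.task_id, s.lang) for s in pools.seed + pools.exploration}
    added = 0
    # Iterate over a snapshot, so samples added this round are expanded next round.
    for src in pools.seed + pools.exploration:
        for tgt in targets:
            if tgt == src.lang or (src.task_id, tgt) in known:
                continue
            cand = translate(src, tgt)
            if passes_tests(src.task_id, tgt, cand):
                pools.exploration.append(Sample(src.task_id, tgt, cand))
                known.add((src.task_id, tgt))
                added += 1
    return added


if __name__ == "__main__":
    # Harder directions (lower pass rate) receive larger weights.
    print(language_weights({"cpp": 0.9, "java": 0.8, "go": 0.4}))

    # Stub model and test runner, purely for illustration.
    pools = Pools(seed=[Sample("t1", "python", "print(1)")])
    fake_translate = lambda s, tgt: f"{tgt}:{s.code}"
    fake_tests = lambda task, tgt, code: tgt != "go"
    print(explore_step(pools, ["python", "java", "go"], fake_translate, fake_tests))
```

In a real training loop, the weights would plausibly scale per-direction rewards or sampling frequencies. The exploration pool would then feed newly verified programs back in as additional translation sources. Both of these are assumptions about how the pieces connect, not details taken from the paper.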
