SE, AI | Jan 7

Bootstrapping Code Translation with Weighted Multilanguage Exploration

Yuhan Wu, Huan Zhang, Wei Cheng, Chen Shen, Jingyue Yang, Wei Hu

arXiv:2601.03512v1 | h-index: 3

Originality: Incremental advance

AI Analysis

This work targets code translation challenges faced by developers and researchers. It proposes a new method, though the advance over existing approaches is incremental.

The paper tackles code translation across multiple programming languages by addressing data scarcity and optimization imbalance, and reports substantial improvements over baseline LLMs on the HumanEval-X and TransCoder-Test benchmarks.

Code translation across multiple programming languages is essential yet challenging due to two vital obstacles: scarcity of parallel data paired with executable test oracles, and optimization imbalance when handling diverse language pairs. We propose BootTrans, a bootstrapping method that resolves both obstacles. Its key idea is to leverage the functional invariance and cross-lingual portability of test suites, adapting abundant pivot-language unit tests to serve as universal verification oracles for multilingual RL training. Our method introduces a dual-pool architecture with seed and exploration pools to progressively expand training data via execution-guided experience collection. Furthermore, we design a language-aware weighting mechanism that dynamically prioritizes harder translation directions based on relative performance across sibling languages, mitigating optimization imbalance. Extensive experiments on the HumanEval-X and TransCoder-Test benchmarks demonstrate substantial improvements over baseline LLMs across all translation directions, with ablations validating the effectiveness of both bootstrapping and weighting components.
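
The abstract does not give the weighting formula, so the following is only a minimal sketch of what a language-aware weighting step might look like, not the authors' implementation. It assumes each target language has a measured test pass rate and that harder directions (those below the average of their sibling languages) should receive larger weights. The function name `language_weights`, the softmax form, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def language_weights(pass_rates, temperature=1.0):
    """Turn per-language pass rates into weights that sum to 1.

    Hypothetical sketch: languages whose pass rate is below the mean of
    their sibling languages get a larger weight. The softmax form and the
    temperature parameter are assumptions, not taken from the paper.
    """
    if not pass_rates:
        raise ValueError("pass_rates must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    mean_rate = sum(pass_rates.values()) / len(pass_rates)

    # A positive score means the direction is harder than its siblings.
    scores = {
        lang: (mean_rate - rate) / temperature
        for lang, rate in pass_rates.items()
    }

    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {lang: math.exp(s - max_score) for lang, s in scores.items()}
    total = sum(exp_scores.values())
    return {lang: value / total for lang, value in exp_scores.items()}


# Illustrative pass rates only; these numbers are made up, not reported results.
example_weights = language_weights({"java": 0.8, "cpp": 0.5, "go": 0.2})
```

Under these assumptions, recomputing the weights after each evaluation round would shift training emphasis toward whichever directions currently lag, which is one plausible reading of "dynamically prioritizes harder translation directions". A lower `temperature` makes that prioritization sharper.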
