SE AI | Nov 3, 2025

Exploring and Unleashing the Power of Large Language Models in CI/CD Configuration Translation

Chong Wang, Chen Zhang, Jiajun Wu, Wunan Guo, Jianfeng Qu, Yewen Tian, Yang Liu

arXiv:2511.01316v1 | 3.4 | h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of CI platform migration for software developers, but the advance is incremental: it applies existing LLM capabilities to a specific domain task.

The study tackled the problem of translating CI configurations between platforms, specifically from Travis CI to GitHub Actions, by evaluating large language models (LLMs) and found that combining guideline-based prompting with iterative refinement improved the Build Success Rate to 75.5%, a nearly threefold increase over basic prompting.

Continuous Integration (CI) is a cornerstone of modern collaborative software development, and numerous CI platforms are available. Differences in maintenance overhead, reliability, and integration depth with code-hosting platforms make migration between CI platforms a common practice. A central step in migration is translating CI configurations, which is challenging due to the intrinsic complexity of CI configurations and the need to understand semantic differences and relationships across CI platforms. With the advent of large language models (LLMs), recent advances in software engineering highlight their potential for CI configuration translation. In this paper, we present a study on LLM-based CI configuration translation, focusing on the migration from Travis CI to GitHub Actions. First, using 811 migration records, we quantify the effort involved and find that developers read an average of 38 lines of Travis configuration and write 58 lines of GitHub Actions configuration, with nearly half of the migrations requiring multiple commits. We further analyze translations produced by each of the four LLMs and identify 1,121 issues grouped into four categories: logic inconsistencies (38%), platform discrepancies (32%), environment errors (25%), and syntax errors (5%). Finally, we evaluate three enhancement strategies and show that combining guideline-based prompting with iterative refinement achieves the best performance, reaching a Build Success Rate of 75.5%, nearly a threefold improvement over GPT-4o with a basic prompt.
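
The abstract names the best-performing strategy (guideline-based prompting combined with iterative refinement) but does not spell out its mechanics. The Python sketch below shows one plausible shape for such a loop and is not the authors' implementation. The function name `translate_with_refinement`, the `llm` and `validate` callables, the prompt wording, the `max_rounds` limit, and the toy stand-ins `fake_llm` and `fake_validate` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def translate_with_refinement(
    travis_yaml: str,
    guidelines: List[str],
    llm: Callable[[str], str],             # hypothetical: prompt text in, workflow text out
    validate: Callable[[str], List[str]],  # hypothetical: workflow in, list of problems out
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Return the final candidate workflow and the number of LLM calls made."""
    # Guideline-based prompt: the translation rules are stated up front.
    prompt = (
        "Translate this Travis CI configuration to a GitHub Actions workflow.\n"
        "Follow these guidelines:\n"
        + "\n".join(f"- {g}" for g in guidelines)
        + "\n\nTravis CI configuration:\n"
        + travis_yaml
    )
    candidate = llm(prompt)
    calls = 1

    # Iterative refinement: feed reported problems back until none remain
    # or the round budget is exhausted.
    for _ in range(max_rounds):
        errors = validate(candidate)
        if not errors:
            break
        feedback = (
            prompt
            + "\n\nPrevious attempt:\n" + candidate
            + "\n\nReported problems:\n"
            + "\n".join(f"- {e}" for e in errors)
            + "\n\nReturn a corrected workflow."
        )
        candidate = llm(feedback)
        calls += 1
    return candidate, calls


# Toy stand-ins so the sketch runs without a model or a CI service.
def fake_llm(prompt: str) -> str:
    if "Reported problems" in prompt:
        return "on: push\njobs:\n  build:\n    runs-on: ubuntu-latest\n"
    return "jobs:\n  build:\n    runs-on: ubuntu-latest\n"


def fake_validate(workflow: str) -> List[str]:
    return [] if workflow.startswith("on:") else ["missing 'on' trigger"]


result, calls = translate_with_refinement(
    "language: python",
    ["Map the Travis `language` key to a matching setup action"],
    fake_llm,
    fake_validate,
)
```

In practice, `validate` might wrap a workflow linter or an actual build run, and `llm` would call a model API; both are left abstract here because the listing does not say how the study implemented them.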
