Analysis of AdvFusion: Adapter-based Multilingual Learning for Code Large Language Models
This work addresses the challenge of efficiently adapting Code-LLMs to multiple programming languages and tasks. It is incremental, however: it extends prior AdvFusion research to new tasks and models, and the results are mixed.
The study examines multilingual knowledge transfer for Code Large Language Models (Code-LLMs) by evaluating AdvFusion, a Parameter Efficient Fine-Tuning (PEFT) method, on three new tasks: code generation, code translation, and commit message generation. AdvFusion outperformed AdapterFusion in code generation but performed worse in code translation and commit message generation, with performance gaps varying by model size and task.
Programming languages can benefit from one another when a language model is used for software engineering tasks. Full fine-tuning and Parameter Efficient Fine-Tuning (PEFT) of Code Language Models (Code-LMs) have been explored for multilingual knowledge transfer. AdapterFusion is a PEFT architecture that aims to enhance task performance by leveraging information from multiple programming languages, but it primarily focuses on the target programming language. In our previous work, we proposed AdvFusion, a novel PEFT-based approach that effectively learns from other programming languages before adapting to the target task. Although previous experiments showed that AdvFusion outperformed AdapterFusion and LoRA, it was applied to pre-trained Code-LMs and was limited to only two tasks: code summarization and method name prediction. In this study, we expanded our work and investigated AdvFusion on Code Large Language Models (Code-LLMs), considering three new tasks: code generation, code translation, and commit message generation. We observed that different Code-LLMs and tasks exhibit different characteristics. In code generation, AdvFusion outperformed AdapterFusion but not the other PEFT methods (LoRA, Compacter, and TaskAdapter). In commit message generation, AdapterFusion performed better than AdvFusion and, contrary to code generation, the other PEFT methods did not perform better. In code translation, AdvFusion performed worse than AdapterFusion overall, with the performance gap marginally widening as the model size increased. However, consistent with code generation, the other PEFT methods showed better performance.
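To make the contrast between the two fusion approaches concrete, here is a minimal sketch of attention-style fusion over per-language adapter outputs. It is an illustration under stated assumptions, not the authors' implementation. The names `fusion_weights` and `fuse` are hypothetical, the per-adapter scores are passed in directly rather than computed from learned projections of the hidden state, and the `masked` argument is an assumed way to realize "learning from other programming languages before adapting to the target": the target-language adapter is excluded in a first phase and restored in a second.

```python
import math
from typing import List, Optional, Sequence


def fusion_weights(scores: Sequence[float], masked: Optional[int] = None) -> List[float]:
    """Softmax over per-adapter scores; the adapter at index `masked` gets weight 0."""
    kept = [i for i in range(len(scores)) if i != masked]
    peak = max(scores[i] for i in kept)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in kept}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def fuse(
    adapter_outputs: Sequence[Sequence[float]],
    scores: Sequence[float],
    masked: Optional[int] = None,
) -> List[float]:
    """Weighted sum of adapter output vectors using the fusion weights."""
    weights = fusion_weights(scores, masked)
    dim = len(adapter_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, adapter_outputs))
        for d in range(dim)
    ]


# Hypothetical setup: three language adapters, with index 0 as the target language.
outputs = [[1.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
scores = [2.0, 1.0, 1.0]

# Assumed phase 1: the target-language adapter is masked out, so only the
# other languages contribute to the fused representation.
phase1 = fuse(outputs, scores, masked=0)

# Assumed phase 2: all adapters participate, as in plain AdapterFusion.
phase2 = fuse(outputs, scores)
```

In this toy setup the unmasked weights favor the target-language adapter because it has the highest score, which mirrors the abstract's remark that AdapterFusion primarily focuses on the target programming language. Masking forces the weight mass onto the remaining adapters.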