SE LG · Feb 6, 2025

Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks

Kyi Shin Khant, Hong Yi Lin, Patanamon Thongtanunam

arXiv:2502.03806v15.91 citations · h-index: 19 · MSR

Originality: Synthesis-oriented

AI Analysis

This work addresses the optimization of training for code models in software engineering, but it is incremental as it builds on existing CL methods and yields negative results.

The study investigated whether curriculum learning (CL) ordered by code length and cyclomatic complexity improves pre-trained code models on real-world software engineering tasks, namely code clone detection and code summarization. It reported no significant gains: model performance saturated early, and the model showed signs of catastrophic forgetting.

Learning-based techniques, especially advanced pre-trained models for code, have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training approaches may not fully optimize model performance, as they typically involve learning from randomly shuffled training data. Recent work shows that Curriculum Learning (CL) can improve performance on code-related tasks through incremental learning based on the difficulty of synthetic code. Yet, the effectiveness of CL with conventional difficulty measures in SE tasks remains largely unexplored. In this study, we explore two conventional code metrics, code length and cyclomatic complexity, to determine the difficulty levels. We investigate how a pre-trained code model (CodeT5) learns under CL through the tasks of code clone detection and code summarization. Our empirical study on the CodeXGLUE benchmark showed results that contrast with prior studies: the model exhibited signs of catastrophic forgetting and shortcut learning. Surprisingly, model performance saturates after only the first quartile of training, potentially indicating a limit in the model's representation capacity and/or the task's inherent difficulty. Future work should further explore various CL strategies with different code models across a wider range of SE tasks for a more holistic understanding.
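As a rough illustration of how a difficulty-ordered curriculum could be built from the two metrics named above, here is a minimal Python sketch. It is not the authors' implementation: measuring length in whitespace-separated tokens, estimating cyclomatic complexity by counting decision keywords, and splitting the ranked data into four equal buckets are all assumptions, and the function names (`code_length`, `approx_cyclomatic_complexity`, `curriculum_quartiles`) are hypothetical.

```python
import re
from typing import Callable, List

# Assumption: decision points are approximated by counting branching
# keywords and short-circuit operators; a real tool would parse the code.
_DECISION = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")


def code_length(code: str) -> int:
    """Length as the number of whitespace-separated tokens (an assumption)."""
    return len(code.split())


def approx_cyclomatic_complexity(code: str) -> int:
    """Rough estimate: one plus the number of matched decision points."""
    return 1 + len(_DECISION.findall(code))


def curriculum_quartiles(
    samples: List[str], difficulty: Callable[[str], int]
) -> List[List[str]]:
    """Sort samples from easy to hard and split them into four buckets."""
    ranked = sorted(samples, key=difficulty)  # stable sort keeps ties in order
    n = len(ranked)
    bounds = [round(i * n / 4) for i in range(5)]
    return [ranked[bounds[i]:bounds[i + 1]] for i in range(4)]


if __name__ == "__main__":
    snippets = [
        "while (a || b) { if (c) { d(); } else if (e) { f(); } }",
        "return x;",
        "for (i = 0; i < n; i++) { if (a && b) { x++; } }",
        "if (a) { return b; }",
    ]
    for stage, bucket in enumerate(
        curriculum_quartiles(snippets, approx_cyclomatic_complexity), start=1
    ):
        print(f"stage {stage}: {bucket}")
```

Under these assumptions, a training loop would feed the buckets to the model in order, easiest first. Passing `code_length` instead of `approx_cyclomatic_complexity` switches the difficulty measure without changing the rest of the sketch.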
