CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
This benchmark dataset addresses the need for standardized evaluation in machine learning for programming language tasks, benefiting researchers developing new methods in this domain.
This paper introduces CodeXGLUE, a new benchmark dataset for machine learning research in program understanding and generation. It comprises 10 tasks across 14 datasets, a platform for model evaluation and comparison, and three baseline systems (BERT-style, GPT-style, and Encoder-Decoder models).
Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, namely BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can support the development and validation of new methods for a wide range of program understanding and generation problems.