Code LLMs: A Taxonomy-based Survey
This work provides a structured overview for researchers and practitioners in AI and software engineering. Its contribution is incremental: it synthesizes existing knowledge without introducing new methods or data.
The authors address the problem of organizing and understanding the rapidly evolving field of large language models (LLMs) applied to coding tasks. Their taxonomy-based survey categorizes concepts, methodologies, and applications into a unified classification framework.
Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks and have recently extended their reach to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). This taxonomy-based survey provides a comprehensive analysis of LLMs in the NL-PL domain, investigating how these models are used in coding tasks and examining their methodologies, architectures, and training processes. We propose a taxonomy-based framework that categorizes relevant concepts, providing a unified classification system to facilitate a deeper understanding of this rapidly evolving field. This survey offers insights into the current state and future directions of LLMs in coding tasks, including their applications and limitations.