CL · Dec 11, 2024

Code LLMs: A Taxonomy-based Survey

arXiv:2412.08291v1 · 11 citations · h-index: 8 · BigData
Originality: Synthesis-oriented
AI Analysis

This work gives researchers and practitioners in AI and software engineering a structured overview of the field, but its contribution is incremental: it synthesizes existing knowledge without introducing new methods or data.

The authors address the difficulty of organizing and understanding the rapidly evolving field of large language models (LLMs) applied to coding tasks. They conduct a taxonomy-based survey that categorizes concepts, methodologies, and applications, and the result is a unified classification framework.

Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks and have recently expanded their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). This taxonomy-based survey provides a comprehensive analysis of LLMs in the NL-PL domain, investigating how these models are utilized in coding tasks and examining their methodologies, architectures, and training processes. We propose a taxonomy-based framework that categorizes relevant concepts, providing a unified classification system to facilitate a deeper understanding of this rapidly evolving field. This survey offers insights into the current state and future directions of LLMs in coding tasks, including their applications and limitations.