SE · LG · Apr 16

Prompt-Driven Code Summarization: A Systematic Literature Review

Afia Farjana, Zaiyu Cheng, Antonio Mastropaolo

arXiv:2604.15385 · 61.8 · h-index: 14

Predicted impact: top 35% in SE · last 90 days · Originality: Synthesis-oriented

AI Analysis

For researchers and practitioners in software engineering, this review provides a structured overview of prompting strategies for code summarization, highlighting the need for standardized evaluation.

This systematic literature review consolidates evidence on prompt-driven code summarization with LLMs, categorizing prompting paradigms and identifying research gaps that stem from fragmented studies and inconsistent evaluation metrics.

Software documentation is essential for program comprehension, developer onboarding, code review, and long-term maintenance. Yet producing quality documentation manually is time-consuming and frequently yields incomplete or inconsistent results. Large language models (LLMs) offer a promising solution by automatically generating natural language descriptions from source code, helping developers understand code more efficiently, facilitating maintenance, and supporting downstream activities such as defect localization and commit message generation. However, the effectiveness of LLMs in documentation tasks critically depends on how they are prompted. Properly structured instructions can substantially improve model performance, making prompt engineering, the design of input prompts to guide model behavior, a foundational technique in LLM-based software engineering. Approaches such as few-shot prompting, chain-of-thought reasoning, retrieval-augmented generation, and zero-shot learning show promise for code summarization, yet current research remains fragmented. There is limited understanding of which prompting strategies work best, for which models, and under what conditions. Moreover, evaluation practices vary widely, with most studies relying on overlap-based metrics that may not capture semantic quality. This systematic literature review consolidates existing evidence, categorizes prompting paradigms, examines their effectiveness, and identifies gaps to guide future research and practical adoption.
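To make the prompting paradigms named in the abstract more concrete, the following is a minimal Python sketch of how zero-shot, few-shot, and chain-of-thought prompts for code summarization might be assembled. It is an illustration only and is not taken from the review: the instruction wording, the `Example` type, and all function names are hypothetical, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical (code, summary) demonstration pair."""
    code: str
    summary: str


# Assumed instruction text; real studies vary the wording considerably.
INSTRUCTION = "Summarize the following function in one sentence."


def zero_shot_prompt(code: str) -> str:
    """Instruction plus the target code, with no demonstrations."""
    return f"{INSTRUCTION}\n\nCode:\n{code}\nSummary:"


def few_shot_prompt(code: str, examples: list[Example]) -> str:
    """Prepend worked (code, summary) pairs before the target code."""
    shots = "\n\n".join(
        f"Code:\n{ex.code}\nSummary: {ex.summary}" for ex in examples
    )
    return f"{INSTRUCTION}\n\n{shots}\n\nCode:\n{code}\nSummary:"


def chain_of_thought_prompt(code: str) -> str:
    """Ask for intermediate reasoning before the final summary."""
    return (
        f"{INSTRUCTION}\n"
        "First explain step by step what the code does, "
        "then give the final answer on a line starting with 'Summary:'.\n\n"
        f"Code:\n{code}\n"
    )


if __name__ == "__main__":
    target = "def add(a, b):\n    return a + b"
    demo = Example("def neg(x):\n    return -x", "Negates a number.")
    print(few_shot_prompt(target, [demo]))
```

The three builders differ only in what surrounds the target code, which is what makes such strategies comparable when the model and the evaluation metric are held fixed.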
