31.1SEMay 10
Guidelines for Empirical Studies in Software Engineering involving Large Language ModelsSebastian Baltes, Florian Angermeir, Chetan Arora et al.
Large Language Models (LLMs) are widely used in software engineering (SE) research and practice, yet their non-determinism, opaque training data, and rapidly evolving models threaten the reproducibility and replicability of empirical studies. We address this challenge through a collaborative effort of 22 researchers, presenting a taxonomy of seven study types that organizes how LLMs are used in SE research, together with eight guidelines for designing and reporting such studies. Each guideline distinguishes requirements (must) from recommended practices (should) and is contextualized by the study types it applies to. Our guidelines recommend that researchers: (1) declare LLM usage and role; (2) report model versions, configurations, and customizations; (3) document the tool architecture beyond the model; (4) disclose prompts, their development, and interaction logs; (5) validate LLM outputs with humans; (6) include an open LLM as a baseline; (7) use suitable baselines, benchmarks, and metrics; and (8) articulate limitations and mitigations. We complement the guidelines with an applicability matrix mapping guidelines to study types and a reporting checklist for authors and reviewers. We maintain the study types and guidelines online as a living resource for the community to use and shape (llm-guidelines$.$org).
SEDec 23, 2021
Towards Fully Declarative Program Analysis via Source Code TransformationRijnard van Tonder
Advances in logic programming and increasing industrial uptake of Datalog-inspired approaches demonstrate the emerging need to express powerful code analyses more easily. Declarative program analysis frameworks (e.g., using logic programming like Datalog) significantly ease defining analyses compared to imperative implementations. However, the declarative benefits of these frameworks only materialize after parsing and translating source code to generate facts. Fact generation remains a non-declarative precursor to analysis where imperative implementations first parse and interpret program structures (e.g., abstract syntax trees and control-flow graphs). The procedure of fact generation thus remains opaque and difficult for non-experts to understand or modify. We present a new perspective on this analysis workflow by proposing declarative fact generation to ease specification and exploration of lightweight declarative analyses. Our approach demonstrates the first venture towards fully declarative analysis specification across multiple languages. The key idea is to translate source code directly to Datalog facts in the analysis domain using declarative syntax transformation. We then reuse existing Datalog analyses over generated facts, yielding an end-to-end declarative pipeline. As a first approximation we pursue a syntax-driven approach and demonstrate the feasibility of generating and using lightweight versions of liveness and call graph reachability properties. We then discuss the workability of extending declarative fact generation to also incorporate semantic information.