When Labels Are Scarce: A Systematic Mapping of Label-Efficient Code Vulnerability Detection
This survey addresses the challenge of reducing reliance on human labeling for code vulnerability detection. Its contribution is incremental: it consolidates existing approaches rather than introducing new methods.
This survey tackles the problem of expensive and noisy labeling in machine-learning-based code vulnerability detection by mapping label-efficient approaches, synthesizing five paradigm families and their mechanisms, and providing a design map and decision guide for practical method selection.
Machine-learning-based code vulnerability detection (CVD) has progressed rapidly, from deep program representations to pretrained code models and LLM-centered pipelines. Yet dependable vulnerability labeling remains expensive, noisy, and uneven across projects, languages, and CWE types, motivating approaches that reduce reliance on human labeling. This survey maps these approaches, synthesizing five paradigm families and the mechanisms they use. It connects mechanisms to token, graph, hybrid, and knowledge-based representations, and consolidates evaluation and reporting axes that limit comparison (label-budget specification, compute/cost assumptions, leakage, and granularity mismatches). A Design Map and constraint-first Decision Guide distill trade-offs and failure modes for practical method selection.