CR · SE · Mar 31

When Labels Are Scarce: A Systematic Mapping of Label-Efficient Code Vulnerability Detection

arXiv:2604.0007960.2
Predicted impact: top 31% in CR (last 90 days) · Originality: Synthesis-oriented
AI Analysis

It addresses the challenge of reducing reliance on human labeling for code vulnerability detection. The contribution is incremental: it consolidates existing approaches rather than introducing new methods.

This survey tackles the problem of expensive and noisy labeling in machine-learning-based code vulnerability detection by mapping label-efficient approaches, synthesizing five paradigm families and their mechanisms, and providing a design map and decision guide for practical method selection.

Machine-learning-based code vulnerability detection (CVD) has progressed rapidly, from deep program representations to pretrained code models and LLM-centered pipelines. Yet dependable vulnerability labeling remains expensive, noisy, and uneven across projects, languages, and CWE types, motivating approaches that reduce reliance on human labeling. This survey maps these approaches, synthesizing five paradigm families and the mechanisms they use. It connects these mechanisms to token, graph, hybrid, and knowledge-based representations, and consolidates the evaluation and reporting axes that limit comparison across studies (label-budget specification, compute/cost assumptions, leakage, and granularity mismatches). A Design Map and a constraint-first Decision Guide distill trade-offs and failure modes for practical method selection.
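The abstract names label-budget specification and leakage among the reporting axes that limit comparison. The following is a minimal sketch of how a study might report both quantities. It assumes each sample is a (source code, label) pair and treats a test function as leaked when it is an exact duplicate of a training function after comment and whitespace stripping. This is a deliberately crude criterion, and stricter audits may also consider near-duplicates or project- and commit-level overlap. The function names `normalize`, `fingerprint`, and `leakage_report` are hypothetical and do not come from the survey.

```python
import hashlib
import re


def normalize(code: str) -> str:
    """Crude normalization: drop C-style comments and collapse whitespace.

    Assumption: comment markers inside string literals are not handled.
    """
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", "", code)                     # line comments
    return " ".join(code.split())


def fingerprint(code: str) -> str:
    """SHA-256 of the normalized source, used as an exact-duplicate key."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def leakage_report(train, test, label_budget):
    """Report the label budget and the train/test exact-duplicate leakage.

    train, test: lists of (code, label) pairs.
    label_budget: number of training samples whose labels were actually used.
    """
    train_keys = {fingerprint(code) for code, _ in train}
    leaked = [i for i, (code, _) in enumerate(test) if fingerprint(code) in train_keys]
    return {
        "label_budget": label_budget,
        "label_fraction": label_budget / len(train) if train else 0.0,
        "test_size": len(test),
        "leaked_test_indices": leaked,
        "leakage_rate": len(leaked) / len(test) if test else 0.0,
    }


# Toy usage: the first test function differs from a training function
# only in formatting and comments, so it is flagged as leaked.
train = [
    ("int f(int a) { return a + 1; } // inc", 0),
    ("void g(char *s) { strcpy(buf, s); }", 1),
]
test = [
    ("int f(int a) {\n  return a + 1;\n}", 0),
    ("int h() { return 0; }", 0),
]
report = leakage_report(train, test, label_budget=2)
print(report)
```

Reporting the label budget as both an absolute count and a fraction of the training pool makes results under different pool sizes easier to line up. A separate leakage rate lets readers judge how much of a reported score could come from memorized duplicates.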
