Separating Secrets from Placeholders: A Hybrid CNN-CodeBERT Framework for Three-Class Credential Leakage Detection

Maksuda Bilkis Baby, Khushika Shah, Naiyue Liang, Lei Zhang

arXiv:2605.31520

AI Analysis

This work provides a more accurate method for detecting genuine credential leaks in public source code repositories by cutting the false positives that detectors raise on placeholder and weak credentials. This matters to developers and organizations trying to mitigate security risks without drowning in alerts.

This paper addresses the problem of high false-positive rates in credential leakage detection by introducing a three-class classification framework that distinguishes genuine credentials from placeholders or weak credentials. The proposed model achieves a Matthews Correlation Coefficient of 0.86 and a macro F1-score of 0.90, reducing high-severity alerts by 33.0% while maintaining 93% recall and 89% precision for genuine leaks.
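The two headline metrics are a Matthews Correlation Coefficient (MCC) and a macro F1-score over three classes. The sketch below shows how both can be computed from a confusion matrix. It assumes rows are true classes and columns are predicted classes, and it uses Gorodkin's R_K generalization of MCC, the form scikit-learn's matthews_corrcoef uses for multiclass input. The page does not say which tooling the authors used, and the function names here are illustrative.

```python
import math
from typing import Sequence


def macro_f1(cm: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-class F1 from a K x K confusion matrix (rows = true, cols = predicted)."""
    k = len(cm)
    f1s = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted as c, actually another class
        fn = sum(cm[c]) - tp                        # actually c, predicted as another class
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / k


def multiclass_mcc(cm: Sequence[Sequence[int]]) -> float:
    """Gorodkin's R_K statistic: MCC generalized to K classes (reduces to classic MCC when K = 2)."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[i][i] for i in range(k))                      # correctly classified samples
    t = [sum(cm[i]) for i in range(k)]                       # true count per class
    p = [sum(cm[r][j] for r in range(k)) for j in range(k)]  # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


if __name__ == "__main__":
    # Toy 3-class matrix (not the paper's data); class order is arbitrary here.
    toy = [[40, 3, 2],
           [4, 25, 6],
           [1, 5, 60]]
    print(f"MCC={multiclass_mcc(toy):.3f}  macro-F1={macro_f1(toy):.3f}")
```

Macro averaging weights each class equally, so a weak score on the placeholder or weak class pulls the overall figure down even if that class is small. That is why it pairs naturally with MCC, which also stays informative under class imbalance.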

Credential leakage in public source code repositories poses a critical security threat, with over 23.8 million secrets exposed in 2024 alone. Existing detection tools suffer from high false-positive rates because rigid pattern matching and binary classification schemes fail to distinguish genuine credentials from placeholder or weak credentials. We propose a three-class classification framework that explicitly models placeholder or weak credentials as a distinct class, combining CodeBERT-based semantic understanding with character-level pattern recognition. We evaluate our approach on a newly constructed dataset of 9,426 samples spanning 10 programming languages. Our model achieves a Matthews Correlation Coefficient of 0.86 and a macro F1-score of 0.90, with 93% recall and 89% precision for genuine credential leaks, while reducing high-severity alerts by 33.0% (from 373 to 250) without sacrificing security coverage. Compared to prior character-level approaches, our method raises the F1-score for placeholder or weak credentials from 54% to 81% while maintaining strong cross-language generalization: 9 of 10 languages achieve an F1-score above 0.80 under leave-one-language-out evaluation.
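Leave-one-language-out evaluation trains on all languages but one and tests on the held-out language, repeating once per language. The sketch below assumes a hypothetical Sample record and a caller-supplied fit_and_score function that stands in for model training plus per-language F1. Neither comes from the paper; this only illustrates the splitting scheme.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Sample:
    snippet: str   # hypothetical field: source line containing a candidate secret
    language: str  # programming language of the file the snippet came from
    label: int     # class index in the three-class scheme


def leave_one_language_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_language, train, test); every language is held out exactly once."""
    for held_out in sorted({s.language for s in samples}):
        train = [s for s in samples if s.language != held_out]
        test = [s for s in samples if s.language == held_out]
        yield held_out, train, test


def lolo_scores(
    samples: List[Sample],
    fit_and_score: Callable[[List[Sample], List[Sample]], float],
) -> Dict[str, float]:
    """Fit on the other languages, score on the held-out one; returns one score per language."""
    return {lang: fit_and_score(train, test)
            for lang, train, test in leave_one_language_out(samples)}


def count_above(scores: Dict[str, float], threshold: float = 0.80) -> int:
    """Number of held-out languages whose score exceeds the threshold (e.g. F1 > 0.80)."""
    return sum(1 for v in scores.values() if v > threshold)


if __name__ == "__main__":
    toy = [Sample("a", "python", 0), Sample("b", "java", 1),
           Sample("c", "go", 2), Sample("d", "python", 1)]

    def stub_scorer(train: List[Sample], test: List[Sample]) -> float:
        # Placeholder for a real model: fraction of test labels that appear in training.
        seen = {s.label for s in train}
        return sum(s.label in seen for s in test) / len(test)

    scores = lolo_scores(toy, stub_scorer)
    print(scores, count_above(scores))
```

Grouping by language rather than splitting at random keeps every snippet of the held-out language out of training, which is what makes each per-language score a test of cross-language generalization.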
