SE | CR | LG | Mar 16, 2022

On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

arXiv:2203.08417v1 | 11.736 citations | h-index: 55 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for data-driven vulnerability assessment to prioritize fixes in software security, though it is incremental as it builds on existing detection methods.

The paper automates function-level software vulnerability assessment by predicting seven CVSS metrics. It shows that models using vulnerable code statements as input achieve 7.5-114.5% higher MCC than models using non-vulnerable statements, and that incorporating the context of the vulnerable statements improves performance by up to a further 8.9%.

Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and in the fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to give information about the exploitability, impact, and severity of SVs. This information is important for understanding SVs and prioritizing their fixing. Using large-scale data from 1,782 functions of 429 SVs in 200 real-world projects, we investigate ML models for automating function-level SV assessment tasks, i.e., predicting seven Common Vulnerability Scoring System (CVSS) metrics. We particularly study the value and use of vulnerable statements as inputs for developing the assessment models because SVs in functions originate in these statements. We show that vulnerable statements are 5.8 times smaller in size, yet exhibit 7.5-114.5% stronger assessment performance (Matthews Correlation Coefficient (MCC)) than non-vulnerable statements. Incorporating the context of vulnerable statements further increases the performance by up to 8.9% (0.64 MCC and 0.75 F1-Score). Overall, we provide initial yet promising ML-based baselines for function-level SV assessment, paving the way for further research in this direction.
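
The abstract reports results in terms of the Matthews Correlation Coefficient (MCC). As a reminder of what that metric measures, here is a minimal sketch of the binary form of MCC computed from a confusion matrix. This is an illustration only, not the paper's evaluation code: the function name `mcc_binary` is hypothetical, and since CVSS metrics can take more than two values, the paper's evaluation would need a multi-class variant that is not shown here.

```python
import math


def mcc_binary(y_true, y_pred):
    """Binary Matthews Correlation Coefficient for 0/1 labels (illustrative sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it accounts for all four cells of the confusion matrix, which makes it less sensitive to class imbalance than accuracy.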

