CR AI LG PL SE | Jun 1, 2021

On using distributed representations of source code for the detection of C security vulnerabilities

David Coimbra, Sofia Reis, Rui Abreu, Corina Păsăreanu, Hakan Erdogmus

arXiv:2106.01367v1 | 17.0 | 19 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of automated vulnerability detection for C code, but it is incremental as it applies an existing model to a specific domain with modest improvements.

The paper applies the Code2vec model to the detection of security vulnerabilities in C source code and reports an accuracy of 61.43% on the CodeXGLUE benchmark, which is comparable to simple transformer-based methods such as pre-trained RoBERTa and higher than more naive NLP-based approaches.

This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions. Code2vec is trained on the resulting path-contexts with the task of classifying a function as vulnerable or non-vulnerable. Using the CodeXGLUE benchmark, we show that the accuracy of Code2vec for this task is comparable to simple transformer-based methods such as pre-trained RoBERTa, and outperforms more naive NLP-based methods. We achieved an accuracy of 61.43% while maintaining low computational requirements relative to larger models.
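To make the notion of a path-context concrete, here is a minimal Python sketch of how one might enumerate leaf-to-leaf paths over a small abstract syntax tree. It is an illustration only, not astminer's API or output format: the `Node` class, the `leaf_paths` and `path_contexts` helpers, the `max_length` cutoff, the `^` (up) and `_` (down) separators, and the hand-built tree for a call like `strcpy(dst, src)` are all assumptions chosen for this example.

```python
from itertools import combinations


class Node:
    """Hypothetical AST node: a kind, an optional token (leaves only), children."""

    def __init__(self, kind, token=None, children=()):
        self.kind = kind
        self.token = token
        self.children = list(children)


def leaf_paths(node, prefix=()):
    """Yield (leaf, root-to-leaf tuple of nodes) for every leaf, in DFS order."""
    path = prefix + (node,)
    if not node.children:
        yield node, path
    for child in node.children:
        yield from leaf_paths(child, path)


def path_contexts(root, max_length=8):
    """Return (start_token, path, end_token) triples for every pair of leaves.

    The path climbs from the first leaf to the lowest common ancestor and
    then descends to the second leaf. Paths with more than `max_length`
    nodes are skipped (an assumed cutoff for this sketch).
    """
    contexts = []
    for (a, pa), (b, pb) in combinations(list(leaf_paths(root)), 2):
        # Length of the shared root-to-node prefix; pa[i - 1] is the
        # lowest common ancestor of the two leaves.
        i = 0
        while i < min(len(pa), len(pb)) and pa[i] is pb[i]:
            i += 1
        up = [n.kind for n in reversed(pa[i - 1:])]  # leaf a up to the ancestor
        down = [n.kind for n in pb[i:]]              # below the ancestor to leaf b
        if len(up) + len(down) > max_length:
            continue
        path = "^".join(up) + "".join("_" + kind for kind in down)
        contexts.append((a.token, path, b.token))
    return contexts


# Hand-built toy AST for the call expression strcpy(dst, src).
root = Node("Call", children=[
    Node("Name", "strcpy"),
    Node("Arg", children=[Node("Name", "dst")]),
    Node("Arg", children=[Node("Name", "src")]),
])

contexts = path_contexts(root)
for start, path, end in contexts:
    print(start, path, end)
```

Each printed triple pairs two terminal tokens with the syntactic path that connects them. A function is represented as a bag of such triples, and a model in the style of Code2vec learns to classify the function from that bag.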
