SEAI | Aug 10, 2022

Multi-View Pre-Trained Model for Code Vulnerability Identification

arXiv:2208.05227v1 | 3 citations | h-index: 18
Originality: Highly original
AI Analysis

This work addresses the need for automated vulnerability identification in software security, offering a novel method that improves on existing pre-trained models.

The paper tackles the problem of identifying vulnerabilities in source code by proposing a Multi-View Pre-Trained Model (MV-PTM) that encodes sequential and structural information, resulting in an average F1 score improvement of 3.36% over GraphCodeBERT on two public datasets.

Vulnerability identification is crucial for cyber security in the software-related industry. Early identification methods require significant manual efforts in crafting features or annotating vulnerable code. Although the recent pre-trained models alleviate this issue, they overlook the multiple rich structural information contained in the code itself. In this paper, we propose a novel Multi-View Pre-Trained Model (MV-PTM) that encodes both sequential and multi-type structural information of the source code and uses contrastive learning to enhance code representations. The experiments conducted on two public datasets demonstrate the superiority of MV-PTM. In particular, MV-PTM improves GraphCodeBERT by 3.36% on average in terms of F1 score.
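
The abstract does not say which contrastive objective MV-PTM uses. As a rough illustration of the general idea only, the sketch below computes an InfoNCE-style loss between two views of the same code snippets, assuming each snippet already has one embedding from a sequential view and one from a structural view. The function names, the temperature value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(seq_views, struct_views, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    seq_views[i] and struct_views[i] are assumed to embed the same snippet
    (a positive pair); every other struct_views[j] acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(seq_views):
        logits = [cosine(anchor, other) / temperature for other in struct_views]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching view.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): two snippets, two dimensions.
seq = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # structural views that match their snippets
swapped = [[0.1, 0.9], [0.9, 0.1]]   # structural views paired with the wrong snippets

loss_aligned = info_nce(seq, aligned)
loss_swapped = info_nce(seq, swapped)
```

With these toy vectors the aligned pairing gives the lower loss, which is the behaviour a contrastive objective is meant to reward: views of the same snippet are pulled together and views of different snippets are pushed apart.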
