CR PL SE ML | Jul 16, 2012

MARFCAT: Transitioning to Binary and Larger Data Sets of SATE IV

Serguei A. Mokhov, Joey Paquet, Mourad Debbabi, Yankui Sun

arXiv:1207.3718v2 | 10 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for efficient and accurate vulnerability detection in code across multiple platforms, but it appears incremental as a follow-up iteration with preliminary results.

The authors tackled the problem of static code analysis for security and software engineering weaknesses by transitioning their MARFCAT approach to binary and larger datasets from SATE IV, achieving fast and accurate detection where other tools are slower or have lower recall.

We present a second iteration of a machine learning approach to static code analysis and fingerprinting for weaknesses related to security, software engineering, and others, using the open-source MARF framework and the MARFCAT application based on it, applied to the data sets of NIST's SATE IV static analysis tool exposition workshop, which include additional test cases, among them new large synthetic cases. To aid detection of weak or vulnerable code, whether source or binary, on different platforms, the machine learning approach proved to be fast and accurate for such tasks where other tools are either much slower or have much smaller recall of known vulnerabilities. We use signal processing and NLP techniques in our approach to accomplish the identification and classification tasks. MARFCAT's design, from its beginning in 2010, made it independent of the language being analyzed, be it source code, bytecode, or binary. In this follow-up work we explore some preliminary results in this area. We also evaluated additional algorithms that were used to process the data.
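As a rough illustration of the signal-processing view mentioned in the abstract, the sketch below treats a file's raw bytes as a one-dimensional signal, averages a windowed magnitude spectrum into a feature vector, and labels a new file by cosine similarity to per-class mean vectors. This is not MARF's or MARFCAT's actual code or API: the window size, the plain DFT, the cosine-similarity classifier, and every function name here are assumptions chosen only to show how such an approach can stay independent of whether the input is source code, bytecode, or binary.

```python
import cmath
import math

WINDOW = 32  # hypothetical window size, chosen only for illustration


def spectrum(window):
    """Magnitudes of the non-DC DFT bins (k = 1 .. n/2) of one window."""
    n = len(window)
    return [
        abs(sum(window[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(1, n // 2 + 1)
    ]


def features(data, window=WINDOW):
    """Average magnitude spectrum over fixed-size windows of raw bytes.

    The input is treated as an opaque byte signal, so the same function
    applies to source text, bytecode, or a binary.
    """
    size = window // 2
    if not data:
        return [0.0] * size
    samples = [b / 255.0 for b in data]
    samples += [0.0] * ((-len(samples)) % window)  # zero-pad the last window
    count = len(samples) // window
    acc = [0.0] * size
    for start in range(0, len(samples), window):
        for k, magnitude in enumerate(spectrum(samples[start:start + window])):
            acc[k] += magnitude
    return [a / count for a in acc]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled):
    """Map each label to the mean feature vector of its example byte strings."""
    model = {}
    for label, examples in labelled.items():
        vectors = [features(e) for e in examples]
        model[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return model


def classify(model, data):
    """Return the label whose mean vector is most similar to the input."""
    vector = features(data)
    return max(model, key=lambda label: cosine(model[label], vector))
```

In use, `train` would be given byte strings read from files already labelled with a weakness category (for example via `open(path, "rb").read()`), and `classify` would then be called on the bytes of an unseen file. Because training and classification each make a single pass over the bytes, a scheme of this kind can be fast, though its accuracy depends entirely on the features and the training data.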
