CR | IR | SE | Oct 16, 2017

Classifying Web Exploits with Topic Modeling

arXiv:1710.05561v1 | 17 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for semi-automatic classification of exploits in vulnerability tracking infrastructures, representing an incremental improvement in software security research.

The paper tackles the problem of classifying web and other proof-of-concept exploits for software vulnerabilities using topic modeling and database metadata, achieving an accuracy rate near 0.9 on a dataset of over 36,000 exploits.

This short empirical paper investigates how well topic modeling and database metadata characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, the empirical experiment obtains an accuracy rate near 0.9. Text mining and topic modeling contribute substantially to this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and it offers a few scholarly observations about the potential for semi-automatic classification of exploits in existing tracking infrastructures.
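The general idea of combining topic-model features with database metadata can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and is not necessarily the pipeline used in the paper: the toy descriptions, the two metadata columns, the choice of two topics, and the logistic regression classifier are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus of exploit descriptions (not the paper's dataset).
texts = [
    "sql injection in login form php parameter",
    "cross site scripting in search page php",
    "remote file inclusion via http get parameter",
    "stack buffer overflow in ftp server command",
    "local privilege escalation in linux kernel driver",
    "heap overflow in media player file parser",
]
# Hypothetical metadata columns: [targets_port_80, platform_is_php].
metadata = np.array(
    [[1, 1], [1, 1], [1, 0], [0, 0], [0, 0], [0, 0]], dtype=float
)
# Labels: 1 = web exploit, 0 = other PoC exploit.
labels = np.array([1, 1, 1, 0, 0, 0])

N_TOPICS = 2  # assumed value, chosen only to keep the example small

# Step 1: turn each description into a bag-of-words count vector.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)

# Step 2: fit LDA and obtain one topic-weight vector per document.
lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0)
topic_shares = lda.fit_transform(counts)

# Step 3: concatenate topic weights with metadata and train a classifier.
features = np.hstack([topic_shares, metadata])
clf = LogisticRegression().fit(features, labels)

# Predictions on the training data itself, only to show the API flow.
predictions = clf.predict(features)
print(predictions)
```

In a realistic setting the classifier would be evaluated on held-out exploits rather than on the training rows, and the number of topics would be tuned instead of fixed.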

