CR · AI · LG · Jan 18, 2021

Leveraging AI to optimize website structure discovery during Penetration Testing

arXiv:2101.07223v1 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses the efficiency of web security testing for penetration testers, though the contribution appears incremental because it builds on existing dirbusting methods.

The paper tackles the problem of time-consuming directory brute-forcing (dirbusting) in web penetration testing by using semantic clustering to organize wordlists, resulting in performance improvements of up to 50% across eight tested web applications.

Dirbusting is a technique that brute-forces directory and file names on a web server while monitoring HTTP responses, in order to enumerate the server's contents. It uses lists of common words to discover the hidden structure of the target website, and it typically relies on response codes as discovery conditions to find new pages. It is widely used in web application penetration testing, an activity that allows companies to detect website vulnerabilities. Dirbusting is both time- and resource-consuming, and innovative approaches have never been explored in this field. We hence propose an advanced technique to optimize the dirbusting process by leveraging Artificial Intelligence. More specifically, we use semantic clustering techniques to organize wordlist items into groups according to their semantic meaning. The resulting clusters drive a purpose-built intelligent next-word strategy. This paper demonstrates that clustering techniques outperform the commonly used brute-force methods. Performance is evaluated by testing eight different web applications. Results show a performance increase of up to 50% in each of the conducted experiments.
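To make the idea of a cluster-guided next-word strategy concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the clustering model or the selection rule, so everything below is an assumption. The clusters are hand-assigned in a hypothetical `cluster_of` mapping instead of being produced by semantic clustering, the target server is simulated by a set of existing paths rather than real HTTP requests and response codes, and the rule "after a hit, probe the rest of that word's cluster first" is just one plausible strategy.

```python
def baseline_scan(wordlist, exists, budget):
    """Plain dirbusting: probe words in list order until the budget runs out."""
    return [word for word in wordlist[:budget] if exists(word)]


def cluster_guided_scan(wordlist, cluster_of, exists, budget):
    """Probe in list order, but after each hit move the not-yet-probed
    words from the hit's cluster to the front of the queue."""
    pending = list(wordlist)
    found = []
    requests_sent = 0
    while pending and requests_sent < budget:
        word = pending.pop(0)
        requests_sent += 1
        if not exists(word):
            continue
        found.append(word)
        label = cluster_of.get(word)
        if label is not None:
            same = [w for w in pending if cluster_of.get(w) == label]
            rest = [w for w in pending if cluster_of.get(w) != label]
            pending = same + rest
    return found


# Toy data (all hypothetical): a small wordlist and hand-assigned clusters.
wordlist = ["admin", "images", "backup", "css", "login",
            "js", "dashboard", "fonts", "users", "img"]
cluster_of = {
    "admin": "auth", "login": "auth", "dashboard": "auth", "users": "auth",
    "images": "static", "css": "static", "js": "static",
    "fonts": "static", "img": "static",
    "backup": "misc",
}

# Simulated target: in a real scan, exists() would send an HTTP request
# and inspect the response code instead of checking a set.
existing_paths = {"admin", "login", "dashboard", "users"}


def exists(word):
    return word in existing_paths


baseline_hits = baseline_scan(wordlist, exists, budget=5)
guided_hits = cluster_guided_scan(wordlist, cluster_of, exists, budget=5)
print("baseline:", baseline_hits)
print("cluster-guided:", guided_hits)
```

With the same budget of five requests, the baseline finds two of the four existing paths in this toy setup, while the cluster-guided order finds all four. The gain depends entirely on the existing paths being semantically related, which is the assumption the clustering approach relies on.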

Code Implementations: 1 repo
