Automated generation of web server fingerprints
Automated fingerprint generation enables more accurate study of web server distribution for security or infrastructure analysis, though the contribution is incremental because it builds on existing fingerprinting methods.
The paper identifies web server types without relying on version strings: it applies multifactor Bayesian inference to the response codes returned by 110,000 live servers and successfully predicts each server's type.
In this paper, we demonstrate that it is possible to automatically generate fingerprints for various web server types using multifactor Bayesian inference on randomly selected servers on the Internet, without building an a priori catalog of server features or behaviors. This makes it possible to conclusively study web server distribution without relying on reported (and variable) version strings. We gather data by sending a collection of specialized requests to 110,000 live web servers. Using only the server response codes, we then train an algorithm to successfully predict server types independently of the server version string. In the process, we note several distinguishing features of current web infrastructure.
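The inference step described above can be illustrated with a minimal sketch. It assumes a naive Bayes model in which each probe's response code is treated as an independent factor; the abstract does not specify the exact model, so this is one plausible reading rather than the authors' method. The probe names, the server labels `server_a` and `server_b`, the smoothing parameters, and the training rows are all hypothetical and are not data from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical probe names; the paper's actual request set is not given here.
PROBES = ["bad_method", "long_uri", "bad_version"]


def train(samples):
    """Count response codes per (server type, probe).

    samples: list of (server_type, {probe_name: status_code}) pairs.
    Returns (type_counts, code_counts).
    """
    type_counts = Counter()
    code_counts = defaultdict(Counter)
    for server_type, responses in samples:
        type_counts[server_type] += 1
        for probe in PROBES:
            code_counts[(server_type, probe)][responses[probe]] += 1
    return type_counts, code_counts


def predict(model, responses, alpha=1.0, n_codes=60):
    """Return the server type with the highest log-posterior.

    alpha is a Laplace smoothing constant and n_codes is an assumed size
    of the status-code vocabulary; both are illustrative choices.
    """
    type_counts, code_counts = model
    total = sum(type_counts.values())
    best_type, best_score = None, -math.inf
    for server_type, count in type_counts.items():
        # Log prior for this server type.
        score = math.log(count / total)
        # Add one smoothed log-likelihood term per probe (the "factors").
        for probe in PROBES:
            seen = code_counts[(server_type, probe)][responses[probe]]
            score += math.log((seen + alpha) / (count + alpha * n_codes))
        if score > best_score:
            best_type, best_score = server_type, score
    return best_type


# Made-up training rows for illustration only; not measurements from the paper.
TRAINING = [
    ("server_a", {"bad_method": 501, "long_uri": 414, "bad_version": 400}),
    ("server_a", {"bad_method": 501, "long_uri": 414, "bad_version": 400}),
    ("server_b", {"bad_method": 405, "long_uri": 414, "bad_version": 505}),
    ("server_b", {"bad_method": 405, "long_uri": 400, "bad_version": 505}),
]

model = train(TRAINING)
guess = predict(model, {"bad_method": 405, "long_uri": 414, "bad_version": 505})
```

In this sketch the classifier never sees a version string: the label is inferred purely from how a server answers unusual requests, which is the property the abstract emphasizes. In a real study the training labels would have to come from some trusted source, a detail the abstract leaves open.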