Open Dataset of Phishing and Tor Hidden Services Screen-captures
The datasets are a useful resource for security analysts working on visual correlation tasks, but the contribution is incremental: it focuses on dataset availability rather than new methods.
The paper tackles the need for datasets to develop automatic classification tools for security analysts by introducing two open datasets of manually verified screen-captures from phishing and Tor hidden services websites, sourced from CIRCL's Open-Source tools.
Security analysts need to classify, search and correlate large numbers of images. Automatic classification tools improve the efficiency of such tasks. However, developing these tools requires datasets, and the present paper introduces and provides two of them for the specific cases of visual correlation of phishing and onion websites. The screenshots come from CIRCL's Open-Source tools and were manually verified against personal information leaks. The paper also proposes usage examples for these datasets. These research directions are, however, not the main contribution of the paper; the main contribution is the availability of the two datasets.