CR · Feb 25, 2021

Data-Driven Characterization and Detection of COVID-19 Themed Malicious Websites

Mir Mehedi Ahsan Pritom, Kristin M. Schweitzer, Raymond M. Bateman, Min Xu, Shouhuai Xu

arXiv:2102.13226v18.813 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses cybersecurity threats to remote workers during the pandemic, but its contribution is incremental: it applies an existing method to a new domain.

The paper tackles the problem of detecting malicious websites that exploit COVID-19 themes and finds that a Random Forest classifier using lexical and WHOIS features achieves 98% accuracy with a 2.7% false-positive rate.

COVID-19 has hit the global community hard, and organizations are working diligently to cope with the new norm of "work from home". However, the volume of remote work is unprecedented and creates opportunities for cyber attackers to penetrate home computers. Attackers have been leveraging websites with COVID-19 related names, dubbed COVID-19 themed malicious websites. These websites mostly contain false information, fake forms, fraudulent payments, scams, or malicious payloads to steal sensitive information or infect victims' computers. In this paper, we present a data-driven study on characterizing and detecting COVID-19 themed malicious websites. Our characterization study shows that attackers are agile and deceptively crafty in designing geolocation-targeted websites, often leveraging popular domain registrars and top-level domains. Our detection study shows that the Random Forest classifier can detect COVID-19 themed malicious websites based on the lexical and WHOIS features defined in this paper, achieving 98% accuracy and a 2.7% false-positive rate.
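
To make the abstract's terms concrete, here is a minimal Python sketch of what lexical features of a website name might look like, and how accuracy and false-positive rate are computed from predictions. It is an illustration under stated assumptions, not the paper's method: the keyword list, the feature names, and both function names are hypothetical, and the paper's actual lexical and WHOIS feature definitions are not reproduced on this page. No classifier is trained here; the sketch covers only feature extraction and the two reported metrics.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
THEME_KEYWORDS = ("covid", "corona", "vaccine", "pandemic")


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL's hostname."""
    host = urlparse(url).hostname or ""
    labels = host.split(".") if host else []
    return {
        "length": len(host),
        "num_digits": sum(ch.isdigit() for ch in host),
        "num_hyphens": host.count("-"),
        "num_labels": len(labels),
        "tld": labels[-1] if labels else "",
        "has_theme_keyword": any(k in host for k in THEME_KEYWORDS),
    }


def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false-positive rate); label 1 = malicious, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # False-positive rate: share of benign sites wrongly flagged as malicious.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


if __name__ == "__main__":
    # example.com is a reserved documentation domain, not a real sample.
    print(lexical_features("https://covid19-relief-fund.example.com/apply"))
    print(accuracy_and_fpr([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
```

In a full pipeline, feature dictionaries like these would be combined with WHOIS-derived fields and passed to a classifier such as a Random Forest. The false-positive rate matters alongside accuracy because benign COVID-19 information sites wrongly flagged as malicious carry a real cost.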
