18.7 NI Mar 31
Shy Guys: A Light-Weight Approach to Detecting Robots on Websites
Rémi Van Boxem, Tom Barbette, Cristel Pelsser et al.
Automated bots now account for roughly half of all web requests, and an increasing number deliberately spoof their identity, either to evade detection or to ignore robots.txt. Existing countermeasures are resource-intensive (JavaScript challenges, CAPTCHAs), cost-prohibitive (commercial solutions), or degrade the user experience. This paper proposes a lightweight, passive approach to bot detection that combines user-agent string analysis with favicon-based heuristics, operating entirely on standard web server logs with no client-side interaction. We evaluate the method on over 4.6 million requests containing 54,945 unique user-agent strings, collected from websites hosted around the world. Our approach detects 67.7% of bot traffic while maintaining a false-positive rate of 3%, outperforming the state of the art (less than 20%). This method can serve as a first line of defence, routing only genuinely ambiguous requests to active challenges and preserving the experience of legitimate users.
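The abstract does not spell out the heuristics themselves, so the following is only a minimal sketch of what a passive, log-only check of this kind might look like. It assumes logs in the common "combined" format, and the token list `BOT_TOKENS`, the threshold `MIN_REQUESTS`, and the rule "a browser-like client that never fetches `/favicon.ico` is suspicious" are all hypothetical illustrations, not the authors' method.

```python
import re
from collections import defaultdict

# Combined Log Format (assumed):
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical values chosen for illustration only.
BOT_TOKENS = ("bot", "crawler", "spider", "curl", "python-requests")
MIN_REQUESTS = 3


def classify_clients(lines):
    """Label each (ip, user-agent) pair using only information found in the log."""
    seen = defaultdict(lambda: {"requests": 0, "favicon": False})
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed format
        key = (m["ip"], m["ua"])
        seen[key]["requests"] += 1
        if m["path"].endswith("/favicon.ico"):
            seen[key]["favicon"] = True

    verdicts = {}
    for key, stats in seen.items():
        ua = key[1].lower()
        if any(token in ua for token in BOT_TOKENS):
            # The client identifies itself as automated.
            verdicts[key] = "declared-bot"
        elif stats["requests"] >= MIN_REQUESTS and not stats["favicon"]:
            # Claims to be a browser but never asked for the favicon.
            verdicts[key] = "suspected-bot"
        else:
            verdicts[key] = "likely-human"
    return verdicts
```

In a deployment along the lines the abstract describes, only the clients that such a pass cannot settle would be routed to an active challenge; how that routing is done is outside the scope of this sketch.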
CR Aug 6, 2019 Code
A Public Network Trace of a Control and Automation System
Gorby Kabasele Ndonda, Ramin Sadre
The increasing number of attacks against automation systems such as SCADA and their network infrastructure has demonstrated the need to secure those systems. Unfortunately, directly applying existing ICT security mechanisms to automation systems is hard due to constraints of the latter, such as availability requirements or hardware limitations. Thus, the solution favoured by researchers is the use of network-based intrusion detection systems (N-IDS). One issue that many researchers encounter is how to validate and evaluate their N-IDS. Access to a real, large automation system for experimentation is almost impossible to obtain, as companies are reluctant to open their systems to outsiders due to obvious concerns. The few public traffic datasets that could be used for off-line experiments are either synthetic or collected at small testbeds. In this paper, we describe and characterize a public traffic dataset collected at the HVAC management system of a university campus. Although the dataset contains only packet headers, we believe that it can help researchers, in particular designers of flow-based IDS, to validate their solutions under more realistic conditions. The traces can be found at https://github.com/gkabasele/HVAC_Traces.
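To illustrate why header-only traces are still useful for flow-based IDS work, here is a minimal sketch of aggregating packet headers into unidirectional flows. It assumes the headers have already been parsed into records; the `Packet` fields, the `aggregate_flows` helper, and the 60-second idle timeout are hypothetical choices for illustration and do not describe the dataset's format or the authors' tooling.

```python
from collections import namedtuple

# Hypothetical record for one parsed packet header (no payload needed).
Packet = namedtuple("Packet", "ts src dst sport dport proto length")


def aggregate_flows(packets, idle_timeout=60.0):
    """Group packets by 5-tuple into unidirectional flows.

    A flow is closed when no packet with the same 5-tuple has been seen
    for more than `idle_timeout` seconds.
    """
    active = {}
    finished = []
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = active.get(key)
        if flow is not None and p.ts - flow["last"] > idle_timeout:
            finished.append(flow)  # idle for too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "first": p.ts, "last": p.ts,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.length
    finished.extend(active.values())  # flush flows still open at the end
    return finished
```

Per-flow features such as packet count, byte count, and duration (`last - first`) are the kind of input a flow-based detector typically consumes, and all of them can be computed from headers alone.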