Roy Ricaldi

2 Papers

40.6 CR Jun 3
TeleHunt: A Framework and Tool for Efficient Cybercriminal Community Discovery on Telegram

Roy Ricaldi, Victor Asanache, Luca Allodi

This paper presents TeleHunt, a framework and tool for evaluating the effectiveness of different strategies to discover cybercriminal communities on Telegram. TeleHunt employs a set of reference-driven snowballing strategies, integrating message-level classification, contextual filtering, and market-segment labeling. Using open- and dark-web seeds, we systematically evaluate how seed source, pointer type, and exploration strategy influence discovery outcomes in three dimensions: efficiency, accessibility, and rediscovery. Our work provides (i) a modular cybercrime content discovery pipeline, (ii) the first systematic comparison of Telegram discovery strategies with an empirical characterization of market-segment accessibility, and (iii) a labeled dataset of over 172 million messages from 6,022 Telegram communities.
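
As a rough illustration of what a reference-driven snowballing strategy can look like, the following minimal sketch expands outward from seed communities by following references and keeps only communities that pass a relevance filter. It is not TeleHunt's implementation: the function `snowball`, the callbacks `get_references` and `is_relevant`, the `max_depth` cutoff, and the toy graph are all hypothetical names and data chosen for this example.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def snowball(
    seeds: Iterable[str],
    get_references: Callable[[str], List[str]],
    is_relevant: Callable[[str], bool],
    max_depth: int = 2,
) -> Dict[str, int]:
    """Breadth-first expansion from seeds; returns {community: depth found}."""
    discovered: Dict[str, int] = {}
    visited = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        community, depth = queue.popleft()
        if community in visited:
            continue
        visited.add(community)
        # Assumption: irrelevant communities are dropped and not expanded.
        if not is_relevant(community):
            continue
        discovered[community] = depth
        if depth < max_depth:
            for ref in get_references(community):
                if ref not in visited:
                    queue.append((ref, depth + 1))
    return discovered


# Hypothetical toy reference graph: community -> communities it points to.
TOY_REFERENCES = {
    "seed_a": ["market_b", "news_c"],
    "market_b": ["market_d"],
    "news_c": ["market_e"],
    "market_d": [],
    "market_e": [],
}

found = snowball(
    seeds=["seed_a"],
    get_references=lambda c: TOY_REFERENCES.get(c, []),
    is_relevant=lambda c: not c.startswith("news"),  # stand-in for a classifier
    max_depth=2,
)
print(found)
```

In this toy run, `market_e` is never reached because it is only referenced by a community that the filter rejects, which hints at why the choice of seeds and filtering affects what a crawl can discover.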

34.0 CR May 14
Topical Shifts in the Dark Web: A Longitudinal Analysis of Content from the Cybercrime Ecosystem

Roy Ricaldi, Maximilian Schafer, Philipp Zech et al.

The dark web hosts a dynamic ecosystem of cybercrime forums and marketplaces that adapt to law enforcement pressure, technological change, and economic incentives. Prior research has extracted cyber threat intelligence from these platforms using static snapshots, with limited attention to how discussions evolve over time. In this study, we conduct a longitudinal analysis of 25,065 dark web websites using 11,403,638 HTML snapshots (approximately 1,245.38 GB) collected over six years. We develop a longitudinal topic-modeling framework combining domain-specific embeddings, density-based clustering, and temporal aggregation to measure topic prevalence and lifecycle at the website level. Our analysis identifies 55 thematic clusters. We find that approximately 75% of total discussion volume is concentrated in a small set of persistent core topics, while short-lived themes account for approximately 3% of activity. The median topic lifespan is 75 months, indicating gradual thematic evolution rather than abrupt replacement.
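
To make the notions of topic prevalence and lifespan concrete, here is a minimal sketch of a temporal aggregation step, assuming each observation has already been assigned a topic label by an upstream clustering stage. The definitions used here (prevalence as share of all observations, lifespan as the inclusive number of months between a topic's first and last appearance) are assumptions for illustration and may differ from the paper's; the names `month_index`, `topic_stats`, and the toy data are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple


def month_index(year_month: str) -> int:
    """Map 'YYYY-MM' to a running month count so months can be subtracted."""
    year, month = year_month.split("-")
    return int(year) * 12 + int(month) - 1


def topic_stats(observations: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Compute per-topic share of volume and lifespan in months.

    observations: (year_month, topic_label) pairs.
    """
    counts = Counter(topic for _, topic in observations)
    total = sum(counts.values())
    first: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for year_month, topic in observations:
        idx = month_index(year_month)
        first[topic] = min(first.get(topic, idx), idx)
        last[topic] = max(last.get(topic, idx), idx)
    return {
        topic: {
            "share": counts[topic] / total,
            # Assumed definition: inclusive span from first to last month seen.
            "lifespan_months": last[topic] - first[topic] + 1,
        }
        for topic in counts
    }


# Hypothetical toy data: three observations of one topic, one of another.
toy = [
    ("2020-01", "drugs"),
    ("2020-06", "drugs"),
    ("2021-01", "drugs"),
    ("2020-03", "scam"),
]
stats = topic_stats(toy)
median_lifespan = median(s["lifespan_months"] for s in stats.values())
print(stats, median_lifespan)
```

Under these assumed definitions, a persistent topic shows both a large share and a long lifespan, while a short-lived theme shows a small share and a lifespan of only a few months.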