Masarah Paquet-Clouston

CR
6papers
315citations
Novelty25%
AI Score36

6 Papers

CRAug 6, 2024
The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums

Vanessa Clairoux-Trepanier, Isa-May Beauchamp, Estelle Ruellan et al.

Large language models (LLMs) can be used to analyze cyber threat intelligence (CTI) data from cybercrime forums, which contain extensive information and key discussions about emerging cyber threats. However, to date, the level of accuracy and efficiency of LLMs for such critical tasks has yet to be thoroughly evaluated. Hence, this study assesses the performance of an LLM system built on the OpenAI GPT-3.5-turbo model [8] to extract CTI information. To do so, a random sample of more than 700 daily conversations from three cybercrime forums - XSS, Exploit_in, and RAMP - was extracted, and the LLM system was instructed to summarize the conversations and predict 10 key CTI variables, such as whether a large organization and/or a critical infrastructure is being targeted, with only simple human-language instructions. Then, two coders reviewed each conversation and evaluated whether the information extracted by the LLM was accurate. The LLM system performed well, with an average accuracy score of 96.23%, an average precision of 90% and an average recall of 88.2%. Various ways to enhance the model were uncovered, such as the need to help the LLM distinguish between stories and past events, as well as being careful with verb tenses in prompts. Nevertheless, the results of this study highlight the relevance of using LLMs for cyber threat intelligence.

CRAug 30, 2023
Conti Inc.: Understanding the Internal Discussions of a large Ransomware-as-a-Service Operator with Machine Learning

Estelle Ruellan, Masarah Paquet-Clouston, Sebastian Garcia

Ransomware-as-a-service (RaaS) is increasing the scale and complexity of ransomware attacks. Understanding the internal operations behind RaaS has been a challenge due to the illegality of such activities. The recent chat leak of the Conti RaaS operator, one of the most infamous ransomware operators on the international scene, offers a key opportunity to better understand the inner workings of such organizations. This paper analyzes the main topic discussions in the Conti chat leak using machine learning techniques such as Natural Language Processing (NLP) and Latent Dirichlet Allocation (LDA), as well as visualization strategies. Five discussion topics are found: 1) Business, 2) Technical, 3) Internal tasking/Management, 4) Malware, and 5) Customer Service/Problem Solving. Moreover, the distribution of topics among Conti members shows that only 4% of individuals have specialized discussions while almost all individuals (96%) are all-rounders, meaning that their discussions revolve around the five topics. The results also indicate that a significant proportion of Conti discussions are non-tech related. This study thus highlights that running such large RaaS operations requires a workforce skilled beyond technical abilities, with individuals involved in various tasks, from management to customer service or problem solving. The discussion topics also show that the organization behind the Conti RaaS oper5086933ator shares similarities with a large firm. We conclude that, although RaaS represents an example of specialization in the cybercrime industry, only a few members are specialized in one topic, while the rest runs and coordinates the RaaS operation.

CYMar 2
Assessing Crime Disclosure Patterns in a Large-Scale Cybercrime Forum

Raphael Hoheisel, Tom Meurs, Jai Wientjes et al.

Cybercrime forums play a central role in the cybercrime ecosystem, serving as hubs for the exchange of illicit goods, services, and knowledge. Previous studies have explored the market and social structures of these forums, but less is known about the behavioral dynamics of users, particularly regarding participants' disclosure of criminal activity. This study provides the first large-scale assessment of crime disclosure patterns in a major cybercrime forum, analysing over 3.5 million posts from nearly 300k users. Using a three-level classification scheme (benign, grey, and crime) and a scalable labelling pipeline powered by large language models (LLMs), we measure the level of crime disclosure present in initial posts, analyse how participants switch between levels, and assess how crime disclosure behavior relates to private communications. Our results show that crime disclosure is relatively normative: one quarter of initial posts include explicit crime-related content, and more than one third of users disclose criminal activity at least once in their initial posts. At the same time, most participants show restraint, with over two-thirds posting only benign or grey content and typically escalating disclosure gradually. Grey initial posts are particularly prominent, indicating that many users avoid overt statements and instead anchor their activity in ambiguous content. The study highlights the value of LLM-based text classification and Markov chain modelling for capturing crime disclosure patterns, offering insights for law enforcement efforts aimed at distinguishing benign, grey, and criminal content in cybercrime forums.

CRApr 11, 2018Code
Ransomware Payments in the Bitcoin Ecosystem

Masarah Paquet-Clouston, Bernhard Haslhofer, Benoit Dupont

Ransomware can prevent a user from accessing a device and its files until a ransom is paid to the attacker, most frequently in Bitcoin. With over 500 known ransomware families, it has become one of the dominant cybercrime threats for law enforcement, security professionals and the public. However, a more comprehensive, evidence-based picture on the global direct financial impact of ransomware attacks is still missing. In this paper, we present a data-driven method for identifying and gathering information on Bitcoin transactions related to illicit activity based on footprints left on the public Bitcoin blockchain. We implement this method on-top-of the GraphSense open-source platform and apply it to empirically analyze transactions related to 35 ransomware families. We estimate the lower bound direct financial impact of each ransomware family and find that, from 2013 to mid-2017, the market for ransomware payments has a minimum worth of USD 12,768,536 (22,967.54 BTC). We also find that the market is highly skewed with only a few number of players responsible for the majority of the payments. Based on these research findings, policy-makers and law enforcement agencies can use the statistics provided to understand the size of the illicit market and make informed decisions on how best to address the threat.

CYFeb 3, 2022
Entanglement: Cybercrime Connections of an Internet Marketing Forum Population

Masarah Paquet-Clouston, Serge-Olivier Paquette, Sebastián García et al.

Many activities related to cybercrime operations do not require much secrecy, such as developing websites or translating texts. This research provides indications that many users of a popular public internet marketing forum have connections to cybercrime. It does so by investigating the involvement in cybercrime of a population of users interested in internet marketing, both at a micro and macro scale. The research starts with a case study of three users confirmed to be involved in cybercrime and their use of the public forum where users share information about online advertising. It provides a first glimpse that some business with cybercrime connection is being conducted in the clear. The study then pans out to investigate the forum population's ties with cybercrime by finding crossover users, who are users from the public forum who also comment on cybercrime forums. The cybercrime forums on which they discuss are analyzed and crossover users' strength of participation is reported. Also, to assess if they represent a sub-group of the forum population, their posting behavior on the public forum is compared with that of non-crossover users. This blend of analyses shows that (i) a minimum of 7.2% of the public forum population are crossover users that have ties with cybercrime forums; (ii) their participation in cybercrime forums is limited; and (iii) their posting behavior is relatively indistinguishable from that of non-crossover users. This is the first study to formally quantify how users of an internet marketing public forum, a space for informal exchanges, have ties to cybercrime activities. We conclude that crossover users are a substantial part of the population in the public forum, and, even though they have thus far been overlooked, their aggregated effect in the ecosystem must be considered.

CRAug 2, 2019
Spams meet Cryptocurrencies: Sextortion in the Bitcoin Ecosystem

Masarah Paquet-Clouston, Matteo Romiti, Bernhard Haslhofer et al.

In the past year, a new spamming scheme has emerged: sexual extortion messages requiring payments in the cryptocurrency Bitcoin, also known as sextortion. This scheme represents a first integration of the use of cryptocurrencies by members of the spamming industry. Using a dataset of 4,340,736 sextortion spams, this research aims at understanding such new amalgamation by uncovering spammers' operations. To do so, a simple, yet effective method for projecting Bitcoin addresses mentioned in sextortion spams onto transaction graph abstractions is computed over the entire Bitcoin blockchain. This allows us to track and investigate monetary flows between involved actors and gain insights into the financial structure of sextortion campaigns. We find that sextortion spammers are somewhat sophisticated, following pricing strategies and benefiting from cost reductions as their operations cut the upper-tail of the spamming supply chain. We discover that one single entity is likely controlling the financial backbone of the majority of the sextortion campaigns and that the 11-month operation studied yielded a lower-bound revenue between \$1,300,620 and \$1,352,266. We conclude that sextortion spamming is a lucrative business and spammers will likely continue to send bulk emails that try to extort money through cryptocurrencies.