CL · Apr 2, 2024

Towards Better Understanding of Cybercrime: The Role of Fine-Tuned LLMs in Translation

Veronica Valeros, Anna Širokova, Carlos Catania, Sebastian Garcia

arXiv:2404.01940v11.02 citations · h-index: 11 · 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for efficient, accurate translation in cybersecurity to improve defense against cybercrime. It is incremental, however, because it applies an existing fine-tuning method to a specific domain.

The paper tackles the problem of translating cybercrime communications by fine-tuning a Large Language Model (LLM) on public chats from a Russian-speaking hacktivist group. The authors report translations that are better, faster, and more accurate, at a cost 430 to 23,000 times lower than human translation.

Understanding cybercrime communications is paramount for cybersecurity defence. This often involves translating communications into English for processing, interpreting, and generating timely intelligence. The problem is that translation is hard. Human translation is slow, expensive, and scarce. Machine translation is inaccurate and biased. We propose using fine-tuned Large Language Models (LLMs) to generate translations that can accurately capture the nuances of cybercrime language. We apply our technique to public chats from the NoName057(16) Russian-speaking hacktivist group. Our results show that our fine-tuned LLM is better, faster, more accurate, and able to capture nuances of the language. Our method shows it is possible to achieve high-fidelity translations and significantly reduce costs by a factor ranging from 430 to 23,000 compared to a human translator.
