5.0LGApr 10
Identification and Anonymization of Named Entities in Unstructured Information Sources for Use in Social Engineering DetectionCarlos Jimeno Miguel, Raul Orduna, Francesco Zola
This study addresses the challenge of creating datasets for cybercrime analysis while complying with the requirements of regulations such as the General Data Protection Regulation (GDPR) and Organic Law 10/1995 of the Penal Code. To this end, a system is proposed for collecting information from the Telegram platform, including text, audio, and images; the implementation of speech-to-text transcription models incorporating signal enhancement techniques; and the evaluation of different Named Entity Recognition (NER) solutions, including Microsoft Presidio and AI models designed using a transformer-based architecture. Experimental results indicate that Parakeet achieves the best performance in audio transcription, while the proposed NER solutions achieve the highest f1-score values in detecting sensitive information. In addition, anonymization metrics are presented that allow evaluation of the preservation of structural coherence in the data, while simultaneously guaranteeing the protection of personal information and supporting cybersecurity research within the current legal framework.
20.1CRMar 25
CTF as a Service: A reproducible and scalable infrastructure for cybersecurity trainingCarlos Jimeno Miguel, Mikel Izal Azcarate
Capture The Flag (CTF) competitions have established themselves as a highly effective pedagogical tool in cybersecurity education, offering students hands-on experience in realistic attack and defense scenarios. However, organizing and hosting these events requires considerable infrastructure effort, which frequently limits their adoption in academic settings. This paper presents the design, iterative development, and evaluation of a CTF as a Service (CaaS) platform built on Proxmox virtualization, leveraging Infrastructure as Code (IaC) tools such as Terraform and Ansible, container orchestration via Docker Swarm, and load balancing with HAProxy. The system supports both a development-centered workflow, in which challenges are automatically deployed from a Git repository through a CI/CD pipeline, and a deployment-oriented workflow for ad-hoc infrastructure provisioning. The paper describes the design decisions made, the challenges encountered during development, and the solutions implemented to achieve session persistence, external routing, and challenge replicability. The platform is designed to evolve into a CTF hosting service with commercial potential, and future lines of work are outlined regarding automatic scaling, monitoring integration, and frontend standardization.