Víctor Mayoral-Vilches

RO
9papers
36citations
Novelty31%
AI Score43

9 Papers

CRMay 27Code
Cybersecurity AI (CAI) Dataset

Víctor Mayoral-Vilches

We present CAI Dataset, a fourteen-month corpus of cybersecurity LLM trajectories collected through the open-source CAI agent framework, built in response to PentestGPT's finding that expert operator trajectories, not base-model capability, are the bottleneck for cybersecurity LLM performance. CAI Dataset aggregates 230,935 session logs and 26,027,742 user prompts from 16,768 source IPs across 123 countries, exercising 4,187 unique LLM identifiers against 23,147 target domains over 18.07 TB of durable storage. The mix is hands-on (36.4% offensive, 20.1% attacker-intent, 27.5% business / integration, 4.4% defensive), making CAI Dataset, to the best of our knowledge, the largest described corpus of LLM-driven hacker trajectories. It is released to partner organisations and selected customers as an audience-size series (CAI Dataset10, CAI Dataset1k, CAI Dataset200k). Read longitudinally, the corpus is a record of cybersecurity itself turning automated: operators routinely paste live credentials, production hostnames and bearer tokens into prompts knowing their inputs are logged, a trade-off they accept to stay competitive. Aggregated across the industry, this concentrates a substantial fraction of the world's offensive and defensive operator context inside a handful of frontier-model API providers, a single failure surface whose breach or politically motivated repurposing could cascade into nation- and enterprise-scale disruption. The only configuration that preserves both the productivity advantage and operator-side confidentiality is an on-premise, privately-hosted cybersecurity-specialised LLM served inside the operator's trust boundary, which CAI Dataset is shaped to make practical.

CRMay 27
Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity?

Víctor Mayoral-Vilches, Francesco Balassone, María Sanz-Gómez et al.

What is the best harness for cybersecurity AI? Cybersecurity systems are converging on a single execution scaffold per agent, an iterative shell loop driven by a Large Language Model (LLM). However, scaffolds are not interchangeable, rarely interoperable, and no single scaffold dominates across all challenge types. In our path towards researching Cybersecurity SuperIntelligence (CSI), we present a meta-scaffold that unifies heterogeneous agent harnesses under a common orchestration layer, enabling any LLM-driven scaffold to be deployed, benchmarked, and composed within the same infrastructure. Using CSI, we benchmark five scaffolds (CSI::Claude, CSI::Codex, CSI::GCAI, CSI::Mistral, CSI::CAI) on the 33 cybench challenges, holding the model fixed at alias2-mini. The best individual scaffolds solve 15/33 (45.5%); the four-scaffold union solves 17/33 (51.5%), with the fifth (CSI::Mistral, 10/33) contributing one exclusive solve. We find that no single scaffold is the best harness: it is the combination of structurally heterogeneous scaffolds that yields the highest coverage. We validate this through CSI's blackboard-based multi-agent architecture, in which scaffold-specialised agents run in parallel and exchange intermediate findings via a shared substrate (a blackboard). The blackboard solves 19/33 (57.6%), a 27% relative gain over CSI::Claude, one of the best individual scaffolds (15/33, 45.5%), 25% faster (20.2 h vs. 26.8 h), at comparable cost ($5,480 vs. $5,122).

CRApr 27
Dynamic Cyber Ranges

Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone et al.

As LLM-driven agents advance in cybersecurity, Jeopardy CTF benchmarks are approaching saturation and cyber ranges, the natural next evaluation frontier, offer diminishing resistance under their current static design. We validate this observation by deploying an LLM-driven Advanced Persistent Threat (APT) agent across three tiers of increasingly realistic infrastructure (PRO Labs, MHBench, military-grade CYBER RANGES). To counteract this trend, we propose Dynamic Cyber Ranges: cyber range environments augmented with LLM-driven Defender agents that harden infrastructure, monitor for intrusions, and respond in real time. Across evaluated scenarios, Defender agents reduce attacker success to 0-55%, achieving complete prevention on multiple configurations. Since attacker and defender agents draw from the same underlying model capabilities, Dynamic Cyber Ranges preserve evaluation headroom as models improve. Notably, a smaller, specialized on-premise model (alias2-mini) matched the frontier model's defensive outcomes on multiple scenarios under identical, untuned prompts, and detected the attacker 10x faster on a complex enterprise scenario, suggesting that privacy-preserving on-premise models can serve as competent defenders against frontier-class attackers. The experiments further surface emergent agent behaviors, including scope expansion and prompt exfiltration, with implications for AI benchmark integrity and agentic system design.

ROJan 7, 2022
Robot Hacking Manual (RHM)

Víctor Mayoral-Vilches

Robots are often shipped insecure and in some cases fully unprotected. The rationale behind is fourfold: first, defensive security mechanisms for robots are still on their early stages, not covering the complete threat landscape. Second, the inherent complexity of robotic systems makes their protection costly, both technically and economically. Third, robot vendors do not generally take responsibility in a timely manner, extending the zero-days exposure window (time until mitigation of a zero-day) to several years on average. Fourth, contrary to the common-sense expectations in 21st century and similar to Ford in the 1920s with cars, most robot manufacturers oppose or difficult robot repairs. The Robot Hacking Manual (RHM) is an introductory series about cybersecurity for robots, with an attempt to provide comprehensive case studies and step-by-step tutorials with the intent to raise awareness in the field and highlight the importance of taking a security-first approach. The material available here is also a personal learning attempt and it's disconnected from any particular organization. Content is provided as is and by no means it's encouraged or promoted the unauthorized tampering of robotic systems or related technologies.

ROAug 30, 2021
Adaptive Computing in Robotics, Leveraging ROS 2 to Enable Software-Defined Hardware for FPGAs

Víctor Mayoral-Vilches, Giulio Corradi

Traditional software development in robotics is about programming functionality in the CPU of a given robot with a pre-defined architecture and constraints. With adaptive computing, instead, building a robotic behavior is about programming an architecture. By leveraging adaptive computing, roboticists can adapt one or more of the properties of its computing systems (e.g. its determinism, power consumption, security posture, or throughput) at run time. Roboticists are not, however, hardware engineers, and embedded expertise is scarce among them. This white paper adopts a ROS 2 roboticist-centric view for adaptive computing and proposes an architecture to include FPGAs as a first-class participant of the ROS 2 ecosystem. The architecture proposed is platform- and technology-agnostic, and is easily portable. The core components of the architecture are disclosed under an Apache 2.0 license, paving the way for roboticists to leverage adaptive computing and create software-defined hardware.

ROOct 15, 2020
alurity, a toolbox for robot cybersecurity

Víctor Mayoral-Vilches, Irati Abad-Fernández, Martin Pinzger et al.

The reuse of technologies and inherent complexity of most robotic systems is increasingly leading to robots with wide attack surfaces and a variety of potential vulnerabilities. Given their growing presence in public environments, security research is increasingly becoming more important than in any other area, specially due to the safety implications that robot vulnerabilities could cause on humans. We argue that security triage in robotics is still immature and that new tools must be developed to accelerate the testing-triage-exploitation cycle, necessary for prioritizing and accelerating the mitigation of flaws. The present work tackles the current lack of offensive cybersecurity research in robotics by presenting a toolbox and the results obtained with it through several use cases conducted over a year period. We propose a modular and composable toolbox for robot cybersecurity: alurity. By ensuring that both roboticists and security researchers working on a project have a common, consistent and easily reproducible development environment, alurity aims to facilitate the cybersecurity research and the collaboration across teams.

ROSep 17, 2020
Can ROS be used securely in industry? Red teaming ROS-Industrial

Víctor Mayoral-Vilches, Martin Pinzger, Stefan Rass et al.

With its growing use in industry, ROS is rapidly becoming a standard in robotics. While developments in ROS 2 show promise, the slow adoption cycles in industry will push widespread ROS 2 industrial adoption years from now. ROS will prevail in the meantime which raises the question: can ROS be used securely for industrial use cases even though its origins didn't consider it? The present study analyzes this question experimentally by performing a targeted offensive security exercise in a synthetic industrial use case involving ROS-Industrial and ROS packages. Our exercise results in four groups of attacks which manage to compromise the ROS computational graph, and all except one take control of most robotic endpoints at desire. To the best of our knowledge and given our setup, results do not favour the secure use of ROS in industry today, however, we managed to confirm that the security of certain robotic endpoints hold and remain optimistic about securing ROS industrial deployments.

ROMar 23, 2020
DevSecOps in Robotics

Víctor Mayoral-Vilches, Nuria García-Maestro, McKenna Towers et al.

Quality in software is often understood as "execution according to design purpose" whereas security means that "software will not put data or computing systems at risk of unauthorized access." There seems to be a connection between these two aspects but, how do we integrate both of them in the robotics development cycle? In this article we introduce DevSecOps in Robotics, a set of best practices designed to help roboticists implant security deep in the heart of their development and operations processes. First, we briefly describe DevOps, introduce the value added with DevSecOps and describe and illustrate how these practices may be implemented in the robotics field. We finalize with a discussion on the relationship between security, quality and safety, open problems and future research questions.

CRDec 16, 2019
Industrial robot ransomware: Akerbeltz

Víctor Mayoral-Vilches, Lander Usategui San Juan, Unai Ayucar Carbajo et al.

Cybersecurity lessons have not been learnt from the dawn of other technological industries. In robotics, the existing insecurity landscape needs to be addressed immediately. Several manufacturers profiting from the lack of general awareness are systematically ignoring their responsibilities by claiming their insecure (open) systems facilitate system integration, disregarding the safety, privacy and ethical consequences that their (lack of) actions have. In an attempt to raise awareness and illustrate the "insecurity by design in robotics" we have created Akerbeltz, the first known instance of industrial robot ransomware. Our malware is demonstrated using a leading brand for industrial collaborative robots, Universal Robots. We describe the rationale behind our target and discuss the general flow of the attack including the initial cyber-intrusion, lateral movement and later control phase. We urge security researchers to adopt some sort of disclosure policy that forces manufacturers to react promptly. We advocate against security by obscurity and encourage the release of similar actions once vulnerability reports fall into a dead-end. Actions are now to be taken to abide a future free of zero-days for robotics.