Luca Arnaboldi

CR
h-index59
24papers
589citations
Novelty49%
AI Score56

24 Papers

MLFeb 12, 2023
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

Luca Arnaboldi, Ludovic Stephan, Florent Krzakala et al.

This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised "mean-field" regime of large network width, covering as well the intermediate regimes where the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.

CLJun 21, 2022
Why Robust Natural Language Understanding is a Challenge

Marco Casadio, Ekaterina Komendantskaya, Verena Rieser et al.

With the proliferation of Deep Machine Learning into real-life applications, a particular property of this technology has been brought to attention: robustness Neural Networks notoriously present low robustness and can be highly sensitive to small input perturbations. Recently, many methods for verifying networks' general properties of robustness have been proposed, but they are mostly applied in Computer Vision. In this paper we propose a Verification specification for Natural Language Understanding classification based on larger regions of interest, and we discuss the challenges of such task. We observe that, although the data is almost linearly separable, the verifier struggles to output positive results and we explain the problems and implications.

26.1CRMar 24
What a Mesh: Formal Security Analysis of WPA3 SAE Wireless Authentication

Roberto Metere, Mario Lilli, Luca Arnaboldi et al.

The latest Wi-Fi security standard, IEEE 802.11, includes a secure authentication protocol called SAE, whose use is mandatory for WPA3-Personal networks. The protocol is specified at two separate but linked levels: a traditional cryptographic description of the communication logic between network devices, and a state machine description that realises the former in each single device. Current formal verification efforts focus mainly on communication logic. We present detailed formal models of the protocol at both levels, provide precise specifications of its security properties, and analyse machine-checked proofs in ProVerif and ASMETA. The integrated analysis of the above two models is particularly novel, enabling us to identify and address several issues in the current IEEE 802.11 specification more thoroughly than would have been possible otherwise, leading to several official revisions of the standard.

LGJan 24, 2025
Humanity's Last Exam

Long Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

CLFeb 18
ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models

Antoine Chaffin, Luca Arnaboldi, Amélie Chatelain et al.

Current state-of-the-art multi-vector models are obtained through a small Knowledge Distillation (KD) training step on top of strong single-vector models, leveraging the large-scale pre-training of these models. In this paper, we study the pre-training of multi-vector models and show that large-scale multi-vector pre-training yields much stronger multi-vector models. Notably, a fully ColBERT-pre-trained model, ColBERT-Zero, trained only on public data, outperforms GTE-ModernColBERT as well as its base model, GTE-ModernBERT, which leverages closed and much stronger data, setting new state-of-the-art for model this size. We also find that, although performing only a small KD step is not enough to achieve results close to full pre-training, adding a supervised step beforehand allows to achieve much closer performance while skipping the most costly unsupervised phase. Finally, we find that aligning the fine-tuning and pre-training setups is crucial when repurposing existing models. To enable exploration of our results, we release various checkpoints as well as code used to train them.

72.6LGMay 13
Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

Yatin Dandi, Matteo Vilucchio, Luca Arnaboldi et al.

Understanding how deep neural networks learn useful internal representations from data remains a central open problem in the theory of deep learning. We introduce Neural Low-Degree Filtering (Neural LoFi), a stylized limit of gradient-based training in which hierarchical feature learning becomes an explicit iterative spectral procedure. In this limit, the dynamics at each layer decouple: given the current representation, the next layer selects directions with maximal accessible low-degree correlation to the label. This yields a tractable surrogate mechanism for deep learning, together with a natural kernel-space interpretation. Neural LoFi provides a mathematically explicit framework for studying multi-layer feature learning beyond the lazy regime. It predicts how representations are selected layer by layer, explains how emergence of concepts arises with given sample complexity,and gives a concrete mechanism by which depth progressively constructs new features from old ones through low-degree compositionality. We complement the theory with mechanistic experiments on fully connected and convolutional architectures, showing that Neural LoFi improves over lazy random-feature baselines, recovers meaningful structured filters, and predicts representations aligned with early gradient-descent feature discovery with real datasets.

62.9LGMay 11
Beyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiers

Nikita Kezins, Urbas Ekka, Pascal Berrang et al.

Guardrail Classifiers defend production language models against harmful behavior, but although results seem promising in testing, they provide no formal guarantees. Providing formal guarantees for such models is hard because "harmful behavior" has no natural specification in a discrete input space: and the standard epsilon-ball properties used in other domains do not carry semantic meaning. We close this gap by shifting verification from the discrete input space to the classifier's pre-activation space, where we define a harmful region as a convex shape enclosing the representations of known harmful prompts. Because the sigmoid classification head is monotonic, certifying the worst-case point is sufficient to certify the entire region, yielding a closed-form soundness proof without approximation in O(d) time. To formally evaluate these classifiers, we propose two constructions of such regions: SVD-aligned hyper-rectangles, which yield exact SAT/UNSAT certificates, and Gaussian Mixture Models, which yield probabilistic certificates over semantically coherent clusters. Applying this framework to three author-trained Guardrail Classifiers on the toxicity domain, every hyper-rectangle configuration returns SAT, exposing verifiable safety holes across all classifiers, despite seemingly high empirical metrics. Probabilistic GMM certificates also expose a divergent structural stability in how these models represent harm. While GPT-2 and Llama-3.1-8B maintain robust coverage of 90% and 80% across varying boundaries, BERT's safety guarantees prove uniquely volatile. This 'coverage collapse' to 55% at the optimal threshold reveals a sparsely populated safety margin in BERT, which only achieves full coverage by adopting an extremely conservative pessimistic threshold. These approaches combined, provide new insights on how effective Guardrail Classifiers really are, beyond traditional red-teaming.

MLFeb 5, 2024
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

Yatin Dandi, Emanuele Troiani, Luca Arnaboldi et al.

We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the limitations of gradient flow and single-pass GD given by the information exponent (Ben Arous et al., 2021) and leap exponent (Abbe et al., 2023) of the target function. We show that upon re-using batches, the network achieves in just two time steps an overlap with the target subspace even for functions not satisfying the staircase property (Abbe et al., 2021). We characterize the (broad) class of functions efficiently learned in finite time. The proof of our results is based on the analysis of the Dynamical Mean-Field Theory (DMFT). We further provide a closed-form description of the dynamical process of the low-dimensional projections of the weights, and numerical experiments illustrating the theory.

MLMay 24, 2024
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

Luca Arnaboldi, Yatin Dandi, Florent Krzakala et al.

Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, and discuss how they learn pertinent features in multi-index models, that is target functions with low-dimensional relevant directions. In the high-dimensional regime, where the input dimension $d$ diverges, we show that a simple modification of the idealized single-pass gradient descent training scenario, where data can now be repeated or iterated upon twice, drastically improves its computational efficiency. In particular, it surpasses the limitations previously believed to be dictated by the Information and Leap exponents associated with the target function to be learned. Our results highlight the ability of networks to learn relevant structures from data alone without any pre-processing. More precisely, we show that (almost) all directions are learned with at most $O(d \log d)$ steps. Among the exceptions is a set of hard functions that includes sparse parities. In the presence of coupling between directions, however, these can be learned sequentially through a hierarchical mechanism that generalizes the notion of staircase functions. Our results are proven by a rigorous study of the evolution of the relevant statistics for high-dimensional dynamics.

AIJan 12, 2024
Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs

Matthew L. Daggitt, Wen Kokke, Robert Atkey et al.

Neuro-symbolic programs, i.e. programs containing both machine learning components and traditional symbolic code, are becoming increasingly widespread. Finding a general methodology for verifying such programs is challenging due to both the number of different tools involved and the intricate interface between the ``neural'' and ``symbolic'' program components. In this paper we present a general decomposition of the neuro-symbolic verification problem into parts, and examine the problem of the embedding gap that occurs when one tries to combine proofs about the neural and symbolic components. To address this problem we then introduce Vehicle -- standing as an abbreviation for a ``verification condition language'' -- an intermediate programming language interface between machine learning frameworks, automated theorem provers, and dependently-typed formalisations of neuro-symbolic programs. Vehicle allows users to specify the properties of the neural components of neuro-symbolic programs once, and then safely compile the specification to each interface using a tailored typing and compilation procedure. We give a high-level overview of Vehicle's overall design, its interfaces and compilation & type-checking procedures, and then demonstrate its utility by formally verifying the safety of a simple autonomous car controlled by a neural network, operating in a stochastic environment with imperfect information.

CLMar 15, 2024
NLP Verification: Towards a General Methodology for Certifying Robustness

Marco Casadio, Tanvi Dinkar, Ekaterina Komendantskaya et al.

Machine Learning (ML) has exhibited substantial success in the field of Natural Language Processing (NLP). For example large language models have empirically proven to be capable of producing text of high complexity and cohesion. However, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. There are safety critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Computer Vision had pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the literature. While presenting sophisticated algorithms, these papers have not yet crystallised into a common methodology. They are often light on the pragmatical issues of NLP verification and the area remains fragmented. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline, that emerges from the progress in the field to date. Our contributions are two-fold. Firstly, we propose a general methodology to analyse the effect of the embedding gap, a problem that refers to the discrepancy between verification of geometric subspaces and the semantic meaning of sentences, which the geometric subspaces are supposed to represent. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap. Secondly, we give a general method for training and verification of neural networks that leverages a more precise geometric estimation of semantic similarity of sentences in the embedding space and helps to overcome the effects of the embedding gap in practice.

MLJun 3, 2025
Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks

Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan et al.

We study the dynamics of stochastic gradient descent (SGD) for a class of sequence models termed Sequence Single-Index (SSI) models, where the target depends on a single direction in input space applied to a sequence of tokens. This setting generalizes classical single-index models to the sequential domain, encompassing simplified one-layer attention architectures. We derive a closed-form expression for the population loss in terms of a pair of sufficient statistics capturing semantic and positional alignment, and characterize the induced high-dimensional SGD dynamics for these coordinates. Our analysis reveals two distinct training phases: escape from uninformative initialization and alignment with the target subspace, and demonstrates how the sequence length and positional encoding influence convergence speed and learning trajectories. These results provide a rigorous and interpretable foundation for understanding how sequential structure in data can be beneficial for learning with attention-based models.

MLJun 4, 2024
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

Luca Arnaboldi, Yatin Dandi, Florent Krzakala et al.

We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as characterized by the information exponents. We show that performing gradient updates with large batches $n_b \lesssim d^{\frac{\ell}{2}}$ minimizes the training time without changing the total sample complexity, where $\ell$ is the information exponent of the target to be learned \citep{arous2021online} and $d$ is the input dimension. However, larger batch sizes than $n_b \gg d^{\frac{\ell}{2}}$ are detrimental for improving the time complexity of SGD. We provably overcome this fundamental limitation via a different training protocol, \textit{Correlation loss SGD}, which suppresses the auto-correlation terms in the loss function. We show that one can track the training progress by a system of low-dimensional ordinary differential equations (ODEs). Finally, we validate our theoretical results with numerical experiments.

MLMay 29, 2023
Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD

Luca Arnaboldi, Florent Krzakala, Bruno Loureiro et al.

This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that in this scenario $n=O(d \log d)$ samples are typically needed. However, we provide precise results concerning the pre-factors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario.

CLMay 6, 2023
ANTONIO: Towards a Systematic Method of Generating NLP Benchmarks for Verification

Marco Casadio, Luca Arnaboldi, Matthew L. Daggitt et al.

Verification of machine learning models used in Natural Language Processing (NLP) is known to be a hard problem. In particular, many known neural network verification methods that work for computer vision and other numeric datasets do not work for NLP. Here, we study technical reasons that underlie this problem. Based on this analysis, we propose practical methods and heuristics for preparing NLP datasets and models in a way that renders them amenable to known verification methods based on abstract interpretation. We implement these methods as a Python library called ANTONIO that links to the neural network verifiers ERAN and Marabou. We perform evaluation of the tool using an NLP dataset R-U-A-Robot suggested as a benchmark for verifying legally critical NLP applications. We hope that, thanks to its general applicability, this work will open novel possibilities for including NLP verification problems into neural network verification competitions, and will popularise NLP problems within this community.

LGFeb 10, 2022
Vehicle: Interfacing Neural Network Verifiers with Interactive Theorem Provers

Matthew L. Daggitt, Wen Kokke, Robert Atkey et al.

Verification of neural networks is currently a hot topic in automated theorem proving. Progress has been rapid and there are now a wide range of tools available that can verify properties of networks with hundreds of thousands of nodes. In theory this opens the door to the verification of larger control systems that make use of neural network components. However, although work has managed to incorporate the results of these verifiers to prove larger properties of individual systems, there is currently no general methodology for bridging the gap between verifiers and interactive theorem provers (ITPs). In this paper we present Vehicle, our solution to this problem. Vehicle is equipped with an expressive domain specific language for stating neural network specifications which can be compiled to both verifiers and ITPs. It overcomes previous issues with maintainability and scalability in similar ITP formalisations by using a standard ONNX file as the single canonical representation of the network. We demonstrate its utility by using it to connect the neural network verifier Marabou to Agda and then formally verifying that a car steered by a neural network never leaves the road, even in the face of an unpredictable cross wind and imperfect sensors. The network has over 20,000 nodes, and therefore this proof represents an improvement of 3 orders of magnitude over prior proofs about neural network enhanced systems in ITPs.

CRMay 19, 2021
Automating Cryptographic Protocol Language Generation from Structured Specifications

Roberto Metere, Luca Arnaboldi

Security of cryptographic protocols can be analysed by creating a model in a formal language and verifying the model in a tool. All such tools focus on the last part of the analysis, verification, and the interpretation of the specification is only explained in papers. Rather, we focus on the interpretation and modelling part by presenting a tool to aid the cryptographer throughout the process and automatically generating code in a target language. We adopt a data-centric approach where the protocol design is stored in a structured way rather than as textual specifications. Previous work shows how this approach facilitates the interpretation to a single language (for Tamarin) which required aftermath modifications. By improving the expressiveness of the specification data structure we extend the tool to export to an additional formal language, ProVerif, as well as a C++ fully running implementation. Furthermore, we extend the plugins to verify correctness in ProVerif and executability lemmas in Tamarin. In this paper we model the Diffie-Hellman key exchange, which is traditionally used as a case study; a demo is also provided for other commonly studied protocols, Needham- Schroeder and Needham-Schroeder-Lowe.

CRMay 17, 2021
A Review of Intrusion Detection Systems and Their Evaluation in the IoT

Luca Arnaboldi, Charles Morisset

Intrusion Detection Systems (IDS) are key components for securing critical infrastructures, capable of detecting malicious activities on networks or hosts. The procedure of implementing a IDS for Internet of Things (IoT) networks is not without challenges due to the variability of these systems and specifically the difficulty in accessing data. The specifics of these very constrained devices render the design of an IDS capable of dealing with the varied attacks a very challenging problem and a very active research subject. In the current state of literature, a number of approaches have been proposed to improve the efficiency of intrusion detection, catering to some of these limitations, such as resource constraints and mobility. In this article, we review works on IDS specifically for these kinds of devices from 2008 to 2018, collecting a total of 51 different IDS papers. We summarise the current themes of the field, summarise the techniques employed to train and deploy the IDSs and provide a qualitative evaluations of these approaches. While these works provide valuable insights and solutions for sub-parts of these constraints, we discuss the limitations of these solutions as a whole, in particular what kinds of attacks these approaches struggle to detect and the setup limitations that are unique to this kind of system. We find that although several paper claim novelty of their approach little inter paper comparisons have been made, that there is a dire need for sharing of datasets and almost no shared code repositories, consequently raising the need for a thorough comparative evaluation.

LGMar 23, 2021
Volume-Centred Range Bars: Novel Interpretable Representation of Financial Markets Designed for Machine Learning Applications

Artur Sokolovsky, Luca Arnaboldi, Jaume Bacardit et al.

Financial markets are a source of non-stationary multidimensional time series which has been drawing attention for decades. Each financial instrument has its specific changing-over-time properties, making its analysis a complex task. Hence, improvement of understanding and development of more informative, generalisable market representations are essential for the successful operation in financial markets, including risk assessment, diversification, trading, and order execution. In this study, we propose a volume-price-based market representation for making financial time series more suitable for machine learning pipelines. We use a statistical approach for evaluating the representation. Through the research questions, we investigate, i) whether the proposed representation allows the more efficient design of machine learning models; ii) whether the proposed representation leads to increased performance over the price levels market pattern; iii) whether the proposed representation performs better on the liquid markets, and iv) whether SHAP feature interactions are reliable to be used in the considered setting. Our analysis shows that the proposed volume-based method allows successful classification of the financial time series patterns, and also leads to better classification performance than the price levels-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models and compare the outcomes to those of the SHAP method. This results in the significant similarity between the two methods, hence we claim that SHAP feature interactions are reliable to be used in the setting of financial markets.

TRSep 21, 2020
A Generic Methodology for the Statistically Uniform & Comparable Evaluation of Automated Trading Platform Components

Artur Sokolovsky, Luca Arnaboldi

Although machine learning approaches have been widely used in the field of finance, to very successful degrees, these approaches remain bespoke to specific investigations and opaque in terms of explainability, comparability, and reproducibility. The primary objective of this research was to shed light upon this field by providing a generic methodology that was investigation-agnostic and interpretable to a financial markets practitioner, thus enhancing their efficiency, reducing barriers to entry, and increasing the reproducibility of experiments. The proposed methodology is showcased on two automated trading platform components. Namely, price levels, a well-known trading pattern, and a novel 2-step feature extraction method. The methodology relies on hypothesis testing, which is widely applied in other social and scientific disciplines to effectively evaluate the concrete results beyond simple classification accuracy. The main hypothesis was formulated to evaluate whether the selected trading pattern is suitable for use in the machine learning setting. Across the experiments we found that the use of the considered trading pattern in the machine learning setting is only partially supported by statistics, resulting in insignificant effect sizes (Rebound 7 - $0.64 \pm 1.02$, Rebound 11 $0.38 \pm 0.98$, and rebound 15 - $1.05 \pm 1.16$), but allowed the rejection of the null hypothesis. We showcased the generic methodology on a US futures market instrument and provided evidence that with this methodology we could easily obtain informative metrics beyond the more traditional performance and profitability metrics. This work is one of the first in applying this rigorous statistically-backed approach to the field of financial markets and we hope this may be a springboard for more research.

SYNov 28, 2019
Modelling Load-Changing Attacks in Cyber-Physical Systems

Luca Arnaboldi, Ricardo M. Czekster, Roberto Metere et al.

Cyber-Physical Systems (CPS) are present in many settings addressing a myriad of purposes. Examples are Internet-of-Things (IoT) or sensing software embedded in appliances or even specialised meters that measure and respond to electricity demands in smart grids. Due to their pervasive nature, they are usually chosen as recipients for larger scope cyber-security attacks. Those promote system-wide disruptions and are directed towards one key aspect such as confidentiality, integrity, availability or a combination of those characteristics. Our paper focuses on a particular and distressing attack where coordinated malware infected IoT units are maliciously employed to synchronously turn on or off high-wattage appliances, affecting the grid's primary control management. Our model could be extended to larger (smart) grids, Active Buildings as well as similar infrastructures. Our approach models Coordinated Load-Changing Attacks (CLCA) also referred as GridLock or BlackIoT, against a theoretical power grid, containing various types of power plants. It employs Continuous-Time Markov Chains where elements such as Power Plants and Botnets are modelled under normal or attack situations to evaluate the effect of CLCA in power reliant infrastructures. We showcase our modelling approach in the scenario of a power supplier (e.g. power plant) being targeted by a botnet. We demonstrate how our modelling approach can quantify the impact of a botnet attack and be abstracted for any CPS system involving power load management in a smart grid. Our results show that by prioritising the type of power-plants, the impact of the attack may change: in particular, we find the most impacting attack times and show how different strategies impact their success. We also find the best power generator to use depending on the current demand and strength of attack.

CROct 7, 2019
Towards a Data Centric Approach for the Design and Verification of Cryptographic Protocols

Luca Arnaboldi, Roberto Metere

We propose MetaCP, a Meta Cryptography Protocol verification tool, as an automated tool simplifying the design of security protocols through a graphical interface. The graphical interface can be seen as a modern editor of a non-relational database whose data are protocols. The information of protocols are stored in XML, enjoying a fixed format and syntax aiming to contain all required information to specify any kind of protocol. This XML can be seen as an almost semanticless language, where different plugins confer strict semantics modelling the protocol into a variety of back-end verification languages. In this paper, we showcase the effectiveness of this novel approach by demonstrating how easy MetaCP makes it to design and verify a protocol going from the graphical design to formally verified protocol using a Tamarin prover plugin. Whilst similar approaches have been proposed in the past, most famously the AVISPA Tool, no previous approach provides such as small learning curve and ease of use even for non security professionals, combined with the flexibility to integrate with the state of the art verification tools.

CRJan 24, 2019
Generating Synthetic Data for Real World Detection of DoS Attacks in the IoT

Luca Arnaboldi, Charles Morisset

Denial of service attacks are especially pertinent to the internet of things as devices have less computing power, memory and security mechanisms to defend against them. The task of mitigating these attacks must therefore be redirected from the device onto a network monitor. Network intrusion detection systems can be used as an effective and efficient technique in internet of things systems to offload computation from the devices and detect denial of service attacks before they can cause harm. However the solution of implementing a network intrusion detection system for internet of things networks is not without challenges due to the variability of these systems and specifically the difficulty in collecting data. We propose a model-hybrid approach to model the scale of the internet of things system and effectively train network intrusion detection systems. Through bespoke datasets generated by the model, the IDS is able to predict a wide spectrum of real-world attacks, and as demonstrated by an experiment construct more predictive datasets at a fraction of the time of other more standard techniques.

CROct 30, 2017
Quantitative Analysis of DoS Attacks and Client Puzzles in IoT Systems

Luca Arnaboldi, Charles Morrisset

Denial of Service (DoS) attacks constitute a major security threat to today's Internet. This challenge is especially pertinent to the Internet of Things (IoT) as devices have less computing power, memory and security mechanisms to mitigate DoS attacks. This paper presents a model that mimics the unique characteristics of a network of IoT devices, including components of the system implementing `Crypto Puzzles' - a DoS mitigation technique. We created an imitation of a DoS attack on the system, and conducted a quantitative analysis to simulate the impact such an attack may potentially exert upon the system, assessing the trade off between security and throughput in the IoT system. We model this through stochastic model checking in PRISM and provide evidence that supports this as a valuable method to compare the efficiency of different implementations of IoT systems, exemplified by a case study.