Stefano Calzavara

h-index21

14papers

197citations

Novelty64%

AI Score45

Ranked #68,156 of 205,806 authors (top 33%)#1,566 in CR (top 21%)

14 Papers

LGSep 27, 2022

Explainable Global Fairness Verification of Tree-Based Classifiers

Stefano Calzavara, Lorenzo Cazzaro, Claudio Lucchese et al.

We present a new approach to the global fairness verification of tree-based classifiers. Given a tree-based classifier and a set of sensitive features potentially leading to discrimination, our analysis synthesizes sufficient conditions for fairness, expressed as a set of traditional propositional logic formulas, which are readily understandable by human experts. The verified fairness guarantees are global, in that the formulas predicate over all the possible inputs of the classifier, rather than just a few specific test instances. Our analysis is formally proved both sound and complete. Experimental results on public datasets show that the analysis is precise, explainable to human experts and efficient enough for practical adoption.

46.7CRApr 30

LLM-Assisted Web Measurements

Simone Bozzolan, Stefano Calzavara, Lorenzo Cazzaro

Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites are unlabeled and lack semantic information about the nature of the included websites, making targeted web measurements challenging, as researchers often rely on ad-hoc techniques to bias datasets toward specific website classes of interest. In this paper, we investigate the use of Large Language Models (LLMs) to enable targeted web measurement studies. Building on prior literature, we identify key website classification tasks relevant to web measurements and highlight limitations in state-of-the-art classification approaches. We construct carefully curated datasets to evaluate different LLMs on these tasks. Our results show that LLMs can achieve strong performance across multiple classification scenarios, but the choice of model and configuration plays a significant role. Motivated by the observed trade-off between classification accuracy and computational efficiency, we propose a practical two-step methodology for scalable targeted web measurements starting from the Tranco list. Finally, we conduct LLM-assisted web measurement studies inspired by prior work using our methodology and assess the validity of the resulting research inferences, showing that LLMs can effectively enable targeted measurements of security and privacy trends on the Web.

LGFeb 22, 2024

Verifiable Boosted Tree Ensembles

Stefano Calzavara, Lorenzo Cazzaro, Claudio Lucchese et al.

Verifiable learning advocates for training machine learning models amenable to efficient security verification. Prior research demonstrated that specific classes of decision tree ensembles -- called large-spread ensembles -- allow for robustness verification in polynomial time against any norm-based attacker. This study expands prior work on verifiable learning from basic ensemble methods (i.e., hard majority voting) to advanced boosted tree ensembles, such as those trained using XGBoost or LightGBM. Our formal results indicate that robustness verification is achievable in polynomial time when considering attackers based on the $L_\infty$-norm, but remains NP-hard for other norm-based attackers. Nevertheless, we present a pseudo-polynomial time algorithm to verify robustness against attackers based on the $L_p$-norm for any $p \in \mathbb{N} \cup \{0\}$, which in practice grants excellent performance. Our experimental evaluation shows that large-spread boosted ensembles are accurate enough for practical adoption, while being amenable to efficient security verification.

LGMay 5, 2023

Verifiable Learning for Robust Tree Ensembles

Stefano Calzavara, Lorenzo Cazzaro, Giulio Ermanno Pibiri et al.

Verifying the robustness of machine learning models against evasion attacks at test time is an important research problem. Unfortunately, prior work established that this problem is NP-hard for decision tree ensembles, hence bound to be intractable for specific inputs. In this paper, we identify a restricted class of decision tree ensembles, called large-spread ensembles, which admit a security verification algorithm running in polynomial time. We then propose a new approach called verifiable learning, which advocates the training of such restricted model classes which are amenable for efficient verification. We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data, thus enabling its security verification in polynomial time. Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds, using standard commercial hardware. Moreover, large-spread ensembles are more robust than traditional ensembles against evasion attacks, at the cost of an acceptable loss of accuracy in the non-adversarial setting.

LGDec 5, 2021

Beyond Robustness: Resilience Verification of Tree-Based Classifiers

Stefano Calzavara, Lorenzo Cazzaro, Claudio Lucchese et al.

In this paper we criticize the robustness measure traditionally employed to assess the performance of machine learning models deployed in adversarial settings. To mitigate the limitations of robustness, we introduce a new measure called resilience and we focus on its verification. In particular, we discuss how resilience can be verified by combining a traditional robustness verification technique with a data-independent stability analysis, which identifies a subset of the feature space where the model does not change its predictions despite adversarial manipulations. We then introduce a formally sound data-independent stability analysis for decision trees and decision tree ensembles, which we experimentally assess on public datasets and we leverage for resilience verification. Our results show that resilience verification is useful and feasible in practice, yielding a more reliable security assessment of both standard and robust decision tree models.

CRJan 15, 2021

Bulwark: Holistic and Verified Security Monitoring of Web Protocols

Lorenzo Veronese, Stefano Calzavara, Luca Compagna

Modern web applications often rely on third-party services to provide their functionality to users. The secure integration of these services is a non-trivial task, as shown by the large number of attacks against Single Sign On and Cashier-as-a-Service protocols. In this paper we present Bulwark, a new automatic tool which generates formally verified security monitors from applied pi-calculus specifications of web protocols. The security monitors generated by Bulwark offer holistic protection, since they can be readily deployed both at the client side and at the server side, thus ensuring full visibility of the attack surface against web protocols. We evaluate the effectiveness of Bulwark by testing it against a pool of vulnerable web applications that use the OAuth 2.0 protocol or integrate the PayPal payment system.

CRDec 3, 2020

Can I Take Your Subdomain? Exploring Related-Domain Attacks in the Modern Web

Marco Squarcina, Mauro Tempesta, Lorenzo Veronese et al.

Related-domain attackers control a sibling domain of their target web application, e.g., as the result of a subdomain takeover. Despite their additional power over traditional web attackers, related-domain attackers received only limited attention by the research community. In this paper we define and quantify for the first time the threats that related-domain attackers pose to web application security. In particular, we first clarify the capabilities that related-domain attackers can acquire through different attack vectors, showing that different instances of the related-domain attacker concept are worth attention. We then study how these capabilities can be abused to compromise web application security by focusing on different angles, including: cookies, CSP, CORS, postMessage and domain relaxation. By building on this framework, we report on a large-scale security measurement on the top 50k domains from the Tranco list that led to the discovery of vulnerabilities in 887 sites, where we quantified the threats posed by related-domain attackers to popular web applications.

LGJul 6, 2020

Certifying Decision Trees Against Evasion Attacks by Program Analysis

Stefano Calzavara, Pietro Ferrara, Claudio Lucchese

Machine learning has proved invaluable for a range of different tasks, yet it also proved vulnerable to evasion attacks, i.e., maliciously crafted perturbations of input data designed to force mispredictions. In this paper we propose a novel technique to verify the security of decision tree models against evasion attacks with respect to an expressive threat model, where the attacker can be represented by an arbitrary imperative program. Our approach exploits the interpretability property of decision trees to transform them into imperative programs, which are amenable for traditional program analysis techniques. By leveraging the abstract interpretation framework, we are able to soundly verify the security guarantees of decision tree models trained over publicly available datasets. Our experiments show that our technique is both precise and efficient, yielding only a minimal number of false positives and scaling up to cases which are intractable for a competitor approach.

LGApr 7, 2020

Feature Partitioning for Robust Tree Ensembles and their Certification in Adversarial Scenarios

Stefano Calzavara, Claudio Lucchese, Federico Marcuzzi et al.

Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at test time. The attacker aims at finding a minimal perturbation of a test instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We experimented the proposed strategy on decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently assess the minimal accuracy of a forest on a given dataset avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that proposed strategy outperforms state-of-the-art adversarial learning algorithms against evasion attacks.

CRJan 28, 2020

Language-Based Web Session Integrity

Stefano Calzavara, Riccardo Focardi, Niklas Grimm et al.

Session management is a fundamental component of web applications: despite the apparent simplicity, correctly implementing web sessions is extremely tricky, as witnessed by the large number of existing attacks. This motivated the design of formal methods to rigorously reason about web session security which, however, are not supported at present by suitable automated verification techniques. In this paper we introduce the first security type system that enforces session security on a core model of web applications, focusing in particular on server-side code. We showcase the expressiveness of our type system by analyzing the session management logic of HotCRP, Moodle, and phpMyAdmin, unveiling novel security flaws that have been acknowledged by software developers.

LGJul 2, 2019

Treant: Training Evasion-Aware Decision Trees

Stefano Calzavara, Claudio Lucchese, Gabriele Tolomei et al.

Despite its success and popularity, machine learning is now recognized as vulnerable to evasion attacks, i.e., carefully crafted perturbations of test inputs designed to force prediction errors. In this paper we focus on evasion attacks against decision tree ensembles, which are among the most successful predictive models for dealing with non-perceptual problems. Even though they are powerful and interpretable, decision tree ensembles have received only limited attention by the security and machine learning communities so far, leading to a sub-optimal state of the art for adversarial learning techniques. We thus propose Treant, a novel decision tree learning algorithm that, on the basis of a formal threat model, minimizes an evasion-aware loss function at each step of the tree construction. Treant is based on two key technical ingredients: robust splitting and attack invariance, which jointly guarantee the soundness of the learning process. Experimental results on three publicly available datasets show that Treant is able to generate decision tree ensembles that are at the same time accurate and nearly insensitive to evasion attacks, outperforming state-of-the-art adversarial learning techniques.

CRJun 24, 2018

WPSE: Fortifying Web Protocols via Browser-Side Security Monitoring

Stefano Calzavara, Riccardo Focardi, Matteo Maffei et al.

We present WPSE, a browser-side security monitor for web protocols designed to ensure compliance with the intended protocol flow, as well as confidentiality and integrity properties of messages. We formally prove that WPSE is expressive enough to protect web applications from a wide range of protocol implementation bugs and web attacks. We discuss concrete examples of attacks which can be prevented by WPSE on OAuth 2.0 and SAML 2.0, including a novel attack on the Google implementation of SAML 2.0 which we discovered by formalizing the protocol specification in WPSE. Moreover, we use WPSE to carry out an extensive experimental evaluation of OAuth 2.0 in the wild. Out of 90 tested websites, we identify security flaws in 55 websites (61.1%), including new critical vulnerabilities introduced by tracking libraries such as Facebook Pixel, all of which fixable by WPSE. Finally, we show that WPSE works flawlessly on 83 websites (92.2%), with the 7 compatibility issues being caused by custom implementations deviating from the OAuth 2.0 specification, one of which introducing a critical vulnerability.

CRJul 25, 2017

HornDroid: Practical and Sound Static Analysis of Android Applications by SMT Solving

Stefano Calzavara, Ilya Grishchenko, Matteo Maffei

We present HornDroid, a new tool for the static analysis of information flow properties in Android applications. The core idea underlying HornDroid is to use Horn clauses for soundly abstracting the semantics of Android applications and to express security properties as a set of proof obligations that are automatically discharged by an off-the-shelf SMT solver. This approach makes it possible to fine-tune the analysis in order to achieve a high degree of precision while still using off-the-shelf verification tools, thereby leveraging the recent advances in this field. As a matter of fact, HornDroid outperforms state-of-the-art Android static analysis tools on benchmarks proposed by the community. Moreover, HornDroid is the first static analysis tool for Android to come with a formal proof of soundness, which covers the core of the analysis technique: besides yielding correctness assurances, this proof allowed us to identify some critical corner-cases that affect the soundness guarantees provided by some of the previous static analysis tools for Android.

CRMay 30, 2017

A Sound Flow-Sensitive Heap Abstraction for the Static Analysis of Android Applications

Stefano Calzavara, Ilya Grishchenko, Adrien Koutsos et al.

The present paper proposes the first static analysis for Android applications which is both flow-sensitive on the heap abstraction and provably sound with respect to a rich formal model of the Android platform. We formulate the analysis as a set of Horn clauses defining a sound over-approximation of the semantics of the Android application to analyse, borrowing ideas from recency abstraction and extending them to our concurrent setting. Moreover, we implement the analysis in HornDroid, a state-of-the-art information flow analyser for Android applications. Our extension allows HornDroid to perform strong updates on heap-allocated data structures, thus significantly increasing its precision, without sacrificing its soundness guarantees. We test our implementation on DroidBench, a popular benchmark of Android applications developed by the research community, and we show that our changes to HornDroid lead to an improvement in the precision of the tool, while having only a moderate cost in terms of efficiency. Finally, we assess the scalability of our tool to the analysis of real applications.