Stefano Zanero

CR
h-index34
12papers
125citations
Novelty54%
AI Score49

12 Papers

CROct 15, 2014Code
XSS Peeker: A Systematic Analysis of Cross-site Scripting Vulnerability Scanners

Enrico Bazzoli, Claudio Criscione, Federico Maggi et al.

Since the first publication of the "OWASP Top 10" (2004), cross-site scripting (XSS) vulnerabilities have always been among the top 5 web application security bugs. Black-box vulnerability scanners are widely used in the industry to reproduce (XSS) attacks automatically. In spite of the technical sophistication and advancement, previous work showed that black-box scanners miss a non-negligible portion of vulnerabilities, and report non-existing, non-exploitable or uninteresting vulnerabilities. Unfortunately, these results hold true even for XSS vulnerabilities, which are relatively simple to trigger if compared, for instance, to logic flaws. Black-box scanners have not been studied in depth on this vertical: knowing precisely how scanners try to detect XSS can provide useful insights to understand their limitations, to design better detection methods. In this paper, we present and discuss the results of a detailed and systematic study on 6 black-box web scanners (both proprietary and open source) that we conducted in coordination with the respective vendors. To this end, we developed an automated tool to (1) extract the payloads used by each scanner, (2) distill the "templates" that have originated each payload, (3) evaluate them according to quality indicators, and (4) perform a cross-scanner analysis. Unlike previous work, our testbed application, which contains a large set of XSS vulnerabilities, including DOM XSS, was gradually retrofitted to accomodate for the payloads that triggered no vulnerabilities. Our analysis reveals a highly fragmented scenario. Scanners exhibit a wide variety of distinct payloads, a non-uniform approach to fuzzing and mutating the payloads, and a very diverse detection effectiveness.

QUANT-PHFeb 16, 2025
Evaluating the Potential of Quantum Machine Learning in Cybersecurity: A Case-Study on PCA-based Intrusion Detection Systems

Armando Bellante, Tommaso Fioravanti, Michele Carminati et al.

Quantum computing promises to revolutionize our understanding of the limits of computation, and its implications in cryptography have long been evident. Today, cryptographers are actively devising post-quantum solutions to counter the threats posed by quantum-enabled adversaries. Meanwhile, quantum scientists are innovating quantum protocols to empower defenders. However, the broader impact of quantum computing and quantum machine learning (QML) on other cybersecurity domains still needs to be explored. In this work, we investigate the potential impact of QML on cybersecurity applications of traditional ML. First, we explore the potential advantages of quantum computing in machine learning problems specifically related to cybersecurity. Then, we describe a methodology to quantify the future impact of fault-tolerant QML algorithms on real-world problems. As a case study, we apply our approach to standard methods and datasets in network intrusion detection, one of the most studied applications of machine learning in cybersecurity. Our results provide insight into the conditions for obtaining a quantum advantage and the need for future quantum hardware and software advancements.

CRMay 31, 2025
PackHero: A Scalable Graph-based Approach for Efficient Packer Identification

Marco Di Gennaro, Mario D'Onghia, Mario Polino et al.

Anti-analysis techniques, particularly packing, challenge malware analysts, making packer identification fundamental. Existing packer identifiers have significant limitations: signature-based methods lack flexibility and struggle against dynamic evasion, while Machine Learning approaches require extensive training data, limiting scalability and adaptability. Consequently, achieving accurate and adaptable packer identification remains an open problem. This paper presents PackHero, a scalable and efficient methodology for identifying packers using a novel static approach. PackHero employs a Graph Matching Network and clustering to match and group Call Graphs from programs packed with known packers. We evaluate our approach on a public dataset of malware and benign samples packed with various packers, demonstrating its effectiveness and scalability across varying sample sizes. PackHero achieves a macro-average F1-score of 93.7% with just 10 samples per packer, improving to 98.3% with 100 samples. Notably, PackHero requires fewer samples to achieve stable performance compared to other Machine Learning-based tools. Overall, PackHero matches the performance of State-of-the-art signature-based tools, outperforming them in handling Virtualization-based packers such as Themida/Winlicense, with a recall of 100%.

CRMay 31, 2025
Amatriciana: Exploiting Temporal GNNs for Robust and Efficient Money Laundering Detection

Marco Di Gennaro, Francesco Panebianco, Marco Pianta et al.

Money laundering is a financial crime that poses a serious threat to financial integrity and social security. The growing number of transactions makes it necessary to use automatic tools that help law enforcement agencies detect such criminal activity. In this work, we present Amatriciana, a novel approach based on Graph Neural Networks to detect money launderers inside a graph of transactions by considering temporal information. Amatriciana uses the whole graph of transactions without splitting it into several time-based subgraphs, exploiting all relational information in the dataset. Our experiments on a public dataset reveal that the model can learn from a limited amount of data. Furthermore, when more data is available, the model outperforms other State-of-the-art approaches; in particular, Amatriciana decreases the number of False Positives (FPs) while detecting many launderers. In summary, Amatriciana achieves an F1 score of 0.76. In addition, it lowers the FPs by 55% with respect to other State-of-the-art models.

QUANT-PHOct 8, 2025
Quantum Sparse Recovery and Quantum Orthogonal Matching Pursuit

Armando Bellante, Stefano Vanerio, Stefano Zanero

We study quantum sparse recovery in non-orthogonal, overcomplete dictionaries: given coherent quantum access to a state and a dictionary of vectors, the goal is to reconstruct the state up to $\ell_2$ error using as few vectors as possible. We first show that the general recovery problem is NP-hard, ruling out efficient exact algorithms in full generality. To overcome this, we introduce Quantum Orthogonal Matching Pursuit (QOMP), the first quantum analogue of the classical OMP greedy algorithm. QOMP combines quantum subroutines for inner product estimation, maximum finding, and block-encoded projections with an error-resetting design that avoids iteration-to-iteration error accumulation. Under standard mutual incoherence and well-conditioned sparsity assumptions, QOMP provably recovers the exact support of a $K$-sparse state in polynomial time. As an application, we give the first framework for sparse quantum tomography with non-orthogonal dictionaries in $\ell_2$ norm, achieving query complexity $\widetilde{O}(\sqrt{N}/ε)$ in favorable regimes and reducing tomography to estimating only $K$ coefficients instead of $N$ amplitudes. In particular, for pure-state tomography with $m=O(N)$ dictionary vectors and sparsity $K=\widetilde{O}(1)$ on a well-conditioned subdictionary, this circumvents the $\widetildeΩ(N/ε)$ lower bound that holds in the dense, orthonormal-dictionary setting, without contradiction, by leveraging sparsity together with non-orthogonality. Beyond tomography, we analyze QOMP in the QRAM model, where it yields polynomial speedups over classical OMP implementations, and provide a quantum algorithm to estimate the mutual incoherence of a dictionary of $m$ vectors in $O(m/ε)$ queries, improving over both deterministic and quantum-inspired classical methods.

CRSep 8, 2025
When Secure Isn't: Assessing the Security of Machine Learning Model Sharing

Gabriele Digregorio, Marco Di Gennaro, Stefano Zanero et al.

The rise of model-sharing through frameworks and dedicated hubs makes Machine Learning significantly more accessible. Despite their benefits, these tools expose users to underexplored security risks, while security awareness remains limited among both practitioners and developers. To enable a more security-conscious culture in Machine Learning model sharing, in this paper we evaluate the security posture of frameworks and hubs, assess whether security-oriented mechanisms offer real protection, and survey how users perceive the security narratives surrounding model sharing. Our evaluation shows that most frameworks and hubs address security risks partially at best, often by shifting responsibility to the user. More concerningly, our analysis of frameworks advertising security-oriented settings and complete model sharing uncovered six 0-day vulnerabilities enabling arbitrary code execution. Through this analysis, we debunk the misconceptions that the model-sharing problem is largely solved and that its security can be guaranteed by the file format used for sharing. As expected, our survey shows that the surrounding security narrative leads users to consider security-oriented settings as trustworthy, despite the weaknesses shown in this work. From this, we derive takeaways and suggestions to strengthen the security of model-sharing ecosystems.

CRJun 12, 2025
Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation

Stefano Longari, Paolo Cerracchio, Michele Carminati et al.

The security of modern vehicles has become increasingly important, with the controller area network (CAN) bus serving as a critical communication backbone for various Electronic Control Units (ECUs). The absence of robust security measures in CAN, coupled with the increasing connectivity of vehicles, makes them susceptible to cyberattacks. While intrusion detection systems (IDSs) have been developed to counter such threats, they are not foolproof. Adversarial attacks, particularly evasion attacks, can manipulate inputs to bypass detection by IDSs. This paper extends our previous work by investigating the feasibility and impact of gradient-based adversarial attacks performed with different degrees of knowledge against automotive IDSs. We consider three scenarios: white-box (attacker with full system knowledge), grey-box (partial system knowledge), and the more realistic black-box (no knowledge of the IDS' internal workings or data). We evaluate the effectiveness of the proposed attacks against state-of-the-art IDSs on two publicly available datasets. Additionally, we study effect of the adversarial perturbation on the attack impact and evaluate real-time feasibility by precomputing evasive payloads for timed injection based on bus traffic. Our results demonstrate that, besides attacks being challenging due to the automotive domain constraints, their effectiveness is strongly dependent on the dataset quality, the target IDS, and the attacker's degree of knowledge.

CRJun 9, 2025
TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems

Marco Di Gennaro, Giovanni De Lucia, Stefano Longari et al.

Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on State-of-the-Art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.

QUANT-PHApr 19, 2021
Quantum algorithms for SVD-based data representation and analysis

Armando Bellante, Alessandro Luongo, Stefano Zanero

This paper narrows the gap between previous literature on quantum linear algebra and practical data analysis on a quantum computer, formalizing quantum procedures that speed-up the solution of eigenproblems for data representations in machine learning. The power and practical use of these subroutines is shown through new quantum algorithms, sublinear in the input matrix's size, for principal component analysis, correspondence analysis, and latent semantic analysis. We provide a theoretical analysis of the run-time and prove tight bounds on the randomized algorithms' error. We run experiments on multiple datasets, simulating PCA's dimensionality reduction for image classification with the novel routines. The results show that the run-time parameters that do not depend on the input's size are reasonable and that the error on the computed model is small, allowing for competitive classification performances.

CRJul 17, 2019
Constrained Concealment Attacks against Reconstruction-based Anomaly Detectors in Industrial Control Systems

Alessandro Erba, Riccardo Taormina, Stefano Galelli et al.

Recently, reconstruction-based anomaly detection was proposed as an effective technique to detect attacks in dynamic industrial control networks. Unlike classical network anomaly detectors that observe the network traffic, reconstruction-based detectors operate on the measured sensor data, leveraging physical process models learned a priori. In this work, we investigate different approaches to evade prior-work reconstruction-based anomaly detectors by manipulating sensor data so that the attack is concealed. We find that replay attacks (commonly assumed to be very strong) show bad performance (i.e., increasing the number of alarms) if the attacker is constrained to manipulate less than 95% of all features in the system, as hidden correlations between the features are not replicated well. To address this, we propose two novel attacks that manipulate a subset of the sensor readings, leveraging learned physical constraints of the system. Our attacks feature two different attacker models: A white box attacker, which uses an optimization approach with a detection oracle, and a black box attacker, which uses an autoencoder to translate anomalous data into normal data. We evaluate our implementation on two different datasets from the water distribution domain, showing that the detector's Recall drops from 0.68 to 0.12 by manipulating 4 sensors out of 82 in WADI dataset. In addition, we show that our black box attacks are transferable to different detectors: They work against autoencoder-, LSTM-, and CNN-based detectors. Finally, we implement and demonstrate our attacks on a real industrial testbed to demonstrate their feasibility in real-time.

CRFeb 19, 2014
PuppetDroid: A User-Centric UI Exerciser for Automatic Dynamic Analysis of Similar Android Applications

Andrea Gianazza, Federico Maggi, Aristide Fattori et al.

Popularity and complexity of malicious mobile applications are rising, making their analysis difficult and labor intensive. Mobile application analysis is indeed inherently different from desktop application analysis: In the latter, the interaction of the user (i.e., victim) is crucial for the malware to correctly expose all its malicious behaviors. We propose a novel approach to analyze (malicious) mobile applications. The goal is to exercise the user interface (UI) of an Android application to effectively trigger malicious behaviors, automatically. Our key intuition is to record and reproduce the UI interactions of a potential victim of the malware, so as to stimulate the relevant behaviors during dynamic analysis. To make our approach scale, we automatically re-execute the recorded UI interactions on apps that are similar to the original ones. These characteristics make our system orthogonal and complementary to current dynamic analysis and UI-exercising approaches. We developed our approach and experimentally shown that our stimulation allows to reach a higher code coverage than automatic UI exercisers, so to unveil interesting malicious behaviors that are not exposed when using other approaches. Our approach is also suitable for crowdsourcing scenarios, which would push further the collection of new stimulation traces. This can potentially change the way we conduct dynamic analysis of (mobile) applications, from fully automatic only, to user-centric and collaborative too.

CRNov 21, 2013
Tracking and Characterizing Botnets Using Automatically Generated Domains

Stefano Schiavoni, Federico Maggi, Lorenzo Cavallaro et al.

Modern botnets rely on domain-generation algorithms (DGAs) to build resilient command-and-control infrastructures. Recent works focus on recognizing automatically generated domains (AGDs) from DNS traffic, which potentially allows to identify previously unknown AGDs to hinder or disrupt botnets' communication capabilities. The state-of-the-art approaches require to deploy low-level DNS sensors to access data whose collection poses practical and privacy issues, making their adoption problematic. We propose a mechanism that overcomes the above limitations by analyzing DNS traffic data through a combination of linguistic and IP-based features of suspicious domains. In this way, we are able to identify AGD names, characterize their DGAs and isolate logical groups of domains that represent the respective botnets. Moreover, our system enriches these groups with new, previously unknown AGD names, and produce novel knowledge about the evolving behavior of each tracked botnet. We used our system in real-world settings, to help researchers that requested intelligence on suspicious domains and were able to label them as belonging to the correct botnet automatically. Additionally, we ran an evaluation on 1,153,516 domains, including AGDs from both modern (e.g., Bamital) and traditional (e.g., Conficker, Torpig) botnets. Our approach correctly isolated families of AGDs that belonged to distinct DGAs, and set automatically generated from non-automatically generated domains apart in 94.8 percent of the cases.