Pierangelo Lombardo

CR
h-index 1
5 papers
1,009 citations
Novelty 42%
AI Score 37

5 Papers

ME · Apr 11, 2025
Standardization of Weighted Ranking Correlation Coefficients

Pierangelo Lombardo

A relevant problem in statistics is defining the correlation between two rankings of a list of items. Kendall's tau and Spearman's rho are two well-established correlation coefficients. Their symmetric form ensures a zero expected value for two rankings drawn independently and uniformly at random. In recent years, however, several weighted versions of the original Spearman and Kendall coefficients have emerged that account for the greater importance of top ranks over low ranks, which is common in many contexts. These weighting schemes break the symmetry, so the expected value between two random rankings is no longer zero. This matters because it undermines the very notion of uncorrelated rankings. In this paper, we address the problem by proposing a standardization function $g(x)$ that maps a ranking correlation coefficient $\Gamma$ to a standard form $g(\Gamma)$ with zero expected value, while preserving the relevant statistical properties of $\Gamma$.
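
The loss of symmetry can be checked numerically. The sketch below enumerates all $n!$ rankings exactly, using `fractions.Fraction` so there is no rounding. It compares the mean of plain Kendall's tau with the mean of a *hypothetical* top-weighted variant. In that variant, each pair of items is weighted by the product of its top-rank weights $1/(1+\min(p,q))$ in both rankings, with positions counted from 0 at the top. This weighting is an illustrative assumption, not one of the coefficients studied in the paper, and the paper's function $g(x)$ is not reproduced here.

```python
from fractions import Fraction
from itertools import combinations, permutations


def concordance(rx, ry, a, b):
    """+1 if rankings rx and ry order items a and b the same way, else -1 (no ties)."""
    return 1 if (rx[a] - rx[b]) * (ry[a] - ry[b]) > 0 else -1


def kendall_tau(rx, ry):
    """Plain Kendall tau; rx[i] and ry[i] are the positions of item i (0 = top)."""
    pairs = list(combinations(range(len(rx)), 2))
    return Fraction(sum(concordance(rx, ry, a, b) for a, b in pairs), len(pairs))


def top_weight(p, q):
    """Hypothetical top-rank weight of a pair placed at positions p and q."""
    return Fraction(1, 1 + min(p, q))


def weighted_tau(rx, ry):
    """Hypothetical top-weighted Kendall tau: each pair is weighted by the
    product of its top-rank weights in both rankings (illustration only)."""
    num = den = Fraction(0)
    for a, b in combinations(range(len(rx)), 2):
        w = top_weight(rx[a], rx[b]) * top_weight(ry[a], ry[b])
        num += w * concordance(rx, ry, a, b)
        den += w
    return num / den


def expected_value(coef, n):
    """Exact mean of coef(reference, y) over all n! equally likely rankings y.
    Fixing the first ranking loses no generality: both coefficients treat items symmetrically."""
    reference = tuple(range(n))
    values = [coef(reference, ry) for ry in permutations(range(n))]
    return sum(values, Fraction(0)) / len(values)


for n in (3, 4, 5):
    print(n, expected_value(kendall_tau, n), expected_value(weighted_tau, n))
```

For $n=3$, the plain coefficient averages exactly 0, while the weighted variant averages $-1/27$. This is a small but non-zero offset of the kind a zero-expected-value standardization is meant to remove.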

LG · Oct 24, 2025
Cost-Sensitive Evaluation for Binary Classifiers

Pierangelo Lombardo, Antonio Casoli, Cristian Cingolani et al.

Selecting an appropriate evaluation metric for classifiers is crucial for model comparison and parameter optimization, yet there is no consensus on a universally accepted metric that serves as a definitive standard. Moreover, there is a common misconception that class imbalance in the datasets used to train classification models must be mitigated. The final goal in classifier optimization is typically to maximize the return on investment or, equivalently, to minimize the Total Classification Cost (TCC). We therefore define Weighted Accuracy (WA), an evaluation metric for binary classifiers that is consistent with the goal of minimizing TCC and has a straightforward interpretation as a weighted version of the well-known accuracy metric. We clarify the conceptual framework for handling class imbalance in cost-sensitive scenarios, providing an alternative to rebalancing techniques. This framework applies to any metric that, like WA, can be expressed as a linear combination of example-dependent quantities. It allows results obtained on different datasets to be compared, and it addresses discrepancies between the development dataset, used to train and validate the model, and the target dataset, where the model will be deployed. It also specifies when using UCC-unaware class rebalancing techniques or rebalancing metrics aligns with TCC minimization and when it is instead counterproductive. Finally, we propose a procedure to estimate the WA weight parameter in the absence of fully specified UCCs. We also demonstrate the robustness of WA by analyzing its correlation with TCC in example-dependent scenarios.
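
The link between a weighted accuracy and TCC can be made concrete with a minimal sketch. It rests on two assumptions that are ours, not the paper's. First, unit costs are constant per class: $c_{FP}$ for a false positive and $c_{FN}$ for a false negative. Second, positives are weighted by $w = c_{FN}/(c_{FN}+c_{FP})$ and negatives by $1-w$. With this choice, on a fixed dataset with $P$ positives and $N$ negatives, $\mathrm{WA} = 1 - \mathrm{TCC}/(c_{FN}P + c_{FP}N)$. Ranking classifiers by WA is then equivalent to ranking them by TCC, and $w = 1/2$ recovers plain accuracy. The paper's exact definition of WA, its example-dependent costs and its weight-estimation procedure are not reproduced; `cost_weight` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def total_cost(cm, c_fp, c_fn):
    """Total classification cost with constant per-class unit costs (assumption)."""
    return c_fp * cm.fp + c_fn * cm.fn


def cost_weight(c_fp, c_fn):
    """Weight given to positives so that WA tracks TCC (hypothetical helper)."""
    return c_fn / (c_fn + c_fp)


def weighted_accuracy(cm, w):
    """Accuracy with positives weighted by w and negatives by 1 - w; w = 0.5 is plain accuracy."""
    pos = cm.tp + cm.fn
    neg = cm.tn + cm.fp
    return (w * cm.tp + (1 - w) * cm.tn) / (w * pos + (1 - w) * neg)


# Two classifiers on the same data (100 positives, 900 negatives); misses cost 10x false alarms.
a = Confusion(tp=80, fp=30, tn=870, fn=20)
b = Confusion(tp=90, fp=60, tn=840, fn=10)
w = cost_weight(c_fp=1, c_fn=10)
for name, cm in (("a", a), ("b", b)):
    print(name, weighted_accuracy(cm, 0.5), weighted_accuracy(cm, w), total_cost(cm, 1, 10))
```

In this made-up example, plain accuracy prefers classifier `a` (0.95 vs 0.93). Both WA and TCC prefer `b`, whose TCC is 160 against 230 for `a`.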

CL · Oct 9, 2020
Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models

Pierangelo Lombardo, Alessio Boiardi, Luca Colombo et al.

The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models must rank words or texts by their semantic relatedness to a given concept, with particular focus on top ranks. In this work, we make a threefold contribution to address these requirements. (i) We define a protocol, based on adaptive pairwise comparisons, for constructing a relatedness-based evaluation dataset that is tailored to the available resources and optimized to be particularly accurate in top-rank evaluation. (ii) We define appropriate metrics, extending well-known ranking correlation coefficients, to evaluate a semantic model on this dataset while taking into account the greater significance of top ranks. (iii) We define a stochastic transitivity model to simulate semantic-driven pairwise comparisons, which confirms the effectiveness of the proposed dataset construction protocol.
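
Ingredients (i) and (iii) can be illustrated with a toy simulation. The sketch below uses a Bradley-Terry-Luce (logistic) choice model as a stand-in for the paper's own stochastic transitivity model; this model satisfies strong stochastic transitivity. The two-round vote allocation is a hypothetical simplification of an adaptive, top-rank-focused protocol, not the paper's procedure: all pairs are voted on first, then extra votes go only to the current top-k candidates. The latent scores, `top_k` and the vote budgets are made-up parameters.

```python
import math
import random
from itertools import combinations


def p_prefer(s_a, s_b, temperature=1.0):
    """Probability that a voter judges item a more related than item b under an
    assumed logistic (Bradley-Terry-Luce) model of latent relatedness scores."""
    return 1.0 / (1.0 + math.exp(-(s_a - s_b) / temperature))


def collect_votes(scores, pairs, votes_per_pair, rng):
    """Simulate votes on the given pairs; return per-item win rates over those votes."""
    wins = {i: 0 for pair in pairs for i in pair}
    games = dict.fromkeys(wins, 0)
    for a, b in pairs:
        for _ in range(votes_per_pair):
            winner = a if rng.random() < p_prefer(scores[a], scores[b]) else b
            wins[winner] += 1
            games[a] += 1
            games[b] += 1
    return {i: wins[i] / games[i] for i in wins}


def adaptive_ranking(scores, votes_per_pair=3, top_k=3, top_votes=15, seed=0):
    """Two-round sketch: rank all items, then re-rank only the top_k with extra votes."""
    rng = random.Random(seed)
    items = range(len(scores))
    rate = collect_votes(scores, list(combinations(items, 2)), votes_per_pair, rng)
    ranking = sorted(items, key=rate.get, reverse=True)
    head, tail = ranking[:top_k], ranking[top_k:]
    # Spend the extra budget only where top-rank accuracy matters.
    head_rate = collect_votes(scores, list(combinations(head, 2)), top_votes, rng)
    return sorted(head, key=head_rate.get, reverse=True) + tail


latent = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5]  # hypothetical true relatedness to the concept
print(adaptive_ranking(latent))
```

Strong stochastic transitivity requires the following: if $P(a \succ b) \ge 1/2$ and $P(b \succ c) \ge 1/2$, then $P(a \succ c) \ge \max\{P(a \succ b), P(b \succ c)\}$. The logistic model satisfies this because the preference probability increases with the score difference.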

CR · Oct 4, 2020
DNS Covert Channel Detection via Behavioral Analysis: a Machine Learning Approach

Salvatore Saeli, Federica Bisio, Pierangelo Lombardo et al.

Detecting covert channels among legitimate traffic is a severe challenge due to the high heterogeneity of networks. We propose an effective covert channel detection method based on the analysis of DNS network data passively extracted from a network monitoring system. The framework combines a machine learning module with the extraction of specific anomaly indicators that describe the problem at hand. The contribution of this paper is twofold. (i) The machine learning models build network profiles tailored to the network users rather than to single query events; this enables behavioral profiling and the detection of deviations from the normal baseline. (ii) The models are built in an unsupervised mode, which allows the identification of zero-day attacks and removes the need for signatures or heuristics for new variants. The proposed solution was evaluated over a 15-day experimental session with injected traffic covering the most relevant exfiltration and tunneling attacks. All malicious variants were detected, with a low false-positive rate over the same period.
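
Per-user behavioral profiling with unsupervised scoring can be illustrated as follows. The sketch summarizes each time window of one user's DNS queries with three hypothetical indicators:

- the mean length of the leftmost label;
- the mean Shannon entropy of the leftmost label;
- the fraction of distinct leftmost labels.

A new window is scored by its largest absolute z-score against that user's own past windows. The indicators, the z-score rule and the example domains are assumptions for illustration; the paper's actual anomaly indicators and models are not reproduced.

```python
import math
import statistics
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of the characters of s, in bits."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def window_features(queries):
    """Hypothetical indicators for one user's queries in one time window."""
    labels = [q.split(".")[0] for q in queries]  # leftmost label often carries tunneled data
    return (
        statistics.fmean([len(label) for label in labels]),
        statistics.fmean([shannon_entropy(label) for label in labels]),
        len(set(labels)) / len(labels),
    )


def anomaly_score(baseline, current, eps=1e-6):
    """Largest absolute z-score of the current window against the user's own baseline."""
    score = 0.0
    for k, value in enumerate(current):
        history = [window[k] for window in baseline]
        mu = statistics.fmean(history)
        sigma = max(statistics.pstdev(history), eps)  # avoid division by zero
        score = max(score, abs(value - mu) / sigma)
    return score


baseline = [
    window_features(["www.example.com", "mail.example.com", "www.example.com"]),
    window_features(["www.example.com", "cdn.example.com", "api.example.com"]),
    window_features(["mail.example.com", "www.example.com", "www.example.com", "cdn.example.com"]),
]
tunnel = window_features([
    "4a6f686e2073656e7420612066696c65.t.example.net",
    "7365637265742064617461206368756e6b.t.example.net",
])
print(anomaly_score(baseline, baseline[0]), anomaly_score(baseline, tunnel))
```

Profiling per user, rather than per query, lets the same query pattern count as normal for one host and anomalous for another. No labeled attacks are needed, only each user's own history.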

CR · Apr 17, 2018
Fast Flux Service Network Detection via Data Mining on Passive DNS Traffic

Pierangelo Lombardo, Salvatore Saeli, Federica Bisio et al.

In the last decade, the fast flux technique has become an established practice for organising botnets into Fast Flux Service Networks (FFSNs), platforms able to sustain illegal online services with very high availability. In this paper, we report on an effective fast flux detection algorithm based on the passive analysis of the Domain Name System (DNS) traffic of a corporate network. The proposed method relies on the near-real-time computation of different metrics that measure a wide range of key fast flux features. These metrics are combined via a simple but effective mathematical and data mining approach. The proposed solution was evaluated in a one-month experiment over an enterprise network. The experiment injected pcaps associated with different malware campaigns that leverage FFSNs and cover a wide variety of attack scenarios. An in-depth analysis of a list of fast flux domains confirmed the reliability of the metrics used in the proposed algorithm. It also led to the identification of many IPs that turned out to be part of two notorious FFSNs, namely Dark Cloud and SandiFlux, whose description we thereby extend. All fast flux domains were detected with a very low false positive rate, and a comparison of performance indicators with previous work shows a remarkable improvement.
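
Fast flux domains typically resolve to many IP addresses spread across unrelated networks and use short TTLs. The sketch below computes three such per-domain indicators from passive DNS A-record answers:

- the number of distinct IPs;
- the number of distinct IPv4 /16 prefixes, as a rough proxy for network diversity;
- the minimum TTL.

It averages them into a score in [0, 1]. The indicator set, the caps `ip_cap` and `prefix_cap`, the `ttl_ref` reference and the equal-weight averaging are hypothetical choices. They are not the metrics or the combination used in the paper.

```python
import ipaddress
from collections import defaultdict


def flux_features(records):
    """records: iterable of (domain, ipv4, ttl) A-record answers seen in passive DNS."""
    ips = defaultdict(set)
    prefixes = defaultdict(set)
    ttls = defaultdict(list)
    for domain, ip, ttl in records:
        ips[domain].add(ip)
        prefixes[domain].add(ipaddress.ip_network(f"{ip}/16", strict=False))
        ttls[domain].append(ttl)
    return {
        d: {"n_ips": len(ips[d]), "n_prefixes": len(prefixes[d]), "min_ttl": min(ttls[d])}
        for d in ips
    }


def flux_score(f, ip_cap=20, prefix_cap=10, ttl_ref=300):
    """Hypothetical score in [0, 1]: mean of three saturating indicators."""
    s_ips = min(f["n_ips"], ip_cap) / ip_cap
    s_prefixes = min(f["n_prefixes"], prefix_cap) / prefix_cap
    s_ttl = max(0.0, 1 - f["min_ttl"] / ttl_ref)  # short TTLs push the score up
    return (s_ips + s_prefixes + s_ttl) / 3


records = [("www.example.org", "93.184.216.34", 3600)] + [
    ("flux.example", f"{10 + i}.{i}.1.1", 60) for i in range(20)
]
feats = flux_features(records)
for domain, f in feats.items():
    print(domain, f, round(flux_score(f), 3))
```

The saturating caps keep any single indicator from dominating the score. A real deployment would tune caps and weights on labeled traffic rather than fixing them by hand.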