Filippo Leveni

h-index54

6papers

10citations

Novelty52%

AI Score30

Ranked #146,417 of 201,326 authors (top 73%)#32,232 in LG (top 76%)

6 Papers

LGMay 15, 2025

PIF: Anomaly detection via preference embedding

Filippo Leveni, Luca Magri, Giacomo Boracchi et al.

We address the problem of detecting anomalies with respect to structured patterns. To this end, we conceive a novel anomaly detection method called PIF, that combines the advantages of adaptive isolation methods with the flexibility of preference embedding. Specifically, we propose to embed the data in a high dimensional space where an efficient tree-based method, PI-Forest, is employed to compute an anomaly score. Experiments on synthetic and real datasets demonstrate that PIF favorably compares with state-of-the-art anomaly detection techniques, and confirm that PI-Forest is better at measuring arbitrary distances and isolate points in the preference space.

LGMay 14, 2025

Online Isolation Forest

Filippo Leveni, Guilherme Weigert Cassales, Bernhard Pfahringer et al.

The anomaly detection literature is abundant with offline methods, which require repeated access to data in memory, and impose impractical assumptions when applied to a streaming context. Existing online anomaly detection methods also generally fail to address these constraints, resorting to periodic retraining to adapt to the online context. We propose Online-iForest, a novel method explicitly designed for streaming conditions that seamlessly tracks the data generating process as it evolves over time. Experimental validation on real-world datasets demonstrated that Online-iForest is on par with online alternatives and closely rivals state-of-the-art offline anomaly detection techniques that undergo periodic retraining. Notably, Online-iForest consistently outperforms all competitors in terms of efficiency, making it a promising solution in applications where fast identification of anomalies is of primary importance such as cybersecurity, fraud and fault detection.

CRMay 19, 2025

Malware families discovery via Open-Set Recognition on Android manifest permissions

Filippo Leveni, Matteo Mistura, Francesco Iubatti et al.

Malware are malicious programs that are grouped into families based on their penetration technique, source code, and other characteristics. Classifying malware programs into their respective families is essential for building effective defenses against cyber threats. Machine learning models have a huge potential in malware detection on mobile devices, as malware families can be recognized by classifying permission data extracted from Android manifest files. Still, the malware classification task is challenging due to the high-dimensional nature of permission data and the limited availability of training samples. In particular, the steady emergence of new malware families makes it impossible to acquire a comprehensive training set covering all the malware classes. In this work, we present a malware classification system that, on top of classifying known malware, detects new ones. In particular, we combine an open-set recognition technique developed within the computer vision community, namely MaxLogit, with a tree-based Gradient Boosting classifier, which is particularly effective in classifying high-dimensional data. Our solution turns out to be very practical, as it can be seamlessly employed in a standard classification workflow, and efficient, as it adds minimal computational overhead. Experiments on public and proprietary datasets demonstrate the potential of our solution, which has been deployed in a business environment.

CVMay 19, 2025

Non-planar Object Detection and Identification by Features Matching and Triangulation Growth

Filippo Leveni

Object detection and identification is surely a fundamental topic in the computer vision field; it plays a crucial role in many applications such as object tracking, industrial robots control, image retrieval, etc. We propose a feature-based approach for detecting and identifying distorted occurrences of a given template in a scene image by incremental grouping of feature matches between the image and the template. For this purpose, we consider the Delaunay triangulation of template features as an useful tool through which to be guided in this iterative approach. The triangulation is treated as a graph and, starting from a single triangle, neighboring nodes are considered and the corresponding features are identified; then matches related to them are evaluated to determine if they are worthy to be grouped. This evaluation is based on local consistency criteria derived from geometric and photometric properties of local features. Our solution allows the identification of the object in situations where geometric models (e.g. homography) does not hold, thus enable the detection of objects such that the template is non planar or when it is planar but appears distorted in the image. We show that our approach performs just as well or better than application of homography-based RANSAC in scenarios in which distortion is nearly absent, while when the deformation becomes relevant our method shows better description performance.

LGMay 19, 2025

Structure-based Anomaly Detection and Clustering

Filippo Leveni

Anomaly detection is a fundamental problem in domains such as healthcare, manufacturing, and cybersecurity. This thesis proposes new unsupervised methods for anomaly detection in both structured and streaming data settings. In the first part, we focus on structure-based anomaly detection, where normal data follows low-dimensional manifolds while anomalies deviate from them. We introduce Preference Isolation Forest (PIF), which embeds data into a high-dimensional preference space via manifold fitting, and isolates outliers using two variants: Voronoi-iForest, based on geometric distances, and RuzHash-iForest, leveraging Locality Sensitive Hashing for scalability. We also propose Sliding-PIF, which captures local manifold information for streaming scenarios. Our methods outperform existing techniques on synthetic and real datasets. We extend this to structure-based clustering with MultiLink, a novel method for recovering multiple geometric model families in noisy data. MultiLink merges clusters via a model-aware linkage strategy, enabling robust multi-class structure recovery. It offers key advantages over existing approaches, such as speed, reduced sensitivity to thresholds, and improved robustness to poor initial sampling. The second part of the thesis addresses online anomaly detection in evolving data streams. We propose Online Isolation Forest (Online-iForest), which uses adaptive, multi-resolution histograms and dynamically updates tree structures to track changes over time. It avoids retraining while achieving accuracy comparable to offline models, with superior efficiency for real-time applications. Finally, we tackle anomaly detection in cybersecurity via open-set recognition for malware classification. We enhance a Gradient Boosting classifier with MaxLogit to detect unseen malware families, a method now integrated into Cleafy's production system.

LGOct 17, 2024

Change Detection in Multivariate data streams: Online Analysis with Kernel-QuantTree

Michelangelo Olmo Nogara Notarianni, Filippo Leveni, Diego Stucchi et al.

We present Kernel-QuantTree Exponentially Weighted Moving Average (KQT-EWMA), a non-parametric change-detection algorithm that combines the Kernel-QuantTree (KQT) histogram and the EWMA statistic to monitor multivariate data streams online. The resulting monitoring scheme is very flexible, since histograms can be used to model any stationary distribution, and practical, since the distribution of test statistics does not depend on the distribution of datastream in stationary conditions (non-parametric monitoring). KQT-EWMA enables controlling false alarms by operating at a pre-determined Average Run Length ($ARL_0$), which measures the expected number of stationary samples to be monitored before triggering a false alarm. The latter peculiarity is in contrast with most non-parametric change-detection tests, which rarely can control the $ARL_0$ a priori. Our experiments on synthetic and real-world datasets demonstrate that KQT-EWMA can control $ARL_0$ while achieving detection delays comparable to or lower than state-of-the-art methods designed to work in the same conditions.