Giacomo Boracchi

CV
h-index54
30papers
385citations
Novelty53%
AI Score56

30 Papers

CVJul 22, 2024Code
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection

Yunkang Cao, Jiangning Zhang, Luca Frittoli et al.

Zero-shot anomaly detection (ZSAD) targets the identification of anomalies within images from arbitrary novel categories. This study introduces AdaCLIP for the ZSAD task, leveraging a pre-trained vision-language model (VLM), CLIP. AdaCLIP incorporates learnable prompts into CLIP and optimizes them through training on auxiliary annotated anomaly detection data. Two types of learnable prompts are proposed: static and dynamic. Static prompts are shared across all images, serving to preliminarily adapt CLIP for ZSAD. In contrast, dynamic prompts are generated for each test image, providing CLIP with dynamic adaptation capabilities. The combination of static and dynamic prompts is referred to as hybrid prompts, and yields enhanced ZSAD performance. Extensive experiments conducted across 14 real-world anomaly detection datasets from industrial and medical domains indicate that AdaCLIP outperforms other ZSAD methods and can generalize better to different categories and even domains. Finally, our analysis highlights the importance of diverse auxiliary data and optimized prompts for enhanced generalization capacity. Code is available at https://github.com/caoyunkang/AdaCLIP.

CVSep 23, 2022Code
Composite Convolution: a Flexible Operator for Deep Learning on 3D Point Clouds

Alberto Floris, Luca Frittoli, Diego Carrera et al.

Deep neural networks require specific layers to process point clouds, as the scattered and irregular location of 3D points prevents the use of conventional convolutional filters. We introduce the composite layer, a flexible and general alternative to the existing convolutional operators that process 3D point clouds. We design our composite layer to extract and compress the spatial information from the 3D coordinates of points and then combine this with the feature vectors. Compared to mainstream point-convolutional layers such as ConvPoint and KPConv, our composite layer guarantees greater flexibility in network design and provides an additional form of regularization. To demonstrate the generality of our composite layers, we define both a convolutional composite layer and an aggregate version that combines spatial information and features in a nonlinear manner, and we use these layers to implement CompositeNets. Our experiments on synthetic and real-world datasets show that, in both classification, segmentation, and anomaly detection, our CompositeNets outperform ConvPoint, which uses the same sequential architecture, and achieve similar results as KPConv, which has a deeper, residual architecture. Moreover, our CompositeNets achieve state-of-the-art performance in anomaly detection on point clouds. Our code is publicly available at \url{https://github.com/sirolf-otrebla/CompositeNet}.

LGApr 20, 2022
Adversarial Scratches: Deployable Attacks to CNN Classifiers

Loris Giulivi, Malhar Jere, Loris Rossi et al.

A growing body of work has shown that deep neural networks are susceptible to adversarial examples. These take the form of small perturbations applied to the model's input which lead to incorrect predictions. Unfortunately, most literature focuses on visually imperceivable perturbations to be applied to digital images that often are, by design, impossible to be deployed to physical targets. We present Adversarial Scratches: a novel L0 black-box attack, which takes the form of scratches in images, and which possesses much greater deployability than other state-of-the-art attacks. Adversarial Scratches leverage Bézier Curves to reduce the dimension of the search space and possibly constrain the attack to a specific location. We test Adversarial Scratches in several scenarios, including a publicly available API and images of traffic signs. Results show that, often, our attack achieves higher fooling rate than other deployable state-of-the-art methods, while requiring significantly fewer queries and modifying very few pixels.

66.5CVMay 21Code
Cell Phantom Video Generation in Elliptical Fourier Descriptor Domain

Francesco Benedetto, Roberto Basla, Luca Magri et al.

Training Deep Neural Networks for tracking individual cells in biomedical videos requires a large amount of annotated data. The annotation of videos for cell tracking is very time consuming and often requires domain expertise; this explains the limited availability of public annotated data to address important medical problems like tissue repair or cancer treatment. Generating synthetic videos along with their Ground Truth annotations is a promising solution that relies, as a foundational first step, on the synthesis of single cell annotations (or phantoms). Phantoms need to be time consistent, as they have to replicate biological processes that are specific to the cell types. In this work, we propose a novel framework for generating videos of cell phantoms in the Elliptical Fourier Descriptors (EFDs) domain, a compact and geometrically interpretable representation for 2D closed contours. We represent the cell phantom evolution as a multivariate time series of EFD coefficients, introducing a strong prior for cell morphology and enabling the efficient generation of sequences that evolve coherently in time. Our experimental validation proves that modelling the temporal evolution in EFD space enables the generation of biologically plausible phantom videos. Our method can be used in generative pipelines for synthesizing annotated data for cell tracking, thus strongly mitigating the annotation effort for creating new datasets. Our code is available for download here: https://github.com/FrancescoBenedetto99/efd-cell-video-gen.

CVAug 30, 2022
Deep Open-Set Recognition for Silicon Wafer Production Monitoring

Luca Frittoli, Diego Carrera, Beatrice Rossi et al.

The chips contained in any electronic device are manufactured over circular silicon wafers, which are monitored by inspection machines at different production stages. Inspection machines detect and locate any defect within the wafer and return a Wafer Defect Map (WDM), i.e., a list of the coordinates where defects lie, which can be considered a huge, sparse, and binary image. In normal conditions, wafers exhibit a small number of randomly distributed defects, while defects grouped in specific patterns might indicate known or novel categories of failures in the production line. Needless to say, a primary concern of semiconductor industries is to identify these patterns and intervene as soon as possible to restore normal production conditions. Here we address WDM monitoring as an open-set recognition problem to accurately classify WDM in known categories and promptly detect novel patterns. In particular, we propose a comprehensive pipeline for wafer monitoring based on a Submanifold Sparse Convolutional Network, a deep architecture designed to process sparse data at an arbitrary resolution, which is trained on the known classes. To detect novelties, we define an outlier detector based on a Gaussian Mixture Model fitted on the latent representation of the classifier. Our experiments on a real dataset of WDMs show that directly processing full-resolution WDMs by Submanifold Sparse Convolutions yields superior classification performance on known classes than traditional Convolutional Neural Networks, which require a preliminary binning to reduce the size of the binary images representing WDMs. Moreover, our solution outperforms state-of-the-art open-set recognition solutions in detecting novelties.

CVAug 30, 2022
Deep Autoencoders for Anomaly Detection in Textured Images using CW-SSIM

Andrea Bionda, Luca Frittoli, Giacomo Boracchi

Detecting anomalous regions in images is a frequently encountered problem in industrial monitoring. A relevant example is the analysis of tissues and other products that in normal conditions conform to a specific texture, while defects introduce changes in the normal pattern. We address the anomaly detection problem by training a deep autoencoder, and we show that adopting a loss function based on Complex Wavelet Structural Similarity (CW-SSIM) yields superior detection performance on this type of images compared to traditional autoencoder loss functions. Our experiments on well-known anomaly detection benchmarks show that a simple model trained with this loss function can achieve comparable or superior performance to state-of-the-art methods leveraging deeper, larger and more computationally demanding neural networks.

LGAug 30, 2022
Nonparametric and Online Change Detection in Multivariate Datastreams using QuantTree

Luca Frittoli, Diego Carrera, Giacomo Boracchi

We address the problem of online change detection in multivariate datastreams, and we introduce QuantTree Exponentially Weighted Moving Average (QT-EWMA), a nonparametric change-detection algorithm that can control the expected time before a false alarm, yielding a desired Average Run Length (ARL$_0$). Controlling false alarms is crucial in many applications and is rarely guaranteed by online change-detection algorithms that can monitor multivariate datastreams without knowing the data distribution. Like many change-detection algorithms, QT-EWMA builds a model of the data distribution, in our case a QuantTree histogram, from a stationary training set. To monitor datastreams even when the training set is extremely small, we propose QT-EWMA-update, which incrementally updates the QuantTree histogram during monitoring, always keeping the ARL$_0$ under control. Our experiments, performed on synthetic and real-world datastreams, demonstrate that QT-EWMA and QT-EWMA-update control the ARL$_0$ and the false alarm rate better than state-of-the-art methods operating in similar conditions, achieving lower or comparable detection delays.

CVApr 21, 2022
Perception Visualization: Seeing Through the Eyes of a DNN

Loris Giulivi, Mark James Carman, Giacomo Boracchi

Artificial intelligence (AI) systems power the world we live in. Deep neural networks (DNNs) are able to solve tasks in an ever-expanding landscape of scenarios, but our eagerness to apply these powerful models leads us to focus on their performance and deprioritises our ability to understand them. Current research in the field of explainable AI tries to bridge this gap by developing various perturbation or gradient-based explanation techniques. For images, these techniques fail to fully capture and convey the semantic information needed to elucidate why the model makes the predictions it does. In this work, we develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM. Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to. Visualizations are obtained through a reconstruction model that inverts the encoded features, such that the parameters and predictions of the original models are not modified. Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available, thus easing the debugging and deployment of deep models as trusted systems.

LGOct 16, 2022
Class Distribution Monitoring for Concept Drift Detection

Diego Stucchi, Luca Frittoli, Giacomo Boracchi

We introduce Class Distribution Monitoring (CDM), an effective concept-drift detection scheme that monitors the class-conditional distributions of a datastream. In particular, our solution leverages multiple instances of an online and nonparametric change-detection algorithm based on QuantTree. CDM reports a concept drift after detecting a distribution change in any class, thus identifying which classes are affected by the concept drift. This can be precious information for diagnostics and adaptation. Our experiments on synthetic and real-world datastreams show that when the concept drift affects a few classes, CDM outperforms algorithms monitoring the overall data distribution, while achieving similar detection delays when the drift affects all the classes. Moreover, CDM outperforms comparable approaches that monitor the classification error, particularly when the change is not very apparent. Finally, we demonstrate that CDM inherits the properties of the underlying change detector, yielding an effective control over the expected time before a false alarm, or Average Run Length (ARL$_0$).

CVAug 8, 2024
Deep Learning for identifying systolic complexes in SCG traces: a cross-dataset analysis

Michele Craighero, Sarah Solbiati, Federica Mozzini et al.

The seismocardiographic signal is a promising alternative to the traditional ECG in the analysis of the cardiac activity. In particular, the systolic complex is known to be the most informative part of the seismocardiogram, thus requiring further analysis. State-of-art solutions to detect the systolic complex are based on Deep Learning models, which have been proven effective in pioneering studies. However, these solutions have only been tested in a controlled scenario considering only clean signals acquired from users maintained still in supine position. On top of that, all these studies consider data coming from a single dataset, ignoring the benefits and challenges related to a cross-dataset scenario. In this work, a cross-dataset experimental analysis was performed considering also data from a real-world scenario. Our findings prove the effectiveness of a deep learning solution, while showing the importance of a personalization step to contrast the domain shift, namely a change in data distribution between training and testing data. Finally, we demonstrate the benefits of a multi-channels approach, leveraging the information extracted from both accelerometers and gyroscopes data.

CVJan 14Code
LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving

Carlo Sgaravatti, Riccardo Pieroni, Matteo Corno et al.

Accurately localizing 3D objects like pedestrians, cyclists, and other vehicles is essential in Autonomous Driving. To ensure high detection performance, Autonomous Vehicles complement RGB cameras with LiDAR sensors, but effectively combining these data sources for 3D object detection remains challenging. We propose LCF3D, a novel sensor fusion framework that combines a 2D object detector on RGB images with a 3D object detector on LiDAR point clouds. By leveraging multimodal fusion principles, we compensate for inaccuracies in the LiDAR object detection network. Our solution combines two key principles: (i) late fusion, to reduce LiDAR False Positives by matching LiDAR 3D detections with RGB 2D detections and filtering out unmatched LiDAR detections; and (ii) cascade fusion, to recover missed objects from LiDAR by generating new 3D frustum proposals corresponding to unmatched RGB detections. Experiments show that LCF3D is beneficial for domain generalization, as it turns out to be successful in handling different sensor configurations between training and testing domains. LCF3D achieves significant improvements over LiDAR-based methods, particularly for challenging categories like pedestrians and cyclists in the KITTI dataset, as well as motorcycles and bicycles in nuScenes. Code can be downloaded from: https://github.com/CarloSgaravatti/LCF3D.

CVSep 26, 2025Code
Convolutional Set Transformer

Federico Chinello, Giacomo Boracchi

We introduce the Convolutional Set Transformer (CST), a novel neural architecture designed to process image sets of arbitrary cardinality that are visually heterogeneous yet share high-level semantics - such as a common category, scene, or concept. Existing set-input networks, e.g., Deep Sets and Set Transformer, are limited to vector inputs and cannot directly handle 3D image tensors. As a result, they must be cascaded with a feature extractor, typically a CNN, which encodes images into embeddings before the set-input network can model inter-image relationships. In contrast, CST operates directly on 3D image tensors, performing feature extraction and contextual modeling simultaneously, thereby enabling synergies between the two processes. This design yields superior performance in tasks such as Set Classification and Set Anomaly Detection and further provides native compatibility with CNN explainability methods such as Grad-CAM, unlike competing approaches that remain opaque. Finally, we show that CSTs can be pre-trained on large-scale datasets and subsequently adapted to new domains and tasks through standard Transfer Learning schemes. To support further research, we release CST-15, a CST backbone pre-trained on ImageNet (https://github.com/chinefed/convolutional-set-transformer).

CVNov 15, 2025
One target to align them all: LiDAR, RGB and event cameras extrinsic calibration for Autonomous Driving

Andrea Bertogalli, Giacomo Boracchi, Luca Magri

We present a novel multi-modal extrinsic calibration framework designed to simultaneously estimate the relative poses between event cameras, LiDARs, and RGB cameras, with particular focus on the challenging event camera calibration. Core of our approach is a novel 3D calibration target, specifically designed and constructed to be concurrently perceived by all three sensing modalities. The target encodes features in planes, ChArUco, and active LED patterns, each tailored to the unique characteristics of LiDARs, RGB cameras, and event cameras respectively. This unique design enables a one-shot, joint extrinsic calibration process, in contrast to existing approaches that typically rely on separate, pairwise calibrations. Our calibration pipeline is designed to accurately calibrate complex vision systems in the context of autonomous driving, where precise multi-sensor alignment is critical. We validate our approach through an extensive experimental evaluation on a custom built dataset, recorded with an advanced autonomous driving sensor setup, confirming the accuracy and robustness of our method.

LGMay 15, 2025
PIF: Anomaly detection via preference embedding

Filippo Leveni, Luca Magri, Giacomo Boracchi et al.

We address the problem of detecting anomalies with respect to structured patterns. To this end, we conceive a novel anomaly detection method called PIF, that combines the advantages of adaptive isolation methods with the flexibility of preference embedding. Specifically, we propose to embed the data in a high dimensional space where an efficient tree-based method, PI-Forest, is employed to compute an anomaly score. Experiments on synthetic and real datasets demonstrate that PIF favorably compares with state-of-the-art anomaly detection techniques, and confirm that PI-Forest is better at measuring arbitrary distances and isolate points in the preference space.

LGMay 14, 2025
Online Isolation Forest

Filippo Leveni, Guilherme Weigert Cassales, Bernhard Pfahringer et al.

The anomaly detection literature is abundant with offline methods, which require repeated access to data in memory, and impose impractical assumptions when applied to a streaming context. Existing online anomaly detection methods also generally fail to address these constraints, resorting to periodic retraining to adapt to the online context. We propose Online-iForest, a novel method explicitly designed for streaming conditions that seamlessly tracks the data generating process as it evolves over time. Experimental validation on real-world datasets demonstrated that Online-iForest is on par with online alternatives and closely rivals state-of-the-art offline anomaly detection techniques that undergo periodic retraining. Notably, Online-iForest consistently outperforms all competitors in terms of efficiency, making it a promising solution in applications where fast identification of anomalies is of primary importance such as cybersecurity, fraud and fault detection.

CVMay 23, 2024
Explaining Multi-modal Large Language Models by Analyzing their Vision Perception

Loris Giulivi, Giacomo Boracchi

Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in understanding and generating content across various modalities, such as images and text. However, their interpretability remains a challenge, hindering their adoption in critical applications. This research proposes a novel approach to enhance the interpretability of MLLMs by focusing on the image embedding component. We combine an open-world localization model with a MLLM, thus creating a new architecture able to simultaneously produce text and object localization outputs from the same vision embedding. The proposed architecture greatly promotes interpretability, enabling us to design a novel saliency map to explain any output token, to identify model hallucinations, and to assess model biases through semantic adversarial perturbations.

CVApr 25, 2025
A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection

Carlo Sgaravatti, Roberto Basla, Riccardo Pieroni et al.

We present a new way to detect 3D objects from multimodal inputs, leveraging both LiDAR and RGB cameras in a hybrid late-cascade scheme, that combines an RGB detection network and a 3D LiDAR detector. We exploit late fusion principles to reduce LiDAR False Positives, matching LiDAR detections with RGB ones by projecting the LiDAR bounding boxes on the image. We rely on cascade fusion principles to recover LiDAR False Negatives leveraging epipolar constraints and frustums generated by RGB detections of separate views. Our solution can be plugged on top of any underlying single-modal detectors, enabling a flexible training process that can take advantage of pre-trained LiDAR and RGB detectors, or train the two branches separately. We evaluate our results on the KITTI object detection benchmark, showing significant performance improvements, especially for the detection of Pedestrians and Cyclists.

CVFeb 3, 2025
Temporal-consistent CAMs for Weakly Supervised Video Segmentation in Waste Sorting

Andrea Marelli, Luca Magri, Federica Arrigoni et al.

In industrial settings, weakly supervised (WS) methods are usually preferred over their fully supervised (FS) counterparts as they do not require costly manual annotations. Unfortunately, the segmentation masks obtained in the WS regime are typically poor in terms of accuracy. In this work, we present a WS method capable of producing accurate masks for semantic segmentation in the case of video streams. More specifically, we build saliency maps that exploit the temporal coherence between consecutive frames in a video, promoting consistency when objects appear in different frames. We apply our method in a waste-sorting scenario, where we perform weakly supervised video segmentation (WSVS) by training an auxiliary classifier that distinguishes between videos recorded before and after a human operator, who manually removes specific wastes from a conveyor belt. The saliency maps of this classifier identify materials to be removed, and we modify the classifier training to minimize differences between the saliency map of a central frame and those in adjacent frames, after having compensated object displacement. Experiments on a real-world dataset demonstrate the benefits of integrating temporal coherence directly during the training phase of the classifier. Code and dataset are available upon request.

CVMay 23, 2024
SE3D: A Framework For Saliency Method Evaluation In 3D Imaging

Mariusz Wiśniewski, Loris Giulivi, Giacomo Boracchi

For more than a decade, deep learning models have been dominating in various 2D imaging tasks. Their application is now extending to 3D imaging, with 3D Convolutional Neural Networks (3D CNNs) being able to process LIDAR, MRI, and CT scans, with significant implications for fields such as autonomous driving and medical imaging. In these critical settings, explaining the model's decisions is fundamental. Despite recent advances in Explainable Artificial Intelligence, however, little effort has been devoted to explaining 3D CNNs, and many works explain these models via inadequate extensions of 2D saliency methods. A fundamental limitation to the development of 3D saliency methods is the lack of a benchmark to quantitatively assess these on 3D data. To address this issue, we propose SE3D: a framework for Saliency method Evaluation in 3D imaging. We propose modifications to ShapeNet, ScanNet, and BraTS datasets, and evaluation metrics to assess saliency methods for 3D CNNs. We evaluate both state-of-the-art saliency methods designed for 3D data and extensions of popular 2D saliency methods to 3D. Our experiments show that 3D saliency methods do not provide explanations of sufficient quality, and that there is margin for future improvements and safer applications of 3D CNNs in critical fields.

CVMay 23, 2024
Concept Visualization: Explaining the CLIP Multi-modal Embedding Using WordNet

Loris Giulivi, Giacomo Boracchi

Advances in multi-modal embeddings, and in particular CLIP, have recently driven several breakthroughs in Computer Vision (CV). CLIP has shown impressive performance on a variety of tasks, yet, its inherently opaque architecture may hinder the application of models employing CLIP as backbone, especially in fields where trust and model explainability are imperative, such as in the medical domain. Current explanation methodologies for CV models rely on Saliency Maps computed through gradient analysis or input perturbation. However, these Saliency Maps can only be computed to explain classes relevant to the end task, often smaller in scope than the backbone training classes. In the context of models implementing CLIP as their vision backbone, a substantial portion of the information embedded within the learned representations is thus left unexplained. In this work, we propose Concept Visualization (ConVis), a novel saliency methodology that explains the CLIP embedding of an image by exploiting the multi-modal nature of the embeddings. ConVis makes use of lexical information from WordNet to compute task-agnostic Saliency Maps for any concept, not limited to concepts the end model was trained on. We validate our use of WordNet via an out of distribution detection experiment, and test ConVis on an object localization benchmark, showing that Concept Visualizations correctly identify and localize the image's semantic content. Additionally, we perform a user study demonstrating that our methodology can give users insight on the model's functioning.

INS-DETJan 19
Accurate Simulation Pipeline for Passive Single-Photon Imaging

Aleksi Suonsivu, Lauri Salmela, Leevi Uosukainen et al.

Single-Photon Avalanche Diodes (SPADs) are new and promising imaging sensors. These sensors are sensitive enough to detect individual photons hitting each pixel, with extreme temporal resolution and without readout noise. Thus, SPADs stand out as an optimal choice for low-light imaging. Due to the high price and limited availability of SPAD sensors, the demand for an accurate data simulation pipeline is substantial. Indeed, the scarcity of SPAD datasets hinders the development of SPAD-specific processing algorithms and impedes the training of learning-based solutions. In this paper, we present a comprehensive SPAD simulation pipeline and validate it with multiple experiments using two recent commercial SPAD sensors. Our simulator is used to generate the SPAD-MNIST, a single-photon version of the seminal MNIST dataset, to investigate the effectiveness of convolutional neural network (CNN) classifiers on reconstructed fluxes, even at extremely low light conditions, e.g., 5 mlux. We also assess the performance of classifiers exclusively trained on simulated data on real images acquired from SPAD sensors at different light conditions. The synthetic dataset encompasses different SPAD imaging modalities and is made available for download. Project page: https://boracchi.faculty.polimi.it/Projects/SPAD-MNIST.html.

CVSep 8, 2025
WS$^2$: Weakly Supervised Segmentation using Before-After Supervision in Waste Sorting

Andrea Marelli, Alberto Foresti, Leonardo Pesce et al.

In industrial quality control, to visually recognize unwanted items within a moving heterogeneous stream, human operators are often still indispensable. Waste-sorting stands as a significant example, where operators on multiple conveyor belts manually remove unwanted objects to select specific materials. To automate this recognition problem, computer vision systems offer great potential in accurately identifying and segmenting unwanted items in such settings. Unfortunately, considering the multitude and the variety of sorting tasks, fully supervised approaches are not a viable option to address this challange, as they require extensive labeling efforts. Surprisingly, weakly supervised alternatives that leverage the implicit supervision naturally provided by the operator in his removal action are relatively unexplored. In this paper, we define the concept of Before-After Supervision, illustrating how to train a segmentation network by leveraging only the visual differences between images acquired \textit{before} and \textit{after} the operator. To promote research in this direction, we introduce WS$^2$ (Weakly Supervised segmentation for Waste-Sorting), the first multiview dataset consisting of more than 11 000 high-resolution video frames captured on top of a conveyor belt, including "before" and "after" images. We also present a robust end-to-end pipeline, used to benchmark several state-of-the-art weakly supervised segmentation methods on WS$^2$.

CRMay 19, 2025
Malware families discovery via Open-Set Recognition on Android manifest permissions

Filippo Leveni, Matteo Mistura, Francesco Iubatti et al.

Malware are malicious programs that are grouped into families based on their penetration technique, source code, and other characteristics. Classifying malware programs into their respective families is essential for building effective defenses against cyber threats. Machine learning models have a huge potential in malware detection on mobile devices, as malware families can be recognized by classifying permission data extracted from Android manifest files. Still, the malware classification task is challenging due to the high-dimensional nature of permission data and the limited availability of training samples. In particular, the steady emergence of new malware families makes it impossible to acquire a comprehensive training set covering all the malware classes. In this work, we present a malware classification system that, on top of classifying known malware, detects new ones. In particular, we combine an open-set recognition technique developed within the computer vision community, namely MaxLogit, with a tree-based Gradient Boosting classifier, which is particularly effective in classifying high-dimensional data. Our solution turns out to be very practical, as it can be seamlessly employed in a standard classification workflow, and efficient, as it adds minimal computational overhead. Experiments on public and proprietary datasets demonstrate the potential of our solution, which has been deployed in a business environment.

CVFeb 10, 2025
A Deep Learning Pipeline for Solid Waste Detection in Remote Sensing Images

Federico Gibellini, Piero Fraternali, Giacomo Boracchi et al.

Improper solid waste management represents both a serious threat to ecosystem health and a significant source of revenues for criminal organizations perpetrating environmental crimes. This issue can be mitigated thanks to the increasing availability of Very-High-Resolution Remote Sensing (VHR RS) images. Modern image-analysis tools support automated photo-interpretation and large territory scanning in search of illegal waste disposal sites. This paper illustrates a semi-automatic waste detection pipeline, developed in collaboration with a regional environmental protection agency, for detecting candidate illegal dumping sites in VHR RS images. To optimize the effectiveness of the waste detector at the core of the pipeline, extensive experiments evaluate such design choices as the network architecture, the ground resolution and geographic span of the input images, as well as the pretraining procedures. The best model attains remarkable performance, achieving 92.02 % F1-Score and 94.56 % Accuracy. A generalization study assesses the performance variation when the detector processes images from various territories substantially different from the one used during training, incurring only a moderate performance loss, namely an average 5.1 % decrease in the F1-Score. Finally, an exercise in which expert photo-interpreters compare the effort required to scan large territories with and without support from the waste detector assesses the practical benefit of introducing a computer-aided image analysis tool in a professional environmental protection agency. Results show that a reduction of up to 30 % of the time spent for waste site detection can be attained.

CVOct 22, 2024
Time-Resolved MNIST Dataset for Single-Photon Recognition

Aleksi Suonsivu, Lauri Salmela, Edoardo Peretti et al.

Time-resolved single photon imaging is a promising imaging modality characterized by the unique capability of timestamping the arrivals of single photons. Single-Photon Avalanche Diodes (SPADs) are the leading technology for implementing modern time-resolved pixels, suitable for passive imaging with asynchronous readout. However, they are currently limited to small sized arrays, thus there is a lack of datasets for passive time-resolved SPAD imaging, which in turn hinders research on this peculiar imaging data. In this paper we describe a realistic simulation process for SPAD imaging, which takes into account both the stochastic nature of photon arrivals and all the noise sources involved in the acquisition process of time-resolved SPAD arrays. We have implemented this simulator in a software prototype able to generate arbitrary-sized time-resolved SPAD arrays operating in passive mode. Starting from a reference image, our simulator generates a realistic stream of timestamped photon detections. We use our simulator to generate a time-resolved version of MNIST, which we make publicly available. Our dataset has the purpose of encouraging novel research directions in time-resolved SPAD imaging, as well as investigating the performance of CNN classifiers in extremely low-light conditions.

LGOct 17, 2024
Change Detection in Multivariate data streams: Online Analysis with Kernel-QuantTree

Michelangelo Olmo Nogara Notarianni, Filippo Leveni, Diego Stucchi et al.

We present Kernel-QuantTree Exponentially Weighted Moving Average (KQT-EWMA), a non-parametric change-detection algorithm that combines the Kernel-QuantTree (KQT) histogram and the EWMA statistic to monitor multivariate data streams online. The resulting monitoring scheme is very flexible, since histograms can be used to model any stationary distribution, and practical, since the distribution of test statistics does not depend on the distribution of datastream in stationary conditions (non-parametric monitoring). KQT-EWMA enables controlling false alarms by operating at a pre-determined Average Run Length ($ARL_0$), which measures the expected number of stationary samples to be monitored before triggering a false alarm. The latter peculiarity is in contrast with most non-parametric change-detection tests, which rarely can control the $ARL_0$ a priori. Our experiments on synthetic and real-world datasets demonstrate that KQT-EWMA can control $ARL_0$ while achieving detection delays comparable to or lower than state-of-the-art methods designed to work in the same conditions.

IVJun 3, 2024
An expert-driven data generation pipeline for histological images

Roberto Basla, Loris Giulivi, Luca Magri et al.

Deep Learning (DL) models have been successfully applied to many applications including biomedical cell segmentation and classification in histological images. These models require large amounts of annotated data which might not always be available, especially in the medical field where annotations are scarce and expensive. To overcome this limitation, we propose a novel pipeline for generating synthetic datasets for cell segmentation. Given only a handful of annotated images, our method generates a large dataset of images which can be used to effectively train DL instance segmentation models. Our solution is designed to generate cells of realistic shapes and placement by allowing experts to incorporate domain knowledge during the generation of the dataset.

GRMay 17, 2023
Extracting a functional representation from a dictionary for non-rigid shape matching

Michele Colombo, Giacomo Boracchi, Simone Melzi

Shape matching is a fundamental problem in computer graphics with many applications. Functional maps translate the point-wise shape-matching problem into its functional counterpart and have inspired numerous solutions over the last decade. Nearly all the solutions based on functional maps rely on the eigenfunctions of the Laplace-Beltrami Operator (LB) to describe the functional spaces defined on the surfaces and then convert the functional correspondences into point-wise correspondences. However, this final step is often error-prone and inaccurate in tiny regions and protrusions, where the energy of LB does not uniformly cover the surface. We propose a new functional basis Principal Components of a Dictionary (PCD) to address such intrinsic limitation. PCD constructs an orthonormal basis from the Principal Component Analysis (PCA) of a dictionary of functions defined over the shape. These dictionaries can target specific properties of the final basis, such as achieving an even spreading of energy. Our experimental evaluation compares seven different dictionaries on established benchmarks, showing that PCD is suited to target different shape-matching scenarios, resulting in more accurate point-wise maps than the LB basis when used in the same pipeline. This evidence provides a promising alternative for improving correspondence estimation, confirming the power and flexibility of functional maps.

NEDec 5, 2019
Scratch that! An Evolution-based Adversarial Attack against Neural Networks

Malhar Jere, Loris Rossi, Briland Hitaj et al.

We study black-box adversarial attacks for image classifiers in a constrained threat model, where adversaries can only modify a small fraction of pixels in the form of scratches on an image. We show that it is possible for adversaries to generate localized \textit{adversarial scratches} that cover less than $5\%$ of the pixels in an image and achieve targeted success rates of $98.77\%$ and $97.20\%$ on ImageNet and CIFAR-10 trained ResNet-50 models, respectively. We demonstrate that our scratches are effective under diverse shapes, such as straight lines or parabolic B\a'ezier curves, with single or multiple colors. In an extreme condition, in which our scratches are a single color, we obtain a targeted attack success rate of $66\%$ on CIFAR-10 with an order of magnitude fewer queries than comparable attacks. We successfully launch our attack against Microsoft's Cognitive Services Image Captioning API and propose various mitigation strategies.

MLOct 16, 2015
Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss

Cesare Alippi, Giacomo Boracchi, Diego Carrera et al.

We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods have to face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Despite the fact that this approach constitutes the frame of several change-detection methods, its effectiveness when data dimension scales has never been investigated, which is indeed the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This problem, which we refer to as \emph{detectability loss}, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that this problem holds also on real-world datasets and that can be harmful even at low data-dimensions (say, 10).