Alessandro De Palma

LG
h-index20
13papers
361citations
Novelty52%
AI Score45

13 Papers

LGJun 29, 2022
IBP Regularization for Verified Adversarial Robustness via Branch-and-Bound

Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham et al.

Recent works have tried to increase the verifiability of adversarially trained networks by running the attacks over domains larger than the original perturbations and adding various regularization terms to the objective. However, these algorithms either underperform or require complex and expensive stage-wise training procedures, hindering their practical applicability. We present IBP-R, a novel verified training algorithm that is both simple and effective. IBP-R induces network verifiability by coupling adversarial attacks on enlarged domains with a regularization term, based on inexpensive interval bound propagation, that minimizes the gap between the non-convex verification problem and its approximations. By leveraging recent branch-and-bound frameworks, we show that IBP-R obtains state-of-the-art verified robustness-accuracy trade-offs for small perturbations on CIFAR-10 while training significantly faster than relevant previous work. Additionally, we present UPB, a novel branching strategy that, relying on a simple heuristic based on $β$-CROWN, reduces the cost of state-of-the-art branching algorithms while yielding splits of comparable quality.

LGAug 23, 2024
Verification of Geometric Robustness of Neural Networks via Piecewise Linear Approximation and Lipschitz Optimisation

Ben Batten, Yang Zheng, Alessandro De Palma et al.

We address the problem of verifying neural networks against geometric transformations of the input image, including rotation, scaling, shearing, and translation. The proposed method computes provably sound piecewise linear constraints for the pixel values by using sampling and linear approximations in combination with branch-and-bound Lipschitz optimisation. The method obtains provably tighter over-approximations of the perturbation region than the present state-of-the-art. We report results from experiments on a comprehensive set of verification benchmarks on MNIST and CIFAR10. We show that our proposed implementation resolves up to 32% more verification cases than present approaches.

LGFeb 24, 2020Code
Lagrangian Decomposition for Neural Network Verification

Rudy Bunel, Alessandro De Palma, Alban Desmaison et al.

A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. Previous methods have either used off-the-shelf solvers, discarding the problem structure, or relaxed the problem even further, making the bounds unnecessarily loose. We propose a novel approach based on Lagrangian Decomposition. Our formulation admits an efficient supergradient ascent algorithm, as well as an improved proximal algorithm. Both the algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to forward/backward pass of neural networks layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds. Empirically, we show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time, and obtain tighter bounds in the same time as previous dual algorithms. This results in an overall speed-up when employing the bounds for formal verification. Code for our algorithms is available at https://github.com/oval-group/decomposition-plnn-bounds.

LGMay 7, 2024
Verified Neural Compressed Sensing

Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar et al. · deepmind

We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on neural network verification has focused on partial specifications that, even when satisfied, are not sufficient to ensure that a neural network never makes errors. We focus on applying neural network verification to computational tasks with a precise notion of correctness, where a verifiably correct neural network provably solves the task at hand with no caveats. In particular, we develop an approach to train and verify the first provably correct neural networks for compressed sensing, i.e., recovering sparse vectors from a number of measurements smaller than the dimension of the vector. We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements. Furthermore, we show that the complexity of the network (number of neurons/layers) can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.

LGFeb 2
Learning Better Certified Models from Empirically-Robust Teachers

Alessandro De Palma

Adversarial training attains strong empirical robustness to specific adversarial attacks by training on concrete adversarial perturbations, but it produces neural networks that are not amenable to strong robustness certificates through neural network verification. On the other hand, earlier certified training schemes directly train on bounds from network relaxations to obtain models that are certifiably robust, but display sub-par standard performance. Recent work has shown that state-of-the-art trade-offs between certified robustness and standard performance can be obtained through a family of losses combining adversarial outputs and neural network bounds. Nevertheless, differently from empirical robustness, verifiability still comes at a significant cost in standard performance. In this work, we propose to leverage empirically-robust teachers to improve the performance of certifiably-robust models through knowledge distillation. Using a versatile feature-space distillation objective, we show that distillation from adversarially-trained teachers consistently improves on the state-of-the-art in certified training for ReLU networks across a series of robust computer vision benchmarks.

LGNov 28, 2025
Faster Verified Explanations for Neural Networks

Alessandro De Palma, Greta Dolcetti, Caterina Urban

Verified explanations are a theoretically-principled way to explain the decisions taken by neural networks, which are otherwise black-box in nature. However, these techniques face significant scalability challenges, as they require multiple calls to neural network verifiers, each of them with an exponential worst-case complexity. We present FaVeX, a novel algorithm to compute verified explanations. FaVeX accelerates the computation by dynamically combining batch and sequential processing of input features, and by reusing information from previous queries, both when proving invariances with respect to certain input features, and when searching for feature assignments altering the prediction. Furthermore, we present a novel and hierarchical definition of verified explanations, termed verifier-optimal robust explanations, that explicitly factors the incompleteness of network verifiers within the explanation. Our comprehensive experimental evaluation demonstrates the superior scalability of both FaVeX, and of verifier-optimal robust explanations, which together can produce meaningful formal explanation on networks with hundreds of thousands of non-linear activations.

LGMay 23, 2023
Expressive Losses for Verified Robustness via Convex Combinations

Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham et al.

In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.

LGJan 11, 2022
In Defense of the Unitary Scalarization for Deep Multi-Task Learning

Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov et al.

Recent multi-task learning research argues against unitary scalarization, where training simply minimizes the sum of the task losses. Several ad-hoc multi-task optimization algorithms have instead been proposed, inspired by various hypotheses about what makes multi-task settings difficult. The majority of these optimizers require per-task gradients, and introduce significant memory, runtime, and implementation overhead. We show that unitary scalarization, coupled with standard regularization and stabilization techniques from single-task learning, matches or improves upon the performance of complex multi-task optimizers in popular supervised and reinforcement learning settings. We then present an analysis suggesting that many specialized multi-task optimizers can be partly interpreted as forms of regularization, potentially explaining our surprising results. We believe our results call for a critical reevaluation of recent research in the area.

LGApr 14, 2021
Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition

Alessandro De Palma, Rudy Bunel, Alban Desmaison et al.

We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks. First, we propose novel bounding algorithms based on Lagrangian Decomposition. Previous works have used off-the-shelf solvers to solve relaxations at each node of the BaB tree, or constructed weaker relaxations that can be solved efficiently, but lead to unnecessarily weak bounds. Our formulation restricts the optimization to a subspace of the dual domain that is guaranteed to contain the optimum, resulting in accelerated convergence. Furthermore, it allows for a massively parallel implementation, which is amenable to GPU acceleration via modern deep learning frameworks. Second, we present a novel activation-based branching strategy. By coupling an inexpensive heuristic with fast dual bounding, our branching scheme greatly reduces the size of the BaB tree compared to previous heuristic methods. Moreover, it performs competitively with a recent strategy based on learning algorithms, without its large offline training cost. Finally, we design a BaB framework, named Branch and Dual Network Bound (BaDNB), based on our novel bounding and branching algorithms. We show that BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial robustness properties.

LGJan 14, 2021
Scaling the Convex Barrier with Sparse Dual Algorithms

Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel et al.

Tight and efficient neural network bounding is crucial to the scaling of neural network verification systems. Many efficient bounding algorithms have been presented recently, but they are often too loose to verify more challenging properties. This is due to the weakness of the employed relaxation, which is usually a linear program of size linear in the number of neurons. While a tighter linear relaxation for piecewise-linear activations exists, it comes at the cost of exponentially many constraints and currently lacks an efficient customized solver. We alleviate this deficiency by presenting two novel dual algorithms: one operates a subgradient method on a small active set of dual variables, the other exploits the sparsity of Frank-Wolfe type optimizers to incur only a linear memory cost. Both methods recover the strengths of the new relaxation: tightness and a linear separation oracle. At the same time, they share the benefits of previous dual approaches for weaker relaxations: massive parallelism, GPU implementation, low cost per iteration and valid bounds at any time. As a consequence, we can obtain better bounds than off-the-shelf solvers in only a fraction of their running time, attaining significant formal verification speed-ups.

LGMar 22, 2019
Sampling Acquisition Functions for Batch Bayesian Optimization

Alessandro De Palma, Celestine Mendler-Dünner, Thomas Parnell et al.

We present Acquisition Thompson Sampling (ATS), a novel technique for batch Bayesian Optimization (BO) based on the idea of sampling multiple acquisition functions from a stochastic process. We define this process through the dependency of the acquisition functions on a set of model hyper-parameters. ATS is conceptually simple, straightforward to implement and, unlike other batch BO methods, it can be employed to parallelize any sequential acquisition function or to make existing parallel methods scale further. We present experiments on a variety of benchmark functions and on the hyper-parameter optimization of a popular gradient boosting tree algorithm. These demonstrate the advantages of ATS with respect to classical parallel Thompson Sampling for BO, its competitiveness with two state-of-the-art batch BO methods, and its effectiveness if applied to existing parallel BO algorithms.

LGSep 12, 2018
Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms

Andreea Anghel, Nikolaos Papandreou, Thomas Parnell et al.

Gradient boosting decision trees (GBDTs) have seen widespread adoption in academia, industry and competitive data science due to their state-of-the-art performance in many machine learning tasks. One relative downside to these models is the large number of hyper-parameters that they expose to the end-user. To maximize the predictive power of GBDT models, one must either manually tune the hyper-parameters, or utilize automated techniques such as those based on Bayesian optimization. Both of these approaches are time-consuming since they involve repeatably training the model for different sets of hyper-parameters. A number of software GBDT packages have started to offer GPU acceleration which can help to alleviate this problem. In this paper, we consider three such packages: XGBoost, LightGBM and Catboost. Firstly, we evaluate the performance of the GPU acceleration provided by these packages using large-scale datasets with varying shapes, sparsities and learning tasks. Then, we compare the packages in the context of hyper-parameter optimization, both in terms of how quickly each package converges to a good validation score, and in terms of generalization performance.

LGDec 1, 2017
Distributed Stratified Locality Sensitive Hashing for Critical Event Prediction in the Cloud

Alessandro De Palma, Erik Hemberg, Una-May O'Reilly

The availability of massive healthcare data repositories calls for efficient tools for data-driven medicine. We introduce a distributed system for Stratified Locality Sensitive Hashing to perform fast similarity-based prediction on large medical waveform datasets. Our implementation, for an ICU use case, prioritizes latency over throughput and is targeted at a cloud environment. We demonstrate our system on Acute Hypotensive Episode prediction from Arterial Blood Pressure waveforms. On a dataset of $1.37$ million points, we show scaling up to $40$ processors and a $21\times$ speedup in number of comparisons to parallel exhaustive search at the price of a $10\%$ Matthews correlation coefficient (MCC) loss. Furthermore, if additional MCC loss can be tolerated, our system achieves speedups up to two orders of magnitude.