LGJan 30, 2023
Compression, Generalization and LearningMarco C. Campi, Simone Garatti
A compression function is a map that slims down an observational set into a subset of reduced size, while preserving its informational content. In multiple applications, the condition that one new observation makes the compressed set change is interpreted that this observation brings in extra information and, in learning theory, this corresponds to misclassification, or misprediction. In this paper, we lay the foundations of a new theory that allows one to keep control on the probability of change of compression (which maps into the statistical "risk" in learning applications). Under suitable conditions, the cardinality of the compressed set is shown to be a consistent estimator of the probability of change of compression (without any upper limit on the size of the compressed set); moreover, unprecedentedly tight finite-sample bounds to evaluate the probability of change of compression are obtained under a generally applicable condition of preference. All results are usable in a fully agnostic setup, i.e., without requiring any a priori knowledge on the probability distribution of the observations. Not only these results offer a valid support to develop trust in observation-driven methodologies, they also play a fundamental role in learning techniques as a tool for hyper-parameter tuning.
OCMar 15, 2019
The scenario approach meets uncertain variational inequalities and game theoryDario Paccagnan, Marco C. Campi
Variational inequalities are modelling tools used to capture a variety of decision-making problems arising in mathematical optimization, operations research, game theory. The scenario approach is a set of techniques developed to tackle stochastic optimization problems, take decisions based on historical data, and quantify their risk. The overarching goal of this manuscript is to bridge these two areas of research, and thus broaden the class of problems amenable to be studied under the lens of the scenario approach. First and foremost, we provide out-of-samples feasibility guarantees for the solution of variational and quasi variational inequality problems. Second, we apply these results to two classes of uncertain games. In the first class, the uncertainty enters in the constraint sets, while in the second class the uncertainty enters in the cost functions. Finally, we exemplify the quality and relevance of our bounds through numerical simulations on a demand-response model.
28.5MLApr 1
Scenario theory for multi-criteria data-driven decision makingSimone Garatti, Lucrezia Manieri, Alessandro Falsone et al.
The scenario approach provides a powerful data-driven framework for designing solutions under uncertainty with rigorous probabilistic robustness guarantees. Existing theory, however, primarily addresses assessing robustness with respect to a single appropriateness criterion for the solution based on a dataset, whereas many practical applications - including multi-agent decision problems - require the simultaneous consideration of multiple criteria and the assessment of their robustness based on multiple datasets, one per criterion. This paper develops a general scenario theory for multi-criteria data-driven decision making. A central innovation lies in the collective treatment of the risks associated with violations of individual criteria, which yields substantially more accurate robustness certificates than those derived from a naive application of standard results. In turn, this approach enables a sharper quantification of the robustness level with which all criteria are simultaneously satisfied. The proposed framework applies broadly to multi-criteria data-driven decision problems, providing a principled, scalable, and theoretically grounded methodology for design under uncertainty.
SYDec 4, 2025
Pick-to-Learn for Systems and Control: Data-driven Synthesis with State-of-the-art Safety GuaranteesDario Paccagnan, Daniel Marks, Marco C. Campi et al.
Data-driven methods have become paramount in modern systems and control problems characterized by growing levels of complexity. In safety-critical environments, deploying these methods requires rigorous guarantees, a need that has motivated much recent work at the interface of statistical learning and control. However, many existing approaches achieve this goal at the cost of sacrificing valuable data for testing and calibration, or by constraining the choice of learning algorithm, thus leading to suboptimal performances. In this paper, we describe Pick-to-Learn (P2L) for Systems and Control, a framework that allows any data-driven control method to be equipped with state-of-the-art safety and performance guarantees. P2L enables the use of all available data to jointly synthesize and certify the design, eliminating the need to set aside data for calibration or validation purposes. In presenting a comprehensive version of P2L for systems and control, this paper demonstrates its effectiveness across a range of core problems, including optimal control, reachability analysis, safe synthesis, and robust control. In many of these applications, P2L delivers designs and certificates that outperform commonly employed methods, and shows strong potential for broad applicability in diverse practical settings.
MEFeb 17
Scenario Approach with Post-Design Certification of User-Specified PropertiesAlgo Carè, Marco C. Campi, Simone Garatti
The scenario approach is an established data-driven design framework that comes equipped with a powerful theory linking design complexity to generalization properties. In this approach, data are simultaneously used both for design and for certifying the design's reliability, without resorting to a separate test dataset. This paper takes a step further by guaranteeing additional properties, useful in post-design usage but not considered during the design phase. To this end, we introduce a two-level framework of appropriateness: baseline appropriateness, which guides the design process, and post-design appropriateness, which serves as a criterion for a posteriori evaluation. We provide distribution-free upper bounds on the risk of failing to meet the post-design appropriateness; these bounds are computable without using any additional test data. Under additional assumptions, lower bounds are also derived. As part of an effort to demonstrate the usefulness of the proposed methodology, the paper presents two practical examples in H2 and pole-placement problems. Moreover, a method is provided to infer comprehensive distributional knowledge of relevant performance indexes from the available dataset.
55.6MLMay 12
Multi-Variable Conformal Prediction: Optimizing Prediction Sets without Data SplittingLaura Lützow, Simone Garatti, Marco C. Campi et al.
Conformal prediction constructs prediction sets with finite-sample coverage guarantees, but its calibration stage is structurally constrained to a scalar score function and a single threshold variable - forcing shapes of prediction sets to be fixed before calibration, typically through data splitting. We introduce multi-variable conformal prediction (MCP), a framework that extends conformal prediction to vector-valued score functions with multiple simultaneous calibration variables. Building on scenario theory as a principled framework for certifying data-driven decisions, MCP unifies prediction set design and calibration into a single optimization problem, eliminating data splitting without sacrificing coverage guarantees. We propose two computationally efficient variants: RemMCP, grounded in constrained optimization with constraint removal, which admits a clean generalization of split conformal prediction; and RelMCP, based on iterative optimization with constraint relaxation, which supports non-convex score functions at the cost of possibly greater conservatism. Through numerical experiments on ellipsoidal and multi-modal prediction sets, we demonstrate that RemMCP and RelMCP consistently meet the target coverage with prediction set sizes smaller than or comparable to those of baselines with data split, while considerably reducing variance across calibration runs - a direct consequence of using all available data for shape optimization and calibration simultaneously.
LGMay 2, 2025
Risk Analysis and Design Against Adversarial ActionsMarco C. Campi, Algo Carè, Luis G. Crespo et al.
Learning models capable of providing reliable predictions in the face of adversarial actions has become a central focus of the machine learning community in recent years. This challenge arises from observing that data encountered at deployment time often deviate from the conditions under which the model was trained. In this paper, we address deployment-time adversarial actions and propose a versatile, well-principled framework to evaluate the model's robustness against attacks of diverse types and intensities. While we initially focus on Support Vector Regression (SVR), the proposed approach extends naturally to the broad domain of learning via relaxed optimization techniques. Our results enable an assessment of the model vulnerability without requiring additional test data and operate in a distribution-free setup. These results not only provide a tool to enhance trust in the model's applicability but also aid in selecting among competing alternatives. Later in the paper, we show that our findings also offer useful insights for establishing new results within the out-of-distribution framework.
LGApr 13, 2020
A Theory of the Risk for Optimization with Relaxation and its Application to Support Vector MachinesMarco C. Campi, Simone Garatti
In this paper we consider optimization with relaxation, an ample paradigm to make data-driven designs. This approach was previously considered by the same authors of this work in Garatti and Campi (2019), a study that revealed a deep-seated connection between two concepts: risk (probability of not satisfying a new, out-of-sample, constraint) and complexity (according to a definition introduced in paper Garatti and Campi (2019)). This connection was shown to have profound implications in applications because it implied that the risk can be estimated from the complexity, a quantity that can be measured from the data without any knowledge of the data-generation mechanism. In the present work we establish new results. First, we expand the scope of Garatti and Campi (2019) so as to embrace a more general setup that covers various algorithms in machine learning. Then, we study classical support vector methods - including SVM (Support Vector Machine), SVR (Support Vector Regression) and SVDD (Support Vector Data Description) - and derive new results for the ability of these methods to generalize. All results are valid for any finite size of the data set. When the sample size tends to infinity, we establish the unprecedented result that the risk approaches the ratio between the complexity and the cardinality of the data sample, regardless of the value of the complexity.
SPJul 22, 2018
Sign-Perturbed Sums: A New System Identification Approach for Constructing Exact Non-Asymptotic Confidence Regions in Linear Regression ModelsBalázs Cs. Csáji, Marco C. Campi, Erik Weyer
We propose a new system identification method, called Sign-Perturbed Sums (SPS), for constructing non-asymptotic confidence regions under mild statistical assumptions. SPS is introduced for linear regression models, including but not limited to FIR systems, and we show that the SPS confidence regions have exact confidence probabilities, i.e., they contain the true parameter with a user-chosen exact probability for any finite data set. Moreover, we also prove that the SPS regions are star convex with the Least-Squares (LS) estimate as a star center. The main assumptions of SPS are that the noise terms are independent and symmetrically distributed about zero, but they can be nonstationary, and their distributions need not be known. The paper also proposes a computationally efficient ellipsoidal outer approximation algorithm for SPS. Finally, SPS is demonstrated through a number of simulation experiments.