Carlos Cotrini

CR
h-index76
4papers
39citations
Novelty56%
AI Score39

4 Papers

LGDec 21, 2023Code
Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective

João B. S. Carvalho, Mengtao Zhang, Robin Geyer et al.

Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference we attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts. We begin by elucidating a simple yet necessary statistical property that ensures invariant representations, which is critical for robust AD under both domain and covariate shifts. From this property, we derive a regularization term which, when minimized, leads to partial distribution invariance across environments. Through extensive experimental evaluation on both synthetic and real-world tasks, covering a range of six different AD methods, we demonstrated significant improvements in out-of-distribution performance. Under both covariate and domain shift, models regularized with our proposed term showed marked increased robustness. Code is available at: https://github.com/JoaoCarv/invariant-anomaly-detection.

CRSep 21, 2023
S-BDT: Distributed Differentially Private Boosted Decision Trees

Thorsten Peinemann, Moritz Kirschte, Joshua Stock et al.

We introduce S-BDT: a novel $(\varepsilon,δ)$-differentially private distributed gradient boosted decision tree (GBDT) learner that improves the protection of single training data points (privacy) while achieving meaningful learning goals, such as accuracy or regression error (utility). S-BDT uses less noise by relying on non-spherical multivariate Gaussian noise, for which we show tight subsampling bounds for privacy amplification and incorporate that into a Rényi filter for individual privacy accounting. We experimentally reach the same utility while saving $50\%$ in terms of epsilon for $\varepsilon \le 0.5$ on the Abalone regression dataset (dataset size $\approx 4K$), saving $30\%$ in terms of epsilon for $\varepsilon \le 0.08$ for the Adult classification dataset (dataset size $\approx 50K$), and saving $30\%$ in terms of epsilon for $\varepsilon\leq0.03$ for the Spambase classification dataset (dataset size $\approx 5K$). Moreover, we show that for situations where a GBDT is learning a stream of data that originates from different subpopulations (non-IID), S-BDT improves the saving of epsilon even further.

LGMar 20, 2025
Rethinking Robustness in Machine Learning: A Posterior Agreement Approach

João Borges S. Carvalho, Victor Jimenez Rodriguez, Alessandro Torcinovich et al.

The robustness of algorithms against covariate shifts is a fundamental problem with critical implications for the deployment of machine learning algorithms in the real world. Current evaluation methods predominantly measure robustness through the lens of standard generalization, relying on task performance measures like accuracy. This approach lacks a theoretical justification and underscores the need for a principled foundation of robustness assessment under distribution shifts. In this work, we set the desiderata for a robustness measure, and we propose a novel principled framework for the robustness assessment problem that directly follows the Posterior Agreement (PA) theory of model validation. Specifically, we extend the PA framework to the covariate shift setting and propose a measure for robustness evaluation. We assess the soundness of our measure in controlled environments and through an empirical robustness analysis in two different covariate shift scenarios: adversarial learning and domain generalization. We illustrate the suitability of PA by evaluating several models under different nature and magnitudes of shift, and proportion of affected observations. The results show that PA offers a reliable analysis of the vulnerabilities in learning algorithms across different shift conditions and provides higher discriminability than accuracy-based measures, while requiring no supervision.

CRAug 16, 2019
The Next 700 Policy Miners: A Universal Method for Building Policy Miners

Carlos Cotrini, Luca Corinzia, Thilo Weghorn et al.

A myriad of access control policy languages have been and continue to be proposed. The design of policy miners for each such language is a challenging task that has required specialized machine learning and combinatorial algorithms. We present an alternative method, universal access control policy mining (Unicorn). We show how this method streamlines the design of policy miners for a wide variety of policy languages including ABAC, RBAC, RBAC with user-attribute constraints, RBAC with spatio-temporal constraints, and an expressive fragment of XACML. For the latter two, there were no known policy miners until now. To design a policy miner using Unicorn, one needs a policy language and a metric quantifying how well a policy fits an assignment of permissions to users. From these, one builds the policy miner as a search algorithm that computes a policy that best fits the given permission assignment. We experimentally evaluate the policy miners built with Unicorn on logs from Amazon and access control matrices from other companies. Despite the genericity of our method, our policy miners are competitive with and sometimes even better than specialized state-of-the-art policy miners. The true positive rates of policies we mined differ by only 5% from the policies mined by the state of the art and the false positive rates are always below 5%. In the case of ABAC, it even outperforms the state of the art.