Patrick Mesana

LG
h-index3
3papers
2citations
Novelty42%
AI Score27

3 Papers

LGJun 4, 2025
On the Usage of Gaussian Process for Efficient Data Valuation

Clément Bénesse, Patrick Mesana, Athénaïs Gautier et al.

In machine learning, knowing the impact of a given datum on model training is a fundamental task referred to as Data Valuation. Building on previous works from the literature, we have designed a novel canonical decomposition allowing practitioners to analyze any data valuation method as the combination of two parts: a utility function that captures characteristics from a given model and an aggregation procedure that merges such information. We also propose to use Gaussian Processes as a means to easily access the utility function on ``sub-models'', which are models trained on a subset of the training set. The strength of our approach stems from both its theoretical grounding in Bayesian theory, and its practical reach, by enabling fast estimation of valuations thanks to efficient update formulae.

LGNov 2, 2024
WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Patrick Mesana, Clément Bénesse, Hadrien Lautraite et al.

In this paper, we introduce WaKA (Wasserstein K-nearest-neighbors Attribution), a novel attribution method that leverages principles from the LiRA (Likelihood Ratio Attack) framework and k-nearest neighbors classifiers (k-NN). WaKA efficiently measures the contribution of individual data points to the model's loss distribution, analyzing every possible k-NN that can be constructed using the training set, without requiring to sample subsets of the training set. WaKA is versatile and can be used a posteriori as a membership inference attack (MIA) to assess privacy risks or a priori for privacy influence measurement and data valuation. Thus, WaKA can be seen as bridging the gap between data attribution and membership inference attack (MIA) by providing a unified framework to distinguish between a data point's value and its privacy risk. For instance, we have shown that self-attribution values are more strongly correlated with the attack success rate than the contribution of a point to the model generalization. WaKA's different usage were also evaluated across diverse real-world datasets, demonstrating performance very close to LiRA when used as an MIA on k-NN classifiers, but with greater computational efficiency. Additionally, WaKA shows greater robustness than Shapley Values for data minimization tasks (removal or addition) on imbalanced datasets.

LGOct 28, 2021
Generating synthetic transactional profiles

Hadrien Lautraite, Patrick Mesana

Financial institutions use clients' payment transactions in numerous banking applications. Transactions are very personal and rich in behavioural patterns, often unique to individuals, which make them equivalent to personally identifiable information in some cases. In this paper, we generate synthetic transactional profiles using machine learning techniques with the goal to preserve both data utility and privacy. A challenge we faced was to deal with sparse vectors due to the few spending categories a client uses compared to all the ones available. We measured data utility by calculating common insights used by the banking industry on both the original and the synthetic data-set. Our approach shows that neural network models can generate valuable synthetic data in such context. Finally, we tried privacy-preserving techniques and observed its effect on models' performances.