ML LG · Oct 19, 2023

Constrained Reweighting of Distributions: an Optimal Transport Approach

Abhisek Chakraborty, Anirban Bhattacharya, Debdeep Pati

arXiv:2310.12447v22.35 citations · h-index: 24

Originality: Incremental advance

AI Analysis

This work provides a flexible tool for statisticians and machine learning practitioners to handle constrained reweighting in applications like finance and fairness, though it builds incrementally on existing optimal transport and entropy methods.

The authors tackled the problem of adjusting empirical data distributions to meet predefined constraints on weights, such as moments or tail behavior, by introducing a nonparametric framework that combines maximum entropy and optimal transport. They demonstrated the method's versatility in portfolio allocation, survey inference, and algorithmic fairness, showing it can enforce distributional constraints while allowing controlled deviations.

We commonly encounter the problem of identifying an optimally weight-adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight-adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight-adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
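The key idea in the abstract can be illustrated with a small numerical sketch. This is not the authors' algorithm, only one plausible reading of it under several assumptions made for illustration: the data are one-dimensional, the pre-specified distribution is discretized onto the same support points as the data (the vector `q`), closeness is measured by the Wasserstein-1 distance, and "allowing for subtle departures" is modelled as a hard tolerance `eps`. The helper names `w1_same_support`, `entropy`, and `max_entropy_weights` are hypothetical. The absolute values in the 1D distance are handled with slack variables so that all constraints are linear.

```python
import numpy as np
from scipy.optimize import minimize


def w1_same_support(x, w, q):
    """1D Wasserstein-1 distance between weights w and q on sorted points x."""
    cdf_gap = np.cumsum(w - q)[:-1]            # F_w - F_q at x_1, ..., x_{n-1}
    return float(np.sum(np.diff(x) * np.abs(cdf_gap)))


def entropy(w):
    w = np.clip(w, 1e-300, None)               # guard against log(0)
    return float(-np.sum(w * np.log(w)))


def max_entropy_weights(x, q, eps):
    """Maximize entropy of w subject to W1(w, q) <= eps (illustrative helper)."""
    n = len(x)
    u = np.full(n, 1.0 / n)
    d_uq = w1_same_support(x, u, q)
    if d_uq <= eps:                            # uniform weights are already feasible
        return u
    gaps = np.diff(x)
    L = np.tril(np.ones((n, n)))[:-1]          # (L @ v)[k] = v[0] + ... + v[k]
    Lq = L @ q
    eye = np.eye(n - 1)
    A_pos = np.hstack([-L, eye])               # encodes s - L(w - q) >= 0
    A_neg = np.hstack([L, eye])                # encodes s + L(w - q) >= 0

    # Decision vector z = [w (n entries), s (n - 1 slack entries)].
    def neg_entropy(z):
        w = np.clip(z[:n], 1e-12, None)
        return float(np.sum(w * np.log(w)))

    def neg_entropy_grad(z):
        g = np.zeros_like(z)
        g[:n] = np.log(np.clip(z[:n], 1e-12, None)) + 1.0
        return g

    sum_jac = np.concatenate([np.ones(n), np.zeros(n - 1)])
    budget_jac = np.concatenate([np.zeros(n), -gaps])
    constraints = [
        {"type": "eq", "fun": lambda z: np.sum(z[:n]) - 1.0, "jac": lambda z: sum_jac},
        {"type": "ineq", "fun": lambda z: A_pos @ z + Lq, "jac": lambda z: A_pos},
        {"type": "ineq", "fun": lambda z: A_neg @ z - Lq, "jac": lambda z: A_neg},
        {"type": "ineq", "fun": lambda z: eps - gaps @ z[n:], "jac": lambda z: budget_jac},
    ]
    bounds = [(1e-12, 1.0)] * n + [(None, None)] * (n - 1)

    # Strictly feasible start: a small step from q toward the uniform weights.
    t = 0.5 * eps / d_uq
    w0 = (1.0 - t) * q + t * u
    s0 = np.abs(L @ (w0 - q)) + eps / (4.0 * gaps.sum())
    z0 = np.concatenate([w0, s0])

    res = minimize(neg_entropy, z0, jac=neg_entropy_grad, bounds=bounds,
                   constraints=constraints, method="SLSQP",
                   options={"maxiter": 500, "ftol": 1e-10})
    w = np.clip(res.x[:n], 0.0, None)
    return w / w.sum()


# Toy example: equally spaced "observations" and a target concentrated near 1.
x = np.linspace(-2.0, 2.0, 25)
q = np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)
q /= q.sum()
eps = 0.2
w = max_entropy_weights(x, q, eps)
print("W1(uniform, q):", w1_same_support(x, np.full(len(x), 1.0 / len(x)), q))
print("W1(w, q):      ", w1_same_support(x, w, q))
print("entropy(w):    ", entropy(w), " entropy(q):", entropy(q))
```

In this sketch, a smaller `eps` pulls the weights toward the target `q`, while a larger `eps` lets them drift back toward the uniform (maximum entropy) weights. A penalized formulation, or a different transport cost, would be an equally reasonable reading of the abstract.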
