MLLGOCMEMay 18, 2020

Optimal Representative Sample Weighting

arXiv:2005.09065v17 citationsHas Code
AI Analysis

This addresses the need for representative weighting in data analysis, particularly for skewed datasets, but is incremental as it builds on existing optimization methods.

The paper tackles the problem of assigning weights to samples to achieve representative averages matching prescribed values, framing it as a convex optimization problem that can be efficiently solved, with an open-source implementation applied to a skewed CDC BRFSS dataset.

We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting, which happens when certain sample averages of the data are close to prescribed values. We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved. Our formulation includes as a special case the selection of a fixed number of the samples, with equal weights, i.e., the problem of selecting a smaller representative subset of the samples. While this problem is combinatorial and not convex, heuristic methods based on convex optimization seem to perform very well. We describe rsw, an open-source implementation of the ideas described in this paper, and apply it to a skewed sample of the CDC BRFSS dataset.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes