LG AI · Aug 26, 2022

Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions

Leonidas Tsepenekas, Ivan Brugere, Freddy Lecue, Daniele Magazzeni

arXiv:2208.12731v23.32 citations · h-index: 9

Originality: Incremental advance

AI Analysis

The work addresses the challenge of ensuring fairness and accuracy in applications such as clustering and individual fairness when similarity knowledge across groups is hard to obtain. It represents an incremental improvement in method development.

The paper tackles the problem of learning similarity functions for data drawn from different distributions, such as demographic groups. It develops an efficient sampling framework that relies on limited expert feedback, and supports it with theoretical bounds and experimental validation.

Similarity functions measure how comparable pairs of elements are, and play a key role in a wide variety of applications, e.g., notions of Individual Fairness abiding by the seminal paradigm of Dwork et al., as well as Clustering problems. However, access to an accurate similarity function should not always be considered guaranteed, and this point was even raised by Dwork et al. For instance, it is reasonable to assume that when the elements to be compared are produced by different distributions, or in other words belong to different "demographic" groups, knowledge of their true similarity might be very difficult to obtain. In this work, we present an efficient sampling framework that learns these across-groups similarity functions, using only a limited amount of experts' feedback. We show analytical results with rigorous theoretical bounds, and empirically validate our algorithms via a large suite of experiments.
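The abstract does not spell out the sampling framework, so the following is only a minimal sketch of one plausible approach, not the paper's algorithm. It assumes that within-group distances are available, samples a few representatives from each group, asks the expert for the similarity of each representative pair, and answers any later across-group query with the value stored for the nearest representatives. All names here (`learn_cross_similarity`, `dist_a`, `dist_b`, `expert`, `n_reps`) are hypothetical.

```python
import random


def learn_cross_similarity(group_a, group_b, dist_a, dist_b, expert, n_reps, seed=0):
    """Illustrative sketch only (assumed scheme, not taken from the paper).

    group_a, group_b: lists of elements from the two groups.
    dist_a, dist_b:   within-group distance functions (assumed to be known).
    expert:           callable returning the true across-group similarity;
                      it is queried only on sampled representative pairs.
    n_reps:           number of representatives sampled per group.
    """
    rng = random.Random(seed)
    reps_a = rng.sample(group_a, min(n_reps, len(group_a)))
    reps_b = rng.sample(group_b, min(n_reps, len(group_b)))

    # One expert query per pair of representatives.
    table = {
        (i, j): expert(ra, rb)
        for i, ra in enumerate(reps_a)
        for j, rb in enumerate(reps_b)
    }

    def estimate(x, y):
        # Map each element to its nearest representative within its own group,
        # then reuse the expert's answer for that pair of representatives.
        i = min(range(len(reps_a)), key=lambda k: dist_a(x, reps_a[k]))
        j = min(range(len(reps_b)), key=lambda k: dist_b(y, reps_b[k]))
        return table[(i, j)]

    return estimate, len(table)
```

In this sketch the number of expert queries is at most `n_reps` squared, independent of how many pairs are later compared. The estimate is only as good as the assumption that elements close to each other within a group have similar across-group similarity values.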
