ML LGJan 24, 2021

Entropy Partial Transport with Tree Metrics: Theory and Practice

arXiv:2101.09756v114.417 citations

Originality Incremental advance

AI Analysis

This work addresses a fundamental problem in machine learning for comparing probability measures with different masses, offering a novel method that is incremental but provides practical improvements in computational efficiency and applicability.

The authors tackled the limitations of optimal transport for comparing probability measures with different masses by introducing entropy partial transport on trees, which yields a closed-form solution and enables fast computation and negative definiteness. They demonstrated its effectiveness on document classification and topological data analysis tasks, achieving competitive results against other baselines.

Optimal transport (OT) theory provides powerful tools to compare probability measures. However, OT is limited to nonnegative measures having the same mass, and suffers serious drawbacks about its computation and statistics. This leads to several proposals of regularized variants of OT in the recent literature. In this work, we consider an \textit{entropy partial transport} (EPT) problem for nonnegative measures on a tree having different masses. The EPT is shown to be equivalent to a standard complete OT problem on a one-node extended tree. We derive its dual formulation, then leverage this to propose a novel regularization for EPT which admits fast computation and negative definiteness. To our knowledge, the proposed regularized EPT is the first approach that yields a \textit{closed-form} solution among available variants of unbalanced OT. For practical applications without priori knowledge about the tree structure for measures, we propose tree-sliced variants of the regularized EPT, computed by averaging the regularized EPT between these measures using random tree metrics, built adaptively from support data points. Exploiting the negative definiteness of our regularized EPT, we introduce a positive definite kernel, and evaluate it against other baselines on benchmark tasks such as document classification with word embedding and topological data analysis. In addition, we empirically demonstrate that our regularization also provides effective approximations.

View on arXiv PDF

Similar