InfoOT: Information Maximizing Optimal Transport
The paper addresses robust and generalizable sample alignment across distributions, a problem relevant to researchers in machine learning and computational biology, although as an extension of existing optimal transport methods it appears incremental.
The paper tackles three limitations of optimal transport: it ignores coherence structure in the data, handles outliers poorly, and cannot integrate new data points. It proposes InfoOT, an information-theoretic extension that maximizes the mutual information between domains while minimizing geometric distances, and reports improved alignment quality across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.
Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem, and can be efficiently solved by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves the quality of alignments across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.
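To make the objective described above concrete, here is a minimal NumPy sketch of one way it could be set up. It is not the authors' implementation. The abstract does not say how mutual information is estimated or how the projection is performed, so the following are all assumptions made for illustration: a Gaussian kernel density estimate of the joint density induced by the transport plan, a fixed trade-off weight `lam`, uniform marginals, and a multiplicative (mirror-descent style) update followed by Sinkhorn rescaling, standing in for projected gradient descent. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_gram(Z, bandwidth):
    """Unnormalized Gaussian kernel Gram matrix within one domain."""
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def kde_mutual_information(P, Kx, Ky):
    """Plug-in mutual information of plan P under a kernel density estimate
    (assumed estimator; kernel normalizing constants cancel in the ratio)."""
    joint = Kx @ P @ Ky.T            # joint density at each pair (x_i, y_j)
    px = Kx @ P.sum(axis=1)          # marginal density at each x_i
    py = Ky @ P.sum(axis=0)          # marginal density at each y_j
    log_ratio = np.log(joint) - np.log(px)[:, None] - np.log(py)[None, :]
    return np.sum(P * log_ratio)

def kde_mi_gradient(P, Kx, Ky):
    """Gradient of kde_mutual_information with respect to P."""
    joint = Kx @ P @ Ky.T
    r, c = P.sum(axis=1), P.sum(axis=0)
    px, py = Kx @ r, Ky @ c
    log_ratio = np.log(joint) - np.log(px)[:, None] - np.log(py)[None, :]
    through_joint = Kx.T @ (P / joint) @ Ky       # P also enters the joint density
    through_px = (Kx.T @ (r / px))[:, None]       # ... and the x-marginal density
    through_py = (Ky.T @ (c / py))[None, :]       # ... and the y-marginal density
    return log_ratio + through_joint - through_px - through_py

def sinkhorn_projection(M, a, b, n_iter=500):
    """Rescale a positive matrix so its rows sum to a and its columns to b."""
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (M.T @ u)
        u = a / (M @ v)
    return u[:, None] * M * v[None, :]

def info_ot_sketch(X, Y, lam=0.5, bandwidth=1.0, step=0.1, n_steps=20):
    """Minimize <P, C> - lam * MI(P) over plans with uniform marginals."""
    n, m = len(X), len(Y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    C = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    C = C / C.max()                               # scale costs to [0, 1]
    Kx, Ky = gaussian_gram(X, bandwidth), gaussian_gram(Y, bandwidth)
    P = np.outer(a, b)                            # start from the independent plan
    for _ in range(n_steps):
        grad = C - lam * kde_mi_gradient(P, Kx, Ky)
        # multiplicative step, then rescale back onto the transport polytope
        P = sinkhorn_projection(P * np.exp(-step * grad), a, b)
    return P

# Toy usage on synthetic data (both domains share a feature space here,
# which the geometric cost term requires).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
Y = rng.normal(size=(10, 2)) + 0.5
P = info_ot_sketch(X, Y)
```

The multiplicative update keeps every entry of the plan strictly positive, so the logarithms in the mutual information term stay finite. Setting `lam=0` in this sketch removes the information term, leaving a purely geometric alignment for comparison.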