Sufficient dimension reduction for classification using principal optimal transport direction
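This work addresses a specific bottleneck in supervised dimension reduction for classification: most sufficient dimension reduction methods are built for continuous responses and can perform poorly on categorical, especially binary, responses. It offers an incremental improvement over existing methods.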
The authors tackle sufficient dimension reduction for categorical responses, particularly binary classification, by proposing principal optimal transport direction (POTD), a method that estimates the dimension reduction subspace from optimal transport couplings between the response categories. Empirical results show that POTD outperforms most state-of-the-art linear dimension reduction methods.
Sufficient dimension reduction is widely used as a supervised dimension reduction approach. Most existing sufficient dimension reduction methods are developed for data with a continuous response and may perform unsatisfactorily for a categorical response, especially a binary one. To address this issue, we propose a novel method for estimating the sufficient dimension reduction subspace (SDR subspace) using optimal transport. The proposed method, named principal optimal transport direction (POTD), estimates the basis of the SDR subspace using the principal directions of the optimal transport coupling between the data from different response categories. The proposed method also reveals the relationship among three seemingly unrelated topics: sufficient dimension reduction, support vector machines, and optimal transport. We study the asymptotic properties of POTD and show that when the class labels contain no error, POTD estimates the SDR subspace exclusively. Empirical studies show that POTD outperforms most state-of-the-art linear dimension reduction methods.
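The idea of taking principal directions of an optimal transport coupling between two classes can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary response with equal class sizes, so that the coupling under squared Euclidean cost reduces to a one-to-one matching, and it assumes the "principal directions" are the leading eigenvectors of the second-moment matrix of the matched displacement vectors. The function name `potd_sketch` and its parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def potd_sketch(X0, X1, n_directions):
    """Illustrative estimate of a projection basis from two classes.

    X0, X1: arrays of shape (n, p) holding the samples of class 0 and class 1.
    Assumption: equal class sizes, so the optimal coupling is a permutation.
    """
    # Pairwise squared Euclidean transport costs between the two classes.
    cost = cdist(X0, X1, metric="sqeuclidean")
    # Solve the assignment problem; for equal sizes this is the OT coupling.
    rows, cols = linear_sum_assignment(cost)
    # Displacement vectors from each class-0 point to its matched class-1 point.
    displacements = X1[cols] - X0[rows]
    # Uncentered second-moment matrix of the displacements (assumed choice).
    M = displacements.T @ displacements / len(rows)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_directions]
    return eigvecs[:, order]  # shape (p, n_directions), orthonormal columns
```

For example, if the two classes differ only by a mean shift along the first coordinate, the single returned direction should align closely with that coordinate axis. With unequal class sizes the matching above is only partial and no longer a full coupling; a general-purpose optimal transport solver would be needed in that case.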