LG | ML | Jan 7, 2025

Joint Hierarchical Representation Learning of Samples and Features via Informed Tree-Wasserstein Distance

arXiv:2501.03627v3 | 2 citations | h-index: 30
Originality: Incremental advance
AI Analysis

This paper addresses the need to model hierarchical structure in samples and features jointly, offering an unsupervised approach for domains such as bioinformatics and NLP. The advance is incremental, since it builds on existing Tree-Wasserstein Distance methods.

The paper tackles the problem of learning hierarchical representations for both samples and features in high-dimensional data. It proposes an unsupervised method that alternates between constructing a tree for one mode and computing Tree-Wasserstein Distances for the other. Used as a pre-processing step for hyperbolic graph convolutional networks, the method improves link prediction and node classification. It also outperforms baselines in sparse approximation and unsupervised Wasserstein distance learning on word-document and single-cell RNA-sequencing datasets.

High-dimensional data often exhibit hierarchical structures in both modes: samples and features. Yet, most existing approaches for hierarchical representation learning consider only one mode at a time. In this work, we propose an unsupervised method for jointly learning hierarchical representations of samples and features via Tree-Wasserstein Distance (TWD). Our method alternates between the two data modes. It first constructs a tree for one mode, then computes a TWD for the other mode based on that tree, and finally uses the resulting TWD to build the second mode's tree. By repeatedly alternating through these steps, the method gradually refines both trees and the corresponding TWDs, capturing meaningful hierarchical representations of the data. We provide a theoretical analysis showing that our method converges. We show that our method can be integrated into hyperbolic graph convolutional networks as a pre-processing technique, improving performance in link prediction and node classification tasks. In addition, our method outperforms baselines in sparse approximation and unsupervised Wasserstein distance learning tasks on word-document and single-cell RNA-sequencing datasets.
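The alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: tree construction here is swapped for SciPy average-linkage clustering, the initial feature distance is assumed to be Euclidean, and the function names (`tree_from_distances`, `twd_matrix`, `alternating_twd`) are hypothetical. The TWD itself uses the standard closed form, a sum over tree edges of the edge weight times the absolute difference in subtree mass.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform


def tree_from_distances(dist):
    """Build a tree over n items by average linkage (an assumption of this
    sketch, not the paper's tree construction).

    Returns (membership, weights): membership[v, i] is 1 if leaf i lies in
    the subtree of node v; weights[v] is the length of the edge from v to
    its parent (0 for the root).
    """
    n = dist.shape[0]
    Z = linkage(squareform(dist, checks=False), method="average")
    n_nodes = 2 * n - 1
    membership = np.zeros((n_nodes, n))
    membership[np.arange(n), np.arange(n)] = 1.0
    height = np.zeros(n_nodes)   # leaves sit at height 0
    weights = np.zeros(n_nodes)
    for i, (a, b, d, _) in enumerate(Z):
        a, b, parent = int(a), int(b), n + i
        membership[parent] = membership[a] + membership[b]
        height[parent] = d / 2.0  # two leaves merged at distance d are d apart in the tree
        weights[a] = height[parent] - height[a]
        weights[b] = height[parent] - height[b]
    return membership, weights


def twd_matrix(hists, membership, weights):
    """Pairwise TWD between the rows of hists (distributions over leaves).

    TWD(p, q) = sum over nodes v of weights[v] * |p(subtree(v)) - q(subtree(v))|.
    """
    subtree_mass = hists @ membership.T  # shape (m, n_nodes)
    return squareform(pdist(subtree_mass * weights, metric="cityblock"))


def alternating_twd(X, n_iter=5):
    """Alternate between the two modes of a nonnegative samples-by-features
    matrix X that has no all-zero row or column."""
    rows = X / X.sum(axis=1, keepdims=True)      # samples as distributions over features
    cols = (X / X.sum(axis=0, keepdims=True)).T  # features as distributions over samples
    feat_dist = squareform(pdist(cols))          # assumed initial distance: Euclidean
    for _ in range(n_iter):
        samp_dist = twd_matrix(rows, *tree_from_distances(feat_dist))
        feat_dist = twd_matrix(cols, *tree_from_distances(samp_dist))
    return samp_dist, feat_dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 4)) + 0.1
    samp_dist, feat_dist = alternating_twd(X)
    print(samp_dist.shape, feat_dist.shape)
```

Each pass uses the feature tree to measure distances between samples, then uses the resulting sample tree to measure distances between features. The dense membership matrix keeps the sketch short but costs memory quadratic in the number of leaves, so it suits only small inputs.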

