Learning Representations without Compositional Assumptions
This work addresses representation learning for real-world tabular datasets with complex dependencies, offering a novel approach to capture localized information, though it appears incremental as it builds on multi-view frameworks.
The paper tackles unsupervised representation learning on multi-view tabular data by addressing the limitation of traditional methods that assume globally shared factors, proposing LEGATO, a hierarchical graph autoencoder that learns feature set dependencies through a latent graph, resulting in superior downstream performance.
This paper addresses unsupervised representation learning on tabular data containing multiple views generated by distinct sources of measurement. Traditional methods, which tackle this problem using the multi-view framework, are constrained by predefined assumptions that assume feature sets share the same information and representations should learn globally shared factors. However, this assumption is not always valid for real-world tabular datasets with complex dependencies between feature sets, resulting in localized information that is harder to learn. To overcome this limitation, we propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges. Furthermore, we introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically. This approach results in latent graph components that specialize in capturing localized information from different regions of the input, leading to superior downstream performance.