Between-Sample Relationship in Learning Tabular Data Using Graph and Attention Networks
This work addresses a limitation of the i.i.d. assumption in learning from tabular data. It offers machine learning practitioners an approach that can improve classification accuracy, although the contribution is incremental because it builds on existing GNN and attention techniques.
The paper relaxes the i.i.d. assumption in learning from tabular data by incorporating between-sample relationships through graph neural networks (GNNs) and attention models. GNN methods achieve the best performance on data with large feature-to-sample ratios, and attention-based GNN methods outperform traditional methods on five data sets and SOTA deep tabular methods on three data sets.
Traditional machine learning assumes that samples in tabular data are independent and identically distributed (i.i.d.). This assumption may miss useful information in within-sample and between-sample relationships during representation learning. This paper relaxes the i.i.d. assumption to learn tabular data representations by incorporating between-sample relationships for the first time using graph neural networks (GNNs). We investigate our hypothesis using several GNNs and state-of-the-art (SOTA) deep attention models to learn the between-sample relationship on ten tabular data sets, comparing them to traditional machine learning methods. GNN methods show the best performance on tabular data with large feature-to-sample ratios. Our results reveal that attention-based GNN methods outperform traditional machine learning on five data sets and SOTA deep tabular learning methods on three data sets. Between-sample learning via GNN and deep attention methods yields the best classification accuracy on seven of the ten data sets. This suggests that the i.i.d. assumption may not always hold for most tabular data sets.
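
To make the idea of between-sample learning concrete, the following is a minimal sketch, not the method used in the paper. The abstract does not say how the sample graph is built or which attention form is used, so two assumptions are made here: samples are linked by a k-nearest-neighbor graph on Euclidean distance, and attention weights come from a parameter-free scaled dot product. The names `knn_graph` and `attention_aggregate` are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Boolean adjacency linking each sample (row of X) to its k nearest
    neighbours by Euclidean distance, plus a self-loop. Requires k < n."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # exclude self when ranking neighbours
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    np.fill_diagonal(A, True)  # each sample also attends to itself
    return A


def attention_aggregate(X, A):
    """One attention step over the sample graph (assumed form, no learned
    weights): each sample becomes a softmax-weighted average of its
    neighbours' feature vectors."""
    scores = (X @ X.T) / np.sqrt(X.shape[1])      # scaled dot-product scores
    scores = np.where(A, scores, -np.inf)         # mask non-neighbours
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ X, w


# Toy data: 6 samples with 4 features each (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
A = knn_graph(X, k=2)
H, W = attention_aggregate(X, A)
```

Here `H` holds one representation per sample that mixes in information from neighboring samples, which is the sense in which the i.i.d. assumption is relaxed. A trained GNN would add learned projections and stack several such layers before a classifier.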