SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
For practitioners in medicine, finance, and science who need to learn from limited labeled tabular data, SeBA offers a new approach that avoids the challenge of defining meaningful augmentations.
SeBA proposes a joint-embedding framework for semi-supervised few-shot learning on tabular data that eliminates the need for data augmentations by separating data into two views and aligning representations via nearest-neighbor correspondence. It achieves state-of-the-art performance on most benchmark datasets.
Learning from scarce labeled data alongside a larger pool of unlabeled samples, known as semi-supervised few-shot learning (SS-FSL), remains critical for applications involving tabular data in domains such as medicine, finance, and science. Existing SS-FSL methods often rely on self-supervised learning (SSL) frameworks developed for vision or language, which assume that natural data augmentations are available. For tabular data, defining meaningful augmentations is non-trivial and can easily distort semantics, which limits the effectiveness of conventional SSL. In this work, we rethink SSL for tabular data and propose Separated-at-Birth Alignment (SeBA), a joint-embedding framework for SS-FSL that eliminates the dependence on augmentations. Our core idea is to separate the data into two independent but complementary views and to align the representations of one view so that they mirror the nearest-neighbor correspondence of the data in the second view. Our experimental evaluation, supported by a theoretical analysis, shows that SeBA generates an output space that improves the feature-label relationship. An experimental study conducted on various benchmark datasets demonstrates that SeBA achieves state-of-the-art performance in the majority of cases, opening a new avenue for the SS-FSL paradigm in the domain of tabular data.
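To make the core idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the two views are built, which encoder is used, or which loss performs the alignment, so everything here is an assumption made for illustration: the views come from a random split of the feature columns, the encoder is a hypothetical untrained linear map `W`, and the alignment objective is an InfoNCE-style loss whose positive for each sample is its nearest neighbor in the second view. The helper names `split_views`, `nearest_neighbor_index`, and `alignment_loss` are likewise hypothetical.

```python
import numpy as np


def split_views(X, rng):
    """Randomly partition the feature columns into two disjoint views.

    Assumption: the abstract does not specify how the views are formed.
    """
    perm = rng.permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, perm[:half]], X[:, perm[half:]]


def nearest_neighbor_index(V):
    """Return, for each row of V, the index of its nearest other row (Euclidean)."""
    sq = (V ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (V @ V.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    return d2.argmin(axis=1)


def alignment_loss(Z, nn_idx, temperature=0.5):
    """InfoNCE-style loss (assumed): pull each embedding towards the embedding
    of the sample that is its nearest neighbor in the other view."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    logits = (Z @ Z.T) / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(Z)), nn_idx].mean()


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))                     # synthetic unlabeled tabular data
view_a, view_b = split_views(X, rng)
W = 0.1 * rng.normal(size=(view_a.shape[1], 8))   # hypothetical linear encoder
nn_idx = nearest_neighbor_index(view_b)           # correspondence from the second view
loss = alignment_loss(view_a @ W, nn_idx)         # alignment objective on the first view
print(f"alignment loss: {loss:.4f}")
```

In a training loop, `W` would be replaced by a learnable encoder and the loss minimized by gradient descent; the few labeled samples could then be classified in the resulting embedding space. No augmentation of the input rows is needed at any point, which is the property the abstract emphasizes.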