LLM Embeddings for Deep Learning on Tabular Data
The work addresses two limitations of tabular deep learning: weak cross-table transfer and little exploitation of pre-trained knowledge. The contribution is incremental, since it builds on existing LLM and tabular methods.
The paper tackles heterogeneous feature encoding in tabular deep learning by transforming tabular data into text and encoding it with LLM embeddings. It reports improved accuracy for models such as MLP, ResNet, and FT-Transformer on seven classification datasets.
Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with this heterogeneous nature of tabular data by employing separate type-specific encoding approaches. This limits the cross-table transfer potential and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text, and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution for improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet and FT-Transformer, by validating on seven classification datasets.
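The text-then-embed idea can be illustrated with a minimal sketch. The abstract does not specify the serialization template or the LLM encoder, so both are assumptions here: the "<feature> is <value>" template is hypothetical, and `toy_text_embedding` is a deterministic hash-based stand-in for a real pre-trained encoder, used only so the example runs without external dependencies. The sketch shows how numerical and categorical features can pass through one shared text encoder instead of separate type-specific ones.

```python
import hashlib
import math


def serialize_row(row: dict) -> list[str]:
    """Turn each feature into a short text snippet.

    The "<name> is <value>" template is an assumption for illustration,
    not the template used in the paper.
    """
    return [f"{name} is {value}" for name, value in row.items()]


def toy_text_embedding(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a pre-trained LLM encoder.

    Maps text to a deterministic unit-length vector derived from a
    SHA-256 hash. A real pipeline would call an actual language model.
    """
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 for this toy encoder")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Center the bytes around zero, then normalize to unit length.
    raw = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def embed_row(row: dict, dim: int = 8) -> list[list[float]]:
    """Encode every feature of a row with the same text encoder,
    regardless of whether the feature is numerical or categorical."""
    return [toy_text_embedding(text, dim) for text in serialize_row(row)]


# One mixed-type row: two numerical features and one categorical feature.
row = {"age": 39, "workclass": "State-gov", "hours_per_week": 40}
feature_embeddings = embed_row(row)
```

Under these assumptions, `feature_embeddings` holds one vector per feature, which a downstream model such as an MLP, ResNet, or FT-Transformer could consume in place of its usual type-specific feature embeddings.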