Towards Interpretable Deep Neural Networks for Tabular Data
This addresses the need for interpretable AI in critical domains like finance and healthcare, offering a novel method that combines high performance with transparency.
The paper tackles the problem of interpretability in deep neural networks for tabular data by introducing XNNTab, which uses a sparse autoencoder to learn monosemantic features and assign human-interpretable semantics, achieving performance on par with or exceeding state-of-the-art black-box models.
Tabular data is the foundation of many applications in fields such as finance and healthcare. Although DNNs tailored for tabular data achieve competitive predictive performance, they are blackboxes with little interpretability. We introduce XNNTab, a neural architecture that uses a sparse autoencoder (SAE) to learn a dictionary of monosemantic features within the latent space used for prediction. Using an automated method, we assign human-interpretable semantics to these features. This allows us to represent predictions as linear combinations of semantically meaningful components. Empirical evaluations demonstrate that XNNTab attains performance on par with or exceeding that of state-of-the-art, black-box neural models and classical machine learning approaches while being fully interpretable.