LG · CL · May 9, 2025

Harnessing LLMs Explanations to Boost Surrogate Models in Tabular Data Classification

Ruxue Shi, Hengrui Gu, Xu Shen, Xin Wang

arXiv:2505.05744v14.1 · h-index: 5 · DASFAA

Originality: Incremental advance

AI Analysis

The paper addresses performance and interpretability issues in tabular learning for real-world applications. It is an incremental advance that combines existing techniques in a new way.

The paper targets three weaknesses of LLM-based methods for tabular data classification: high resource requirements, suboptimal demonstration selection, and limited interpretability. It proposes an in-context learning framework that uses LLM explanations to guide a smaller surrogate model, and reports an average accuracy improvement of 5.31% across various datasets.

Large Language Models (LLMs) have shown remarkable ability in solving complex tasks, making them a promising tool for enhancing tabular learning. However, existing LLM-based methods suffer from high resource requirements, suboptimal demonstration selection, and limited interpretability, which largely hinder their prediction performance and real-world application. To overcome these problems, we propose a novel in-context learning framework for tabular prediction. The core idea is to leverage the explanations generated by LLMs to guide a smaller, locally deployable Surrogate Language Model (SLM) to make interpretable tabular predictions. Specifically, our framework involves three stages: (i) Post Hoc Explanation Generation, where LLMs generate explanations for the question-answer pairs in candidate demonstrations, providing insight into the reasoning behind each answer; (ii) Post Hoc Explanation-Guided Demonstrations Selection, which uses the LLM-generated explanations to guide the selection of demonstrations from the candidates; and (iii) Post Hoc Explanation-Guided Interpretable SLM Prediction, which uses the demonstrations obtained in stage (ii) as in-context examples and merges the corresponding explanations as rationales, both to improve the performance of the SLM and to guide it to generate interpretable outputs. Experimental results highlight the framework's effectiveness, with an average accuracy improvement of 5.31% across various tabular datasets in diverse domains.
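The three stages can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how rows are serialized, how explanations guide selection, or how prompts are laid out. The names `Demo`, `serialize`, `generate_explanations`, `select_demonstrations`, and `build_slm_prompt` are hypothetical, the LLM is any callable mapping a prompt string to text, and the Jaccard token overlap used for selection is a simple stand-in for whatever scoring the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Demo:
    """A candidate demonstration: one tabular row plus its known label."""
    features: Dict[str, object]
    label: str
    explanation: str = ""  # filled in by stage (i)


def serialize(row: Dict[str, object]) -> str:
    # Assumed text template for a row; the paper's template may differ.
    return ", ".join(f"{k} is {v}" for k, v in row.items())


def generate_explanations(demos: List[Demo], llm: Callable[[str], str]) -> List[Demo]:
    # Stage (i): ask the LLM to explain each known question-answer pair.
    for d in demos:
        prompt = (
            f"Row: {serialize(d.features)}\n"
            f"Answer: {d.label}\n"
            "Explain why this answer is correct."
        )
        d.explanation = llm(prompt)
    return demos


def _tokens(text: str) -> set:
    return set(text.lower().replace(",", " ").split())


def select_demonstrations(query: Dict[str, object], demos: List[Demo], k: int) -> List[Demo]:
    # Stage (ii): rank candidates by Jaccard overlap between the query text
    # and each candidate's row text plus explanation (an assumed heuristic).
    q = _tokens(serialize(query))

    def score(d: Demo) -> float:
        t = _tokens(serialize(d.features) + " " + d.explanation)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(demos, key=score, reverse=True)[:k]


def build_slm_prompt(query: Dict[str, object], selected: List[Demo]) -> str:
    # Stage (iii): place each explanation before its answer as a rationale,
    # so the SLM is nudged to produce a rationale and then an answer.
    blocks = [
        f"Row: {serialize(d.features)}\nRationale: {d.explanation}\nAnswer: {d.label}"
        for d in selected
    ]
    blocks.append(f"Row: {serialize(query)}\nRationale:")
    return "\n\n".join(blocks)
```

In use, `generate_explanations` would run once over the candidate pool with a large model, while `select_demonstrations` and `build_slm_prompt` would run per test row, with the resulting prompt passed to the locally deployed SLM.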
