LG | Sep 25, 2025

Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

arXiv:2509.21465v1 | 1 citation | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretable and efficient models in low-resource tabular data settings, offering an incremental improvement over existing methods by integrating LLM reasoning with decision tree induction.

The paper tackles the problem of interpretability and inference cost in tabular data models by using reasoning-capable LLMs to induce decision trees for low-resource datasets, resulting in lightweight trees that outperform traditional CART methods while providing human-readable reasoning traces.

Tabular foundation models are becoming increasingly popular for low-resource tabular problems. These models make up for small training datasets by pretraining on large volumes of synthetic data. The prior knowledge obtained via pretraining provides exceptional performance, but the resulting model is a black box that is difficult to interpret and costly to run at inference time. In this work, we explore an alternative strategy: using reasoning-capable LLMs to induce decision trees for small tabular datasets in an agentic setup. We design a minimal set of tools for constructing, analyzing, and manipulating decision trees. Using these tools, LLMs combine their prior knowledge with learning from data to create a lightweight decision tree that outperforms traditional CART on low-resource tabular problems. While a single decision tree does not outperform state-of-the-art black-box models, it comes with a human-readable reasoning trace that can be checked for biases and data leaks. Furthermore, the LLM's reasoning-based tree construction process allows for additional human input: correcting biases or incorporating domain-specific intuition that is not captured in the data.
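
To make the idea of a tool set for constructing, analyzing, and manipulating a tree more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's actual tooling: the names `Node`, `split_leaf`, `predict`, `accuracy`, and `render` are hypothetical, the toy rows are made up, and the split is chosen by hand where an LLM agent would presumably choose it through tool calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # A node is a leaf when `feature` is None; otherwise rows with
    # row[feature] <= threshold go left and all other rows go right.
    label: Optional[int] = None
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split_leaf(leaf, feature, threshold, left_label, right_label):
    """Construct/manipulate: turn a leaf into an internal node with two leaves."""
    leaf.feature, leaf.threshold, leaf.label = feature, threshold, None
    leaf.left, leaf.right = Node(label=left_label), Node(label=right_label)


def predict(node, row):
    """Walk from the root to a leaf and return that leaf's label."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


def accuracy(root, rows, labels):
    """Analyze: fraction of rows the current tree classifies correctly."""
    hits = sum(predict(root, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def render(node, indent=""):
    """Analyze: human-readable text view of the tree."""
    if node.feature is None:
        return f"{indent}predict {node.label}\n"
    return (
        f"{indent}if {node.feature} <= {node.threshold}:\n"
        + render(node.left, indent + "  ")
        + f"{indent}else:\n"
        + render(node.right, indent + "  ")
    )


# Made-up toy data, purely for illustration.
rows = [
    {"age": 25, "income": 30},
    {"age": 40, "income": 80},
    {"age": 60, "income": 50},
    {"age": 35, "income": 20},
]
labels = [0, 1, 1, 0]

root = Node(label=0)                      # start from a single-leaf tree
baseline = accuracy(root, rows, labels)   # score before any split
split_leaf(root, "income", 40, 0, 1)      # one hand-picked edit to the tree
improved = accuracy(root, rows, labels)   # score after the split
print(render(root))
```

Each edit is a small, named operation whose effect can be scored and printed, which is the property that makes a construction trace readable and open to human correction.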

