DOFEN: Deep Oblivious Forest ENsemble
DOFEN addresses the challenge of improving DNN performance on tabular data, a format common across many domains, though the contribution is incremental because it builds on existing tree-based methods.
The paper tackles the underperformance of deep neural networks (DNNs) relative to gradient boosting decision trees (GBDT) on tabular data. It proposes DOFEN, a novel DNN architecture inspired by oblivious decision trees, which achieves state-of-the-art results among DNNs and narrows the gap to tree-based models on the Tabular Benchmark, a suite of 73 datasets.
Deep Neural Networks (DNNs) have revolutionized artificial intelligence, achieving impressive results on diverse data types, including images, videos, and texts. However, DNNs still lag behind Gradient Boosting Decision Trees (GBDT) on tabular data, a format extensively utilized across various domains. In this paper, we propose DOFEN, short for \textbf{D}eep \textbf{O}blivious \textbf{F}orest \textbf{EN}semble, a novel DNN architecture inspired by oblivious decision trees. DOFEN constructs relaxed oblivious decision trees (rODTs) by randomly combining conditions for each column and further enhances performance with a two-level rODT forest ensembling process. By employing this approach, DOFEN achieves state-of-the-art results among DNNs and further narrows the gap between DNNs and tree-based models on the well-recognized benchmark: Tabular Benchmark \citep{grinsztajn2022tree}, which includes 73 total datasets spanning a wide array of domains. The code of DOFEN is available at: \url{https://github.com/Sinopac-Digital-Technology-Division/DOFEN}.
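The pipeline outlined in the abstract (conditions per column, random combination into rODTs, two levels of forest ensembling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the sigmoid relaxation with fixed thresholds, the mean-based tree score, and the per-tree scalar `leaf_values` are all assumptions made for illustration; the real model is a trained DNN and may differ in every one of these details.

```python
import numpy as np


def soft_conditions(x, thresholds, temperature=1.0):
    """Relaxed (soft) conditions for every column.

    Assumption: a condition is sigmoid((x_j - t_jk) / temperature),
    a smooth stand-in for the hard test x_j > t_jk.
    x: (n_samples, n_cols), thresholds: (n_cols, n_cond_per_col)
    Returns an array of shape (n_samples, n_cols * n_cond_per_col).
    """
    z = (x[:, :, None] - thresholds[None, :, :]) / temperature
    return (1.0 / (1.0 + np.exp(-z))).reshape(x.shape[0], -1)


def build_rodts(n_conditions, depth, rng):
    """Randomly combine conditions into trees of `depth` conditions each.

    Returns an index array of shape (n_trees, depth); leftover
    conditions that do not fill a whole tree are dropped.
    """
    perm = rng.permutation(n_conditions)
    n_trees = n_conditions // depth
    return perm[: n_trees * depth].reshape(n_trees, depth)


def forest_ensemble(cond, tree_idx, leaf_values, n_forests, trees_per_forest, rng):
    """Two-level ensembling: trees -> forests -> final prediction.

    Assumption: a tree's output is the mean of its soft conditions
    scaled by one hypothetical scalar per tree (`leaf_values`).
    """
    # Gather each tree's conditions: (n_samples, n_trees, depth).
    tree_scores = cond[:, tree_idx].mean(axis=2)
    tree_out = tree_scores * leaf_values[None, :]

    forest_preds = []
    for _ in range(n_forests):
        # Level 1: sample a subset of trees and average them into a forest.
        chosen = rng.choice(tree_idx.shape[0], size=trees_per_forest, replace=False)
        forest_preds.append(tree_out[:, chosen].mean(axis=1))

    # Level 2: average the forest predictions.
    return np.mean(forest_preds, axis=0)
```

With 3 columns and 4 conditions per column, `build_rodts(12, depth=3, rng)` yields 4 trees, and `forest_ensemble` returns one prediction per input row. In a trainable version, the thresholds and per-tree values would be learned parameters rather than fixed arrays.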