LG · CV · Aug 30, 2025

Integrated Multivariate Segmentation Tree for the Analysis of Heterogeneous Credit Data in Small and Medium-Sized Enterprises

arXiv:2509.00550v1 · 1 citation · Expert Syst Appl
Originality: Incremental advance
AI Analysis

This paper addresses credit risk assessment for SMEs and offers an incremental improvement over existing methods.

The paper tackles credit evaluation for small and medium-sized enterprises by integrating financial and textual data. It reports an accuracy of 88.9%, which outperforms baseline models such as decision trees (87.4%) and logistic regression.

Traditional decision tree models, which rely exclusively on numerical variables, often struggle with high-dimensional data and fail to effectively incorporate textual information. To address these limitations, we propose the Integrated Multivariate Segmentation Tree (IMST), a comprehensive framework designed to enhance credit evaluation for small and medium-sized enterprises (SMEs) by integrating financial data with textual sources. The methodology comprises three core stages: (1) transforming textual data into numerical matrices through matrix factorization; (2) selecting salient financial features using Lasso regression; and (3) constructing a multivariate segmentation tree based on the Gini index or entropy, with weakest-link pruning applied to regulate model complexity. Experimental results on a dataset of 1,428 Chinese SMEs show that IMST achieves an accuracy of 88.9%, surpassing baseline decision trees (87.4%) as well as conventional models such as logistic regression and support vector machines (SVM). Furthermore, the proposed model exhibits superior interpretability and computational efficiency, with a more streamlined architecture and enhanced risk detection capabilities.
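To make stage (3) more concrete, here is a minimal Python sketch of the two standard ingredients it names: the Gini index and entropy as split criteria, and CART-style weakest-link (cost-complexity) pruning. It is not the authors' implementation. It assumes binary splits and uses the misclassification rate as the node error R(t). The summary does not say which error measure IMST uses or how its multivariate split rule is chosen, so each node here only stores the training labels that reach it, and the split search itself is omitted. All names (`gini`, `entropy`, `split_impurity`, `Node`, `weakest_link`, `prune_once`) are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


def gini(labels):
    """Gini index: 1 - sum_k p_k^2 over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k log2 p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(left, right, criterion=gini):
    """Size-weighted impurity of a binary split; lower means a purer split."""
    n = len(left) + len(right)
    return (len(left) * criterion(left) + len(right) * criterion(right)) / n


@dataclass
class Node:
    # Hypothetical tree node: training labels reaching it plus its children.
    labels: List[int]
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def leaf_error(node, n_total):
    """R(t): misclassification rate if node t were collapsed into a leaf."""
    majority = Counter(node.labels).most_common(1)[0][1]
    return (len(node.labels) - majority) / n_total


def subtree_stats(node, n_total):
    """Return (R(T_t), number of leaves of T_t) for the subtree rooted at node."""
    if node.is_leaf():
        return leaf_error(node, n_total), 1
    err, leaves = 0.0, 0
    for child in node.children:
        e, l = subtree_stats(child, n_total)
        err += e
        leaves += l
    return err, leaves


def weakest_link(root, n_total):
    """Internal node minimizing g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1)."""
    best, best_g = None, math.inf
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            continue
        r_sub, leaves = subtree_stats(node, n_total)
        if leaves > 1:  # guard against degenerate single-child chains
            g = (leaf_error(node, n_total) - r_sub) / (leaves - 1)
            if g < best_g:
                best, best_g = node, g
        stack.extend(node.children)
    return best, best_g


def prune_once(root, n_total):
    """Collapse the weakest link into a leaf and return its alpha (g value)."""
    node, g = weakest_link(root, n_total)
    if node is not None:
        node.children = []
    return g
```

Calling `prune_once` repeatedly yields the nested sequence of subtrees indexed by increasing alpha. In standard CART, the final subtree is then picked from that sequence, typically by cross-validation. A subtree whose split barely reduces error relative to the number of extra leaves it adds gets a small g(t) and is pruned first.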
