RM LG AP, Jun 24, 2025

A comparative analysis of machine learning algorithms for predicting probabilities of default

arXiv:2506.19789v1
Originality: Synthesis-oriented
AI Analysis

The paper addresses credit risk prediction for financial institutions, but its contribution is incremental: it applies existing methods to a known problem.

The paper compares five machine learning algorithms (Random Forests, Decision Trees, XGBoost, Gradient Boosting and AdaBoost) with logistic regression for predicting loan default probabilities, and reports the strengths and weaknesses of each method.

Predicting the probability of default (PD) of prospective loans is a critical objective for financial institutions. In recent years, machine learning (ML) algorithms have achieved remarkable success across a wide variety of prediction tasks; yet, they remain relatively underutilised in credit risk analysis. This paper highlights the opportunities that ML algorithms offer to this field by comparing the performance of five predictive models (Random Forests, Decision Trees, XGBoost, Gradient Boosting and AdaBoost) to the predominantly used logistic regression, over a benchmark dataset from Scheule et al. (Credit Risk Analytics: The R Companion). Our findings underscore the strengths and weaknesses of each method, providing valuable insights into the most effective ML algorithms for PD prediction in the context of loan portfolios.
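
The abstract does not say which metrics the authors use to compare the models. As an illustrative sketch only, the snippet below scores two sets of predicted PDs with two measures that are commonly used for this kind of comparison: AUC (how well the predictions rank defaulters above non-defaulters) and the Brier score (how close the predicted probabilities are to the observed 0/1 outcomes). The labels, the predictions, and the names `model_a` and `model_b` are hypothetical; they are not taken from the paper or from the Scheule et al. dataset.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], pd_hat: Sequence[float]) -> float:
    """Mean squared difference between predicted PD and the 0/1 default flag."""
    return sum((p - y) ** 2 for y, p in zip(y_true, pd_hat)) / len(y_true)


def auc(y_true: Sequence[int], pd_hat: Sequence[float]) -> float:
    """Probability that a random defaulter receives a higher PD than a
    random non-defaulter, counting ties as one half."""
    pos = [p for y, p in zip(y_true, pd_hat) if y == 1]
    neg = [p for y, p in zip(y_true, pd_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one default and one non-default")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy outcomes: 1 = default, 0 = no default.
y = [0, 0, 1, 0, 1, 0, 0, 1]

# Hypothetical PD outputs from two unnamed models (illustrative numbers only).
model_a = [0.10, 0.20, 0.70, 0.30, 0.60, 0.05, 0.40, 0.35]
model_b = [0.20, 0.50, 0.55, 0.10, 0.45, 0.30, 0.60, 0.40]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: AUC={auc(y, preds):.3f}  Brier={brier_score(y, preds):.3f}")
```

A higher AUC and a lower Brier score are both better. The two can disagree, because AUC only looks at ranking while the Brier score also penalises poorly calibrated probabilities, which is one reason a comparison of PD models usually reports more than one measure.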

