ML LG Jul 9, 2025

Distribution-free inference for LightGBM and GLM with Tweedie loss

Alokesh Manna, Aditya Vikram Sett, Dipak K. Dey, Yuwen Gu, Elizabeth D. Schifano, Jichao He

arXiv:2507.06921v1 7.82 citations h-index: 47

Originality: Incremental advance

AI Analysis

This work addresses uncertainty quantification for insurance pricing and risk management, representing an incremental improvement by adapting existing conformal prediction methods to specific models.

The authors address prediction uncertainty quantification for insurance claims using GLMs and gradient boosting models. They propose new non-conformity measures for conformal prediction and find that, in simulations, locally weighted Pearson residuals for LightGBM achieve nominal coverage with the smallest average interval width.

Prediction uncertainty quantification has become a key research topic in recent years across scientific and business problems. In the insurance industry (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs to gradient boosting models (GBMs). Conformal predictive inference has emerged as a popular distribution-free approach for quantifying predictive uncertainty under the relatively weak assumption of exchangeability, and it has been well studied in the classical linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures on insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over the other methods considered, as the resulting intervals maintain the nominal coverage with the smallest average width.
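
To make the idea of a Pearson-residual non-conformity measure concrete, here is a minimal split-conformal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes the Tweedie variance function V(mu) = mu^p, takes the fitted means as given (from any Tweedie GLM or LightGBM model, which is not fitted here), and uses the absolute Pearson residual as the score. The function names, the clipping of the lower bound at zero, and the toy numbers are all hypothetical, and the paper's exact definition of "locally weighted" Pearson residuals may differ.

```python
import math

def pearson_score(y, mu, power):
    """Absolute Pearson residual |y - mu| / sqrt(V(mu)), assuming the
    Tweedie variance function V(mu) = mu**power. Requires mu > 0."""
    return abs(y - mu) / math.sqrt(mu ** power)

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration score, or infinity if the calibration set is too small."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def pearson_interval(mu_new, q, power):
    """Invert the score: all y with |y - mu_new| <= q * sqrt(mu_new**power).
    The lower bound is clipped at 0 (an assumption made here because claim
    amounts are non-negative)."""
    half_width = q * math.sqrt(mu_new ** power)
    return max(0.0, mu_new - half_width), mu_new + half_width

# Hypothetical calibration data: observed claims and fitted means.
y_cal = [0.0, 2.5, 0.0, 1.0, 7.0, 0.0, 3.2, 0.4]
mu_cal = [0.8, 2.0, 0.5, 1.5, 4.0, 0.3, 2.9, 1.1]
power = 1.5  # Tweedie power parameter, assumed known

scores = [pearson_score(y, m, power) for y, m in zip(y_cal, mu_cal)]
q_hat = conformal_quantile(scores, alpha=0.2)
lower, upper = pearson_interval(2.0, q_hat, power)
print(f"q_hat = {q_hat:.3f}, interval for mu = 2.0: [{lower:.3f}, {upper:.3f}]")
```

Because the half-width scales with sqrt(mu^p), the intervals widen for policies with larger predicted claims, which is the kind of adaptivity that a constant-width absolute-residual score cannot provide.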
