LG ML · Jun 23, 2024

Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics

arXiv:2406.16206v2 · 5 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of inaccurate zero-claim modeling for insurance companies, offering an incremental improvement over existing methods.

The paper tackles the limitation of traditional Tweedie regression models in accurately representing zero claims in insurance loss analytics by proposing a zero-inflated Tweedie model integrated with CatBoost boosting methods, demonstrating improved predictive accuracy on an insurance telematics dataset.

In this paper, we explore modifications to the Tweedie regression model that address its limitations in modeling aggregate claims for various types of insurance, such as automobile, health, and liability. Traditional Tweedie models, while effective in capturing the probability and magnitude of claims, usually fall short in accurately representing the large incidence of zero claims. Our recommended approach combines a refined model of the zero-claim process with boosting methods, which improve predictive accuracy through an iterative process. Although this iteration inherently slows learning, several efficient implementations that also support precise parameter tuning, such as XGBoost, LightGBM, and CatBoost, have emerged. We chose CatBoost, an efficient boosting approach that effectively handles categorical and other special types of data. The core contribution of our paper is to combine separate modeling of zero claims with tree-based boosting ensemble methods within a CatBoost framework, under the assumption that the inflated probability of zero is a function of the mean parameter. We demonstrate the efficacy of the enhanced Tweedie model by applying it to an insurance telematics dataset, which presents the additional complexity of compositional feature variables. Our modeling results reveal a marked improvement in model performance, showcasing its potential to deliver more accurate predictions for insurance claim analytics.
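To make the zero-inflation idea concrete, the following is a minimal sketch of how the probability of a zero claim changes once an extra point mass at zero is tied to the mean. It relies on the standard result that a compound Poisson-gamma Tweedie variable with mean `mu`, dispersion `phi`, and power `1 < p < 2` has `P(Y = 0) = exp(-mu^(2-p) / (phi * (2-p)))`. The link `q = 1 / (1 + mu^gamma)` and all function names are assumptions chosen for illustration; the abstract states only that the inflated probability is a function of the mean, not which function.

```python
import math


def tweedie_zero_prob(mu, phi, p):
    """P(Y = 0) for a compound Poisson-gamma Tweedie with 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("power parameter p must lie strictly between 1 and 2")
    # Poisson rate of the underlying claim count.
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    return math.exp(-lam)


def inflation_prob(mu, gamma):
    """Hypothetical link: extra zero mass as a decreasing function of the mean.

    This specific form is an illustrative assumption, not taken from the abstract.
    """
    return 1.0 / (1.0 + mu ** gamma)


def zero_inflated_zero_prob(mu, phi, p, gamma):
    """P(Y = 0) under the mixture: q * 1 + (1 - q) * P_Tweedie(Y = 0)."""
    q = inflation_prob(mu, gamma)
    return q + (1.0 - q) * tweedie_zero_prob(mu, phi, p)
```

Under this sketch, a policy with a small predicted mean receives a large extra zero mass, while a policy with a large mean is governed almost entirely by the plain Tweedie component. In a boosted model, `mu` would come from the tree ensemble's prediction for each policy.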

