Jennifer Priestley

ML
3papers
40citations
Novelty17%
AI Score14

3 Papers

APDec 28, 2018
A Descriptive Study of Variable Discretization and Cost-Sensitive Logistic Regression on Imbalanced Credit Data

Lili Zhang, Herman Ray, Jennifer Priestley et al.

Training classification models on imbalanced data tends to result in bias towards the majority class. In this paper, we demonstrate how variable discretization and cost-sensitive logistic regression help mitigate this bias on an imbalanced credit scoring dataset, and further show the application of the variable discretization technique on the data from other domains, demonstrating its potential as a generic technique for classifying imbalanced data beyond credit socring. The performance measurements include ROC curves, Area under ROC Curve (AUC), Type I Error, Type II Error, accuracy, and F1 score. The results show that proper variable discretization and cost-sensitive logistic regression with the best class weights can reduce the model bias and/or variance. From the perspective of the algorithm, cost-sensitive logistic regression is beneficial for increasing the value of predictors even if they are not in their optimized forms while maintaining monotonicity. From the perspective of predictors, the variable discretization performs better than cost-sensitive logistic regression, provides more reasonable coefficient estimates for predictors which have nonlinear relationships against their empirical logit, and is robust to penalty weights on misclassifications of events and non-events determined by their apriori proportions.

MLJun 12, 2018
Logistic Ensemble Models

Bob Vanderheyden, Jennifer Priestley

Predictive models that are developed in a regulated industry or a regulated application, like determination of credit worthiness, must be interpretable and rational (e.g., meaningful improvements in basic credit behavior must result in improved credit worthiness scores). Machine Learning technologies provide very good performance with minimal analyst intervention, making them well suited to a high volume analytic environment, but the majority are black box tools that provide very limited insight or interpretability into key drivers of model performance or predicted model output values. This paper presents a methodology that blends one of the most popular predictive statistical modeling methods for binary classification with a core model enhancement strategy found in machine learning. The resulting prediction methodology provides solid performance, from minimal analyst effort, while providing the interpretability and rationality required in regulated industries, as well as in other environments where interpretation of model parameters is required (e.g. businesses that require interpretation of models, to take action on them).

MLMar 10, 2018
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Prediction Models

Lili Zhang, Jennifer Priestley, Xuelei Ni

In bankruptcy prediction, the proportion of events is very low, which is often oversampled to eliminate this bias. In this paper, we study the influence of the event rate on discrimination abilities of bankruptcy prediction models. First the statistical association and significance of public records and firmographics indicators with the bankruptcy were explored. Then the event rate was oversampled from 0.12% to 10%, 20%, 30%, 40%, and 50%, respectively. Seven models were developed, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. Under different event rates, models were comprehensively evaluated and compared based on Kolmogorov-Smirnov Statistic, accuracy, F1 score, Type I error, Type II error, and ROC curve on the hold-out dataset with their best probability cut-offs. Results show that Bayesian Network is the most insensitive to the event rate, while Support Vector Machine is the most sensitive.