Dirk Tasche

ML
h-index2
14papers
139citations
Novelty30%
AI Score34

14 Papers

MLJun 6, 2022
Class Prior Estimation under Covariate Shift: No Problem?

Dirk Tasche

We show that in the context of classification the property of source and target distributions to be related by covariate shift may be lost if the information content captured in the covariates is reduced, for instance by dropping components or mapping into a lower-dimensional or finite space. As a consequence, under covariate shift simple approaches to class prior estimation in the style of classify and count with or without adjustment are infeasible. We prove that transformations of the covariates that preserve the covariate shift property are necessarily sufficient in the statistical sense for the full set of covariates. A probing algorithm as alternative approach to class prior estimation under covariate shift is proposed.

MLMar 29, 2023
Sparse joint shift in multinomial classification

Dirk Tasche

Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift. In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions, and suggest potential improvements.

MLJul 29, 2022
Factorizable Joint Shift in Multinomial Classification

Dirk Tasche

Factorizable joint shift (FJS) was recently proposed as a type of dataset shift for which the complete characteristics can be estimated from feature data observations on the test dataset by a method called Joint Importance Aligning. For the multinomial (multiclass) classification setting, we derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features. On the basis of this result, we propose alternatives to joint importance aligning and, at the same time, point out that factorizable joint shift is not fully identifiable if no class label information on the test dataset is available and no additional assumptions are made. Other results of the paper include correction formulae for the posterior class probabilities both under general dataset shift and factorizable joint shift. In addition, we investigate the consequences of assuming factorizable joint shift for the bias caused by sample selection.

LGNov 28, 2023
Invariance assumptions for class distribution estimation

Dirk Tasche

We study the problem of class distribution estimation under dataset shift. On the training dataset, both features and class labels are observed while on the test dataset only the features can be observed. The task then is the estimation of the distribution of the class labels, i.e. the estimation of the class prior probabilities, in the test dataset. Assumptions of invariance between the training joint distribution of features and labels and the test distribution can considerably facilitate this task. We discuss the assumptions of covariate shift, factorizable joint shift, and sparse joint shift and their implications for class distribution estimation.

LGJan 21
Factorizable joint shift revisited

Dirk Tasche

Factorizable joint shift (FJS) was proposed as a type of distribution shift (or dataset shift) that comprises both covariate and label shift. Recently, it has been observed that FJS actually arises from consecutive label and covariate (or vice versa) shifts. Research into FJS so far has been confined to the case of categorical label spaces. We propose a framework for analysing distribution shift in the case of general label spaces, thus covering both classification and regression models. Based on the framework, we generalise existing results on FJS to general label spaces and propose a related extension of the expectation maximisation (EM) algorithm for class prior probabilities. We also take a fresh look at generalized label shift (GLS) in the case of general label spaces.

LGMay 25, 2025
Recalibrating binary probabilistic classifiers

Dirk Tasche

Recalibration of binary probabilistic classifiers to a target prior probability is an important task in areas like credit risk management. We analyse methods for recalibration from a distribution shift perspective. Distribution shift assumptions linked to the area under the curve (AUC) of a probabilistic classifier are found to be useful for the design of meaningful recalibration methods. Two new methods called parametric covariate shift with posterior drift (CSPD) and ROC-based quasi moment matching (QMM) are proposed and tested together with some other methods in an example setting. The outcomes of the test suggest that the QMM methods discussed in the paper can provide appropriately conservative results in evaluations with concave functionals like for instance risk weights functions for credit risk.

MLJul 17, 2021
Minimising quantifier variance under prior probability shift

Dirk Tasche

For the binary prevalence quantification problem under prior probability shift, we determine the asymptotic variance of the maximum likelihood estimator. We find that it is a function of the Brier score for the regression of the class label on the features under the test data set distribution. This observation suggests that optimising the accuracy of a base classifier, as measured by the Brier score, on the training data set helps to reduce the variance of the related quantifier on the test data set. Therefore, we also point out training criteria for the base classifier that imply optimisation of both of the Brier scores on the training and the test data sets.

MLMay 15, 2021
Calibrating sufficiently

Dirk Tasche

When probabilistic classifiers are trained and calibrated, the so-called grouping loss component of the calibration loss can easily be overlooked. Grouping loss refers to the gap between observable information and information actually exploited in the calibration exercise. We investigate the relation between grouping loss and the concept of sufficiency, identifying comonotonicity as a useful criterion for sufficiency. We revisit the probing reduction approach of Langford & Zadrozny (2005) and find that it produces an estimator of probabilistic classifiers that reduces grouping loss. Finally, we discuss Brier curves as tools to support training and 'sufficient' calibration of probabilistic classifiers.

MLJun 10, 2019
Confidence intervals for class prevalences under prior probability shift

Dirk Tasche

Point estimation of class prevalences in the presence of data set shift has been a popular research topic for more than two decades. Less attention has been paid to the construction of confidence and prediction intervals for estimates of class prevalences. One little considered question is whether or not it is necessary for practical purposes to distinguish confidence and prediction intervals. Another question so far not yet conclusively answered is whether or not the discriminatory power of the classifier or score at the basis of an estimation method matters for the accuracy of the estimates of the class prevalences. This paper presents a simulation study aimed at shedding some light on these and other related questions.

MLApr 9, 2018
A plug-in approach to maximising precision at the top and recall at the top

Dirk Tasche

For information retrieval and binary classification, we show that precision at the top (or precision at k) and recall at the top (or recall at k) are maximised by thresholding the posterior probability of the positive class. This finding is a consequence of a result on constrained minimisation of the cost-sensitive expected classification error which generalises an earlier related result from the literature.

MLJan 19, 2017
Fisher consistency for prior probability shift

Dirk Tasche

We introduce Fisher consistency in the sense of unbiasedness as a desirable property for estimators of class prior probabilities. Lack of Fisher consistency could be used as a criterion to dismiss estimators that are unlikely to deliver precise estimates in test datasets under prior probability and more general dataset shift. The usefulness of this unbiasedness concept is demonstrated with three examples of classifiers used for quantification: Adjusted Classify & Count, EM-algorithm and CDE-Iterate. We find that Adjusted Classify & Count and EM-algorithm are Fisher consistent. A counter-example shows that CDE-Iterate is not Fisher consistent and, therefore, cannot be trusted to deliver reliable estimates of class probabilities.

MLFeb 28, 2016
Does quantification without adjustments work?

Dirk Tasche

Classification is the task of predicting the class labels of objects based on the observation of their features. In contrast, quantification has been defined as the task of determining the prevalences of the different sorts of class labels in a target dataset. The simplest approach to quantification is Classify & Count where a classifier is optimised for classification on a training set and applied to the target dataset for the prediction of class labels. In the case of binary quantification, the number of predicted positive labels is then used as an estimate of the prevalence of the positive class in the target dataset. Since the performance of Classify & Count for quantification is known to be inferior its results typically are subject to adjustments. However, some researchers recently have suggested that Classify & Count might actually work without adjustments if it is based on a classifer that was specifically trained for quantification. We discuss the theoretical foundation for this claim and explore its potential and limitations with a numerical example based on the binormal model with equal variances. In order to identify an optimal quantifier in the binormal setting, we introduce the concept of local Bayes optimality. As a side remark, we present a complete proof of a theorem by Ye et al. (2012).

MLJun 23, 2014
Exact fit of simple finite mixture models

Dirk Tasche

How to forecast next year's portfolio-wide credit default rate based on last year's default observations and the current score distribution? A classical approach to this problem consists of fitting a mixture of the conditional score distributions observed last year to the current score distribution. This is a special (simple) case of a finite mixture model where the mixture components are fixed and only the weights of the components are estimated. The optimum weights provide a forecast of next year's portfolio-wide default rate. We point out that the maximum-likelihood (ML) approach to fitting the mixture distribution not only gives an optimum but even an exact fit if we allow the mixture components to vary but keep their density ratio fix. From this observation we can conclude that the standard default rate forecast based on last year's conditional default rates will always be located between last year's portfolio-wide default rate and the ML forecast for next year. As an application example, then cost quantification is discussed. We also discuss how the mixture model based estimation methods can be used to forecast total loss. This involves the reinterpretation of an individual classification problem as a collective quantification problem.

PRDec 2, 2013
The Law of Total Odds

Dirk Tasche

The law of total probability may be deployed in binary classification exercises to estimate the unconditional class probabilities if the class proportions in the training set are not representative of the population class proportions. We argue that this is not a conceptually sound approach and suggest an alternative based on the new law of total odds. We quantify the bias of the total probability estimator of the unconditional class probabilities and show that the total odds estimator is unbiased. The sample version of the total odds estimator is shown to coincide with a maximum-likelihood estimator known from the literature. The law of total odds can also be used for transforming the conditional class probabilities if independent estimates of the unconditional class probabilities of the population are available. Keywords: Total probability, likelihood ratio, Bayes' formula, binary classification, relative odds, unbiased estimator, supervised learning, dataset shift.