4.9 ST Mar 11
Conformal e-prediction in the presence of confounding
Vladimir Vovk, Ruodu Wang
This note extends conformal e-prediction to cover the case where there is observed confounding between the random object $X$ and its label $Y$. We consider both the case where the observed data is IID and a case where some dependence between observations is permitted.
13.2 LG May 8
Aggregation in conformal e-classification
Vladimir Vovk
Aggregating conformal predictors is a standard way of balancing their predictive and computational efficiency while retaining their validity, at least approximately. An important advantage of conformal e-predictors is that they are easier to aggregate without sacrificing their validity. This paper studies experimentally cross-conformal e-prediction, which is an existing method of aggregating conformal e-predictors, and its modifications that are conceptually simpler and more flexible.
19.0 LG May 7
Inductive Venn-Abers and related regressors
Ivan Petej, Vladimir Vovk
Venn-Abers predictors are probabilistic predictors that enjoy appealing properties of validity, but their major limitation is that they are applicable only to the case of binary classification, with a recent extension to bounded regression. We generalize them to the case of unbounded regression, which requires adding an element of conformal prediction. In our simulation and empirical studies we investigate the predictive efficiency of point regressors derived from Venn-Abers regressors and argue that they somewhat improve the predictive efficiency of standard regressors for larger training sets.
LG Feb 26, 2025
Universality of conformal prediction under the assumption of randomness
Vladimir Vovk
Conformal predictors provide set or functional predictions that are valid under the assumption of randomness, i.e., under the assumption of independent and identically distributed data. The question asked in this paper is whether there are predictors that are valid in the same sense under the assumption of randomness and that are more efficient than conformal predictors. The answer is that the class of conformal predictors is universal in that only limited gains in predictive efficiency are possible. The previous work in this area has relied on the algorithmic theory of randomness and so involved unspecified constants, whereas this paper's results are much more practical. They are also shown to be optimal in some respects.
LG Jan 20, 2025
Randomness, exchangeability, and conformal prediction
Vladimir Vovk
This paper argues for a wider use of the functional theory of randomness, a modification of the algorithmic theory of randomness getting rid of unspecified additive constants. Both theories are useful for understanding relationships between the assumptions of IID data and data exchangeability. While the assumption of IID data is standard in machine learning, conformal prediction relies on data exchangeability. Nouretdinov, V'yugin, and Gammerman showed, using the language of the algorithmic theory of randomness, that conformal prediction is a universal method under the assumption of IID data. In this paper (written for the Alex Gammerman Festschrift) I will selectively review connections between exchangeability and the property of being IID, early history of conformal prediction, my encounters and collaboration with Alex and other interesting people, and a translation of Nouretdinov et al.'s results into the language of the functional theory of randomness, which moves it closer to practice. Namely, the translation says that every confidence predictor that is valid for IID data can be transformed to a conformal predictor without losing much in predictive efficiency.
LG Mar 4, 2025
Inductive randomness predictors: beyond conformal
Vladimir Vovk
This paper introduces inductive randomness predictors, which form a proper superset of inductive conformal predictors but have the same principal property of validity under the assumption of randomness (i.e., of IID data). It turns out that every non-trivial inductive conformal predictor is strictly dominated by an inductive randomness predictor, although the improvement is not great, at most a factor of $\mathrm{e}\approx2.72$ in the case of e-prediction. The dominating inductive randomness predictors are more complicated and more difficult to compute; besides, an improvement by a factor of $\mathrm{e}$ is rare. Therefore, this paper does not suggest replacing inductive conformal predictors by inductive randomness predictors and only calls for a more detailed study of the latter.
ST Dec 4, 2024
Validity and efficiency of the conformal CUSUM procedure
Vladimir Vovk, Ilia Nouretdinov, Alex Gammerman
In this paper we study the validity and efficiency of a conformal version of the CUSUM procedure for change detection both experimentally and theoretically.
AI Sep 3, 2023
Logic of subjective probability
Vladimir Vovk
In this paper I discuss both syntax and semantics of subjective probability. The semantics determines ways of testing probability statements. Among important varieties of subjective probabilities are intersubjective probabilities and impersonal probabilities, and I will argue that well-tested impersonal probabilities acquire features of objective probabilities. Jeffreys's law, my next topic, states that two successful probability forecasters must issue forecasts that are close to each other, thus supporting the idea of objective probabilities. Finally, I will discuss connections between subjective and frequentist probability.
ST Nov 2, 2021
Conformal testing: binary case with Markov alternatives
Vladimir Vovk, Ilia Nouretdinov, Alex Gammerman
We continue the study of conformal testing in binary model situations. In this note we consider Markov alternatives to the null hypothesis of exchangeability. We propose two new classes of conformal test martingales; one class is statistically efficient in our experiments, and the other class partially sacrifices statistical efficiency to gain computational efficiency.
LG Jul 4, 2021
Protected probabilistic classification
Vladimir Vovk, Ivan Petej, Alex Gammerman
This paper proposes a way of protecting probabilistic prediction models against changes in the data distribution, concentrating on the case of classification and paying particular attention to binary classification. This is important in applications of machine learning, where the quality of a trained prediction algorithm may drop significantly in the process of its exploitation. Our techniques are based on recent work on conformal test martingales and older work on prediction with expert advice, namely tracking the best expert.
LG May 18, 2021
Enhancement of prediction algorithms by betting
Vladimir Vovk
This note proposes a procedure for enhancing the quality of probabilistic prediction algorithms via betting against their predictions. It is inspired by the success of the conformal test martingales that have been developed recently.
LG Apr 5, 2021
Conformal testing in a binary model situation
Vladimir Vovk
Conformal testing is a way of testing the IID assumption based on conformal prediction. The topic of this note is computational evaluation of the performance of conformal testing in a model situation in which IID binary observations generated from a Bernoulli distribution are followed by IID binary observations generated from another Bernoulli distribution, with the parameters of the distributions and changepoint unknown. Existing conformal test martingales can be used for this task and work well in simple cases, but their efficiency can be improved greatly.
LG Feb 20, 2021
Retrain or not retrain: Conformal test martingales for change-point detection
Vladimir Vovk, Ivan Petej, Ilia Nouretdinov et al.
We argue for supplementing the process of training a prediction algorithm by setting up a scheme for detecting the moment when the distribution of the data changes and the algorithm needs to be retrained. Our proposed schemes are based on exchangeability martingales, i.e., processes that are martingales under any exchangeable distribution for the data. Our method, based on conformal prediction, is general and can be applied on top of any modern prediction algorithm. Its validity is guaranteed, and in this paper we make first steps in exploring its efficiency.
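
To make the idea of an exchangeability martingale concrete, here is a minimal sketch of one classical construction, the power martingale built on randomized conformal p-values. It is not claimed to be the scheme used in the paper. The function names, the betting parameter `eps`, and the choice of using each observation's own value as its nonconformity score are illustrative assumptions.

```python
import random


def conformal_p_values(scores, rng):
    """Online randomized conformal p-values for a sequence of nonconformity scores.

    Assumption: scores[i] is the nonconformity score of observation i and does
    not depend on the order of the earlier observations.
    """
    p_values = []
    for n in range(1, len(scores) + 1):
        current = scores[n - 1]
        seen = scores[:n]
        greater = sum(1 for s in seen if s > current)
        equal = sum(1 for s in seen if s == current)  # always >= 1
        theta = 1.0 - rng.random()  # tie-breaking weight in (0, 1], keeps p > 0
        p_values.append((greater + theta * equal) / n)
    return p_values


def power_martingale(p_values, eps=0.5):
    """Running product of eps * p**(eps - 1); large values are evidence against exchangeability."""
    value = 1.0
    path = []
    for p in p_values:
        value *= eps * p ** (eps - 1.0)
        path.append(value)
    return path


# Hypothetical data stream whose distribution changes after 100 observations.
rng = random.Random(0)
stream = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(2.0, 1.0) for _ in range(100)]
path = power_martingale(conformal_p_values(stream, rng))
```

A retraining alarm could be raised when the path exceeds a chosen threshold such as 100. By Ville's inequality, a nonnegative martingale started at 1 ever reaches that level with probability at most 1/100 under exchangeability.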
LG Dec 28, 2020
Testing for concept shift online
Vladimir Vovk
This note continues the study of exchangeability martingales, i.e., processes that are martingales under any exchangeable distribution for the observations. Such processes can be used for detecting violations of the IID assumption, which is commonly made in machine learning. Violations of the IID assumption are sometimes referred to as dataset shift, and dataset shift is sometimes subdivided into concept shift, covariate shift, etc. Our primary interest is in concept shift, but we will also discuss exchangeability martingales that decompose perfectly into two components, one of which detects concept shift and the other detects what we call label shift. Our methods will be based on techniques of conformal prediction.
LG May 14, 2020
Training conformal predictors
Nicolo Colombo, Vladimir Vovk
Efficiency criteria for conformal prediction, such as observed fuzziness (i.e., the sum of p-values associated with false labels), are commonly used to evaluate the performance of given conformal predictors. Here, we investigate whether it is possible to exploit efficiency criteria to learn classifiers, both conformal predictors and point classifiers, by using such criteria as training objective functions. The proposed idea is implemented for the problem of binary classification of hand-written digits. By choosing a 1-dimensional model class (with one real-valued free parameter), we can solve the optimization problems through an (approximate) exhaustive search over (a discrete version of) the parameter space. Our empirical results suggest that conformal predictors trained by minimizing their observed fuzziness perform better than conformal predictors trained in the traditional way by minimizing the prediction error of the corresponding point classifier. They also have a reasonable performance in terms of their prediction error on the test set.
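
The observed fuzziness criterion mentioned above can be sketched in a few lines. The function name, the data layout (one dictionary of p-values per test example), and the averaging over test examples are assumptions made for illustration; the abstract only specifies the sum of p-values for the false labels.

```python
def observed_fuzziness(p_values, true_labels):
    """Mean, over test examples, of the summed p-values of the false labels.

    p_values: list of dicts mapping each candidate label to its conformal p-value.
    true_labels: list of true labels, aligned with p_values.
    Smaller values indicate a more efficient conformal predictor.
    """
    total = 0.0
    for pvals, y_true in zip(p_values, true_labels):
        total += sum(p for label, p in pvals.items() if label != y_true)
    return total / len(true_labels)


# Hypothetical p-values for two test examples of a binary problem.
p_values = [{0: 0.80, 1: 0.05}, {0: 0.10, 1: 0.60}]
true_labels = [0, 1]
fuzziness = observed_fuzziness(p_values, true_labels)  # (0.05 + 0.10) / 2
```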
LG Jan 16, 2020
Conformal e-prediction
Vladimir Vovk
This paper discusses a counterpart of conformal prediction for e-values, conformal e-prediction. Conformal e-prediction is conceptually simpler and was developed in the 1990s as a precursor of conformal prediction. When conformal prediction emerged as a result of replacing e-values by p-values, it seemed to have important advantages over conformal e-prediction without obvious disadvantages. This paper re-examines relations between conformal prediction and conformal e-prediction systematically from a modern perspective. Conformal e-prediction has advantages of its own, such as the ease of designing conditional conformal e-predictors and the guaranteed validity of cross-conformal e-predictors (whereas for cross-conformal predictors validity is only an empirical fact and can be broken with excessive randomization). Even where conformal prediction has clear advantages, conformal e-prediction can often emulate those advantages, more or less successfully.
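
As a rough illustration of the difference between the two approaches, the sketch below turns nonnegative nonconformity scores into a conformal e-value by normalizing the test score by the average of all scores, whereas conformal prediction would rank it to obtain a p-value. The function name and the convention for all-zero scores are assumptions; this is a simplified sketch rather than the paper's full construction.

```python
def conformal_e_value(train_scores, test_score):
    """E-value for a candidate label: the test score divided by the mean of all n + 1 scores.

    Assumptions: all scores are nonnegative and are computed by a rule that
    treats the n training examples and the test example symmetrically.
    Under exchangeability the expected value of the result is at most 1.
    """
    all_scores = list(train_scores) + [test_score]
    total = sum(all_scores)
    if total == 0.0:
        return 1.0  # all scores identical (zero): no evidence either way (assumed convention)
    return len(all_scores) * test_score / total


# Hypothetical scores: the test example looks three times stranger than each training example.
e = conformal_e_value([1.0, 1.0, 1.0], 3.0)  # 4 * 3 / 6
```

A large e-value is evidence against the candidate label. By Markov's inequality, rejecting labels whose e-value is at least 1/epsilon errs with probability at most epsilon.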
LG Nov 3, 2019
Computationally efficient versions of conformal predictive distributions
Vladimir Vovk, Ivan Petej, Ilia Nouretdinov et al.
Conformal predictive systems are a recent modification of conformal predictors that output, in regression problems, probability distributions for labels of test observations rather than set predictions. The extra information provided by conformal predictive systems may be useful, e.g., in decision making problems. Conformal predictive systems inherit the relative computational inefficiency of conformal predictors. In this paper we discuss two computationally efficient versions of conformal predictive systems, which we call split conformal predictive systems and cross-conformal predictive systems. The main advantage of split conformal predictive systems is their guaranteed validity, whereas for cross-conformal predictive systems validity only holds empirically and in the absence of excessive randomization. The main advantage of cross-conformal predictive systems is their greater predictive efficiency.
PR Jun 21, 2019
Testing randomness
Vladimir Vovk
The hypothesis of randomness is fundamental in statistical machine learning and in many areas of nonparametric statistics; it says that the observations are assumed to be independent and coming from the same unknown probability distribution. This hypothesis is close, in certain respects, to the hypothesis of exchangeability, which postulates that the distribution of the observations is invariant with respect to their permutations. This paper reviews known methods of testing the two hypotheses concentrating on the online mode of testing, when the observations arrive sequentially. All known online methods for testing these hypotheses are based on conformal martingales, which are defined and studied in detail. The paper emphasizes conceptual and practical aspects and states two kinds of results. Validity results limit the probability of a false alarm or the frequency of false alarms for various procedures based on conformal martingales, including conformal versions of the CUSUM and Shiryaev-Roberts procedures. Efficiency results establish connections between randomness, exchangeability, and conformal martingales.
LG Feb 18, 2019
Conformal calibrators
Vladimir Vovk, Ivan Petej, Paolo Toccaceli et al.
Most existing examples of full conformal predictive systems, split-conformal predictive systems, and cross-conformal predictive systems impose severe restrictions on the adaptation of predictive distributions to the test object at hand. In this paper we develop split-conformal and cross-conformal predictive systems that are fully adaptive. Our method consists in calibrating existing predictive systems; the input predictive system is not supposed to satisfy any properties of validity, whereas the output predictive system is guaranteed to be calibrated in probability. It is interesting that the method may also work without the IID assumption, standard in conformal prediction.
LG Oct 24, 2017
Conformal predictive distributions with kernels
Vladimir Vovk, Ilia Nouretdinov, Valery Manokhin et al.
This paper reviews the checkered history of predictive distributions in statistics and discusses two developments, one from recent literature and the other new. The first development is bringing predictive distributions into machine learning, whose early development was so deeply influenced by two remarkable groups at the Institute of Automation and Remote Control. The second development is combining predictive distributions with kernel methods, which were originated by one of those groups, including Emmanuel Braverman.
LG Aug 6, 2017
Universally consistent predictive distributions
Vladimir Vovk
This paper describes simple universally consistent procedures of probability forecasting that satisfy a natural property of small-sample validity, under the assumption that the observations are produced independently in the IID fashion.
ML Jun 11, 2017
Inductive Conformal Martingales for Change-Point Detection
Denis Volkhonskiy, Ilia Nouretdinov, Alexander Gammerman et al.
We consider the problem of quickest change-point detection in data streams. Classical change-point detection procedures, such as CUSUM, Shiryaev-Roberts and Posterior Probability statistics, are optimal only if the change-point model is known, which is an unrealistic assumption in typical applied problems. Instead we propose a new method for change-point detection based on Inductive Conformal Martingales, which requires only the independence and identical distribution of observations. We compare the proposed approach to standard methods, as well as to change-point detection oracles, which model a typical practical situation when we have only imprecise (albeit parametric) information about pre- and post-change data distributions. The results of the comparison provide evidence that change-point detection based on Inductive Conformal Martingales is an efficient tool, capable of working under quite general conditions, unlike traditional approaches.
LG Mar 14, 2016
Criteria of efficiency for conformal prediction
Vladimir Vovk, Ilia Nouretdinov, Valentina Fedorova et al.
We study optimal conformity measures for various criteria of efficiency of classification in an idealised setting. This leads to an important class of criteria of efficiency that we call probabilistic; it turns out that the most standard criteria of efficiency used in the literature on conformal prediction are not probabilistic unless the problem of classification is binary. We consider both unconditional and label-conditional conformal prediction.
LG Mar 14, 2016
Universal probability-free prediction
Vladimir Vovk, Dusko Pavlovic
We construct universal prediction systems in the spirit of Popper's falsifiability and Kolmogorov complexity and randomness. These prediction systems do not depend on any statistical assumptions (but under the IID assumption they dominate, to within the usual accuracy, conformal prediction). Our constructions give rise to a theory of algorithmic complexity and randomness of time containing analogues of several notions and results of the classical theory of Kolmogorov complexity and randomness.
LG Nov 1, 2015
Large-scale probabilistic predictors with and without guarantees of validity
Vladimir Vovk, Ivan Petej, Valentina Fedorova
This paper studies theoretically and empirically a method of turning machine-learning algorithms into probabilistic predictors that automatically enjoys a property of validity (perfect calibration) and is computationally efficient. The price to pay for perfect calibration is that these probabilistic predictors produce imprecise (in practice, almost precise for large data sets) probabilities. When these imprecise probabilities are merged into precise probabilities, the resulting predictors, while losing the theoretical property of perfect calibration, are consistently more accurate than the existing methods in empirical studies.
LG Feb 22, 2015
The fundamental nature of the log loss function
Vladimir Vovk
The standard loss functions used in the literature on probabilistic prediction are the log loss function, the Brier loss function, and the spherical loss function; however, any computable proper loss function can be used for comparison of prediction algorithms. This note shows that the log loss function is most selective in that any prediction algorithm that is optimal for a given data sequence (in the sense of the algorithmic theory of randomness) under the log loss function will be optimal under any computable proper mixable loss function; on the other hand, there is a data sequence and a prediction algorithm that is optimal for that sequence under either of the two other standard loss functions but not under the log loss function.
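
For reference, the three standard loss functions named above can be written down as follows for a probability forecast over a finite label set. The function names are illustrative, and the normalization of the spherical loss (one minus the spherical score) is one common convention, assumed here rather than taken from the note.

```python
import math


def log_loss(probs, y):
    """Negative log of the probability assigned to the true label y."""
    return -math.log(probs[y])


def brier_loss(probs, y):
    """Squared Euclidean distance between the forecast and the one-hot vector of y."""
    return sum((p - (1.0 if k == y else 0.0)) ** 2 for k, p in enumerate(probs))


def spherical_loss(probs, y):
    """One minus the true label's probability divided by the forecast's Euclidean norm."""
    norm = math.sqrt(sum(p * p for p in probs))
    return 1.0 - probs[y] / norm


# Hypothetical binary forecast that hedges evenly between the two labels.
forecast = [0.5, 0.5]
losses = (log_loss(forecast, 0), brier_loss(forecast, 0), spherical_loss(forecast, 0))
```

All three are proper: the expected loss is minimized by forecasting the true distribution. Only the log loss is unbounded, which is why a forecast that assigns probability close to zero to the observed label is penalized so heavily under it.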
LG Aug 9, 2014
Prediction with Advice of Unknown Number of Experts
Alexey Chernov, Vladimir Vovk
In the framework of prediction with expert advice, we consider a recently introduced kind of regret bounds: the bounds that depend on the effective instead of nominal number of experts. In contrast to the NormalHedge bound, which mainly depends on the effective number of experts but also weakly depends on the nominal one, we obtain a bound that does not contain the nominal number of experts at all. We use the defensive forecasting method and introduce an application of defensive forecasting to multivalued supermartingales.
LG Jun 21, 2014
From conformal to probabilistic prediction
Vladimir Vovk, Ivan Petej, Valentina Fedorova
This paper proposes a new method of probabilistic prediction, which is based on conformal prediction. The method is applied to the standard USPS data set and gives encouraging results.
LG Apr 8, 2014
Efficiency of conformalized ridge regression
Evgeny Burnaev, Vladimir Vovk
Conformal prediction is a method of producing prediction sets that can be applied on top of a wide range of prediction algorithms. The method has a guaranteed coverage probability under the standard IID assumption regardless of whether the assumptions (often considerably more restrictive) of the underlying algorithm are satisfied. However, for the method to be really useful it is desirable that in the case where the assumptions of the underlying algorithm are satisfied, the conformal predictor loses little in efficiency as compared with the underlying algorithm (whereas being a conformal predictor, it has the stronger guarantee of validity). In this paper we explore the degree to which this additional requirement of efficiency is satisfied in the case of Bayesian ridge regression; we find that asymptotically conformal prediction sets differ little from ridge regression prediction intervals when the standard Bayesian assumptions are satisfied.
LG Jan 16, 2014
Regression Conformal Prediction with Nearest Neighbours
Harris Papadopoulos, Vladimir Vovk, Alex Gammerman
In this paper we apply Conformal Prediction (CP) to the k-Nearest Neighbours Regression (k-NNR) algorithm and propose ways of extending the typical nonconformity measure used for regression so far. Unlike traditional regression methods, which produce point predictions, Conformal Predictors output predictive regions that satisfy a given confidence level. The regions produced by any Conformal Predictor are automatically valid; however, their tightness, and therefore their usefulness, depends on the nonconformity measure used by each CP. In effect, a nonconformity measure evaluates how strange a given example is compared to a set of other examples based on some traditional machine learning algorithm. We define six novel nonconformity measures based on the k-Nearest Neighbours Regression algorithm and develop the corresponding CPs following both the original (transductive) and the inductive CP approaches. A comparison of the predictive regions produced by our measures with those of the typical regression measure suggests that a major improvement in terms of predictive region tightness is achieved by the new measures.
LG Oct 31, 2012
Venn-Abers predictors
Vladimir Vovk, Ivan Petej
This paper continues the study, both theoretical and empirical, of the method of Venn prediction, concentrating on binary prediction problems. Venn predictors produce probability-type predictions for the labels of test objects which are guaranteed to be well calibrated under the standard assumption that the observations are generated independently from the same distribution. We give a simple formalization and proof of this property. We also introduce Venn-Abers predictors, a new class of Venn predictors based on the idea of isotonic regression, and report promising empirical results both for Venn-Abers predictors and for their more computationally efficient simplified version.
LG Sep 12, 2012
Conditional validity of inductive conformal predictors
Vladimir Vovk
Conformal predictors are set predictors that are automatically valid in the sense of having coverage probability equal to or exceeding a given confidence level. Inductive conformal predictors are a computationally efficient version of conformal predictors satisfying the same property of validity. However, inductive conformal predictors have only been known to control unconditional coverage probability. This paper explores various versions of conditional validity and various ways to achieve them using inductive conformal predictors and their modifications.
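
The unconditional guarantee referred to above can be illustrated with a minimal inductive (split) conformal predictor for regression. This is a sketch under stated assumptions, not the paper's construction: the nonconformity score is taken to be the absolute residual of some already fitted point predictor on a held-out calibration set, and the function name is hypothetical.

```python
import math


def icp_interval(calib_residuals, point_prediction, epsilon):
    """Prediction interval with unconditional coverage at least 1 - epsilon under IID data.

    calib_residuals: absolute residuals |y_i - f(x_i)| on a calibration set
    that was not used to fit the point predictor f (assumed given).
    """
    n = len(calib_residuals)
    # Rank of the calibration residual that serves as the interval half-width.
    k = math.ceil((1.0 - epsilon) * (n + 1))
    if k > n:
        # Too few calibration examples for this confidence level.
        return (-math.inf, math.inf)
    q = sorted(calib_residuals)[k - 1]
    return (point_prediction - q, point_prediction + q)


# Hypothetical calibration residuals and a point prediction of 2.0.
residuals = [0.1, 0.4, 0.2, 0.3, 0.5, 0.9, 0.7, 0.6, 0.8]
interval = icp_interval(residuals, 2.0, 0.2)  # half-width is the 8th smallest residual
```

The coverage guarantee here is averaged over the training set, calibration set, and test example together, which is exactly why it says nothing about coverage conditional on a particular object, label, or training set.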
ML Aug 3, 2012
Cross-conformal predictors
Vladimir Vovk
This note introduces the method of cross-conformal prediction, which is a hybrid of the methods of inductive conformal prediction and cross-validation, and studies its validity and predictive efficiency empirically.
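
A sketch of how the hybrid works for a single candidate label: the training set is split into folds, each fold is scored by a model fitted on the remaining folds, and the per-fold counts are pooled into one p-value. The function name and input layout are assumptions, the scores are taken to be nonconformity scores (larger means stranger), and fitting the per-fold models is left out.

```python
def cross_conformal_p_value(fold_calib_scores, fold_test_scores):
    """Pooled p-value for one candidate label of a test object.

    fold_calib_scores[k]: nonconformity scores of the examples in fold k,
        computed by a model trained on the other folds.
    fold_test_scores[k]: nonconformity score of the test object with the
        candidate label, computed by that same model.
    """
    n = sum(len(scores) for scores in fold_calib_scores)
    count = 0
    for calib, test_score in zip(fold_calib_scores, fold_test_scores):
        # Calibration examples in this fold at least as strange as the test example.
        count += sum(1 for a in calib if a >= test_score)
    return (count + 1) / (n + 1)


# Hypothetical scores for two folds of two examples each.
p = cross_conformal_p_value([[0.1, 0.5], [0.3, 0.9]], [0.4, 0.4])  # (1 + 1 + 1) / 5
```

With a single fold this reduces to the inductive conformal p-value. With several folds every training example is used for calibration, which is the source of the efficiency gain and the reason validity is studied empirically.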
LG Jul 11, 2012
On-line Prediction with Kernels and the Complexity Approximation Principle
Alex Gammerman, Yuri Kalnishkan, Vladimir Vovk
The paper describes an application of the Aggregating Algorithm to the problem of regression. It generalizes earlier results concerned with plain linear regression to kernel techniques and presents an on-line algorithm which performs nearly as well as any oblivious kernel predictor. The paper contains the derivation of an estimate on the performance of this algorithm. The estimate is then used to derive an application of the Complexity Approximation Principle to kernel methods.
LG Apr 15, 2012
Plug-in martingales for testing exchangeability on-line
Valentina Fedorova, Alex Gammerman, Ilia Nouretdinov et al.
A standard assumption in machine learning is the exchangeability of data, which is equivalent to assuming that the examples are generated from the same probability distribution independently. This paper is devoted to testing the assumption of exchangeability on-line: the examples arrive one by one, and after receiving each example we would like to have a valid measure of the degree to which the assumption of exchangeability has been falsified. Such measures are provided by exchangeability martingales. We extend known techniques for constructing exchangeability martingales and show that our new method is competitive with the martingales introduced before. Finally we investigate the performance of our testing method on two benchmark datasets, USPS and Statlog Satellite data; for the former, the known techniques give satisfactory results, but for the latter our new more flexible method becomes necessary.