H. J. van den Herik

MLFeb 6, 2019

A Bayesian Approach for Accurate Classification-Based Aggregates

Q. A. Meertens, C. G. H. Diks, H. J. van den Herik et al.

In this paper, we study the accuracy of values aggregated over classes predicted by a classification algorithm. The problem is that the resulting aggregates (e.g., sums of a variable) are known to be biased. The bias can be large even for highly accurate classification algorithms, in particular when dealing with class-imbalanced data. To correct this bias, the algorithm's classification error rates have to be estimated. In this estimation, two issues arise when applying existing bias correction methods. First, inaccuracies in estimating classification error rates have to be taken into account. Second, impermissible estimates, such as a negative estimate for a positive value, have to be dismissed. We show that both issues are relevant in applications where the true labels are known only for a small set of data points. We propose a novel bias correction method using Bayesian inference. The novelty of our method is that it imposes constraints on the model parameters. We show that our method solves the problem of biased classification-based aggregates as well as the two issues above, in the general setting of multi-class classification. In the empirical evaluation, using a binary classifier on a real-world dataset of company tax returns, we show that our method outperforms existing methods in terms of mean squared error.

SCJul 30, 2012

Improving multivariate Horner schemes with Monte Carlo tree search

J. Kuipers, J. A. M. Vermaseren, A. Plaat et al.

Optimizing the cost of evaluating a polynomial is a classic problem in computer science. For polynomials in one variable, Horner's method provides a scheme for producing a computationally efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring variable first are used. This simple textbook algorithm has given remarkably efficient results. Finding better algorithms has proved difficult. In trying to improve upon the greedy scheme we have implemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by factors up to two.

H. J. van den Herik

2 Papers