Danny Wood

LG
4papers
140citations
Novelty44%
AI Score25

4 Papers

LGJan 10, 2023
A Unified Theory of Diversity in Ensemble Learning

Danny Wood, Tingting Mu, Andrew Webb et al.

We present a theory of ensemble diversity, explaining the nature of diversity for a wide range of supervised learning scenarios. This challenge has been referred to as the holy grail of ensemble learning, an open research issue for over 30 years. Our framework reveals that diversity is in fact a hidden dimension in the bias-variance decomposition of the ensemble loss. We prove a family of exact bias-variance-diversity decompositions, for a wide range of losses in both regression and classification, e.g., squared, cross-entropy, and Poisson losses. For losses where an additive bias-variance decomposition is not available (e.g., 0/1 loss) we present an alternative approach: quantifying the effects of diversity, which turn out to be dependent on the label distribution. Overall, we argue that diversity is a measure of model fit, in precisely the same sense as bias and variance, but accounting for statistical dependencies between ensemble members. Thus, we should not be maximising diversity as so many works aim to do -- instead, we have a bias/variance/diversity trade-off to manage.

MLApr 26, 2022
Bias-Variance Decompositions for Margin Losses

Danny Wood, Tingting Mu, Gavin Brown

We introduce a novel bias-variance decomposition for a range of strictly convex margin losses, including the logistic loss (minimized by the classic LogitBoost algorithm), as well as the squared margin loss and canonical boosting loss. Furthermore, we show that, for all strictly convex margin losses, the expected risk decomposes into the risk of a "central" model and a term quantifying variation in the functional margin with respect to variations in the training data. These decompositions provide a diagnostic tool for practitioners to understand model overfitting/underfitting, and have implications for additive ensemble models -- for example, when our bias-variance decomposition holds, there is a corresponding "ambiguity" decomposition, which can be used to quantify model diversity.

MLOct 19, 2023
Model-agnostic variable importance for predictive uncertainty: an entropy-based approach

Danny Wood, Theodore Papamarkou, Matt Benatan et al.

In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the reasons for the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches to understand both the sources of uncertainty and their impact on model performance.

LGJul 16, 2023
A max-affine spline approximation of neural networks using the Legendre transform of a convex-concave representation

Adam Perrett, Danny Wood, Gavin Brown

This work presents a novel algorithm for transforming a neural network into a spline representation. Unlike previous work that required convex and piecewise-affine network operators to create a max-affine spline alternate form, this work relaxes this constraint. The only constraint is that the function be bounded and possess a well-define second derivative, although this was shown experimentally to not be strictly necessary. It can also be performed over the whole network rather than on each layer independently. As in previous work, this bridges the gap between neural networks and approximation theory but also enables the visualisation of network feature maps. Mathematical proof and experimental investigation of the technique is performed with approximation error and feature maps being extracted from a range of architectures, including convolutional neural networks.