Generalized Ambiguity Decomposition for Understanding Ensemble Diversity
This work addresses a foundational problem in machine learning, relevant to researchers and practitioners alike, by offering a broad theoretical framework for explaining ensemble diversity. It is incremental in that it extends earlier characterizations that held only in restricted settings.
The authors study the link between ensemble diversity and fusion performance by presenting a generalized ambiguity decomposition (GAD) theorem. For any convex ensemble and any twice-differentiable loss function, the theorem shows that ensemble performance approximately decomposes into average expert performance minus diversity. This gives a theoretical explanation for the empirically observed benefit of fusing diverse experts.
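The following sketch shows where the "approximately" can come from. It assumes scalar expert outputs $f_i$, convex weights $w_i$, and a second-order Taylor expansion of the loss $\ell$ about the ensemble output $\bar f$. The notation is ours, and the paper's exact expansion point and handling of the remainder may differ. The first-order term vanishes because the weighted deviations $f_i - \bar f$ sum to zero.

```latex
\bar f = \sum_{i=1}^{M} w_i f_i, \qquad w_i \ge 0, \qquad \sum_{i=1}^{M} w_i = 1

\ell(f_i, y) \approx \ell(\bar f, y) + \ell'(\bar f, y)\,(f_i - \bar f) + \tfrac{1}{2}\,\ell''(\bar f, y)\,(f_i - \bar f)^2

\sum_{i=1}^{M} w_i\,\ell(f_i, y) \approx \ell(\bar f, y) + \ell'(\bar f, y) \underbrace{\sum_{i=1}^{M} w_i (f_i - \bar f)}_{=\,0} + \tfrac{1}{2}\,\ell''(\bar f, y) \sum_{i=1}^{M} w_i (f_i - \bar f)^2

\ell(\bar f, y) \approx \underbrace{\sum_{i=1}^{M} w_i\,\ell(f_i, y)}_{\text{average expert loss}} \;-\; \underbrace{\tfrac{1}{2}\,\ell''(\bar f, y) \sum_{i=1}^{M} w_i (f_i - \bar f)^2}_{\text{diversity}}
```

For the squared loss $\ell(f, y) = (y - f)^2$, we have $\ell'' = 2$ and all higher derivatives vanish. The relation is then exact, and the diversity term reduces to the weighted variance of the expert outputs around $\bar f$.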
Researchers widely observe that diversity, or complementarity, of experts in ensemble pattern recognition and information processing systems is crucial for improving performance upon fusion. Understanding this link between ensemble diversity and fusion performance is therefore an important research question. However, prior work has characterized ensemble diversity theoretically, and linked it to ensemble performance, only in very restricted settings. We present a generalized ambiguity decomposition (GAD) theorem as a broad framework for answering these questions. The GAD theorem applies to a generic convex ensemble of experts and to any twice-differentiable loss function. It shows that ensemble performance approximately decomposes into the difference between the average expert performance and the diversity of the ensemble. It thus provides a theoretical explanation for the empirically observed benefit of fusing outputs from diverse classifiers and regressors. It also provides a definition of diversity that depends on the loss function, the ensemble, and the data. We present extensions of this decomposition to common regression and classification loss functions, and we report a simulation-based analysis of the diversity term and of the accuracy of the decomposition. Finally, we present experiments on standard pattern recognition data sets that indicate the decomposition is accurate for real-world classification and regression problems.
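The sketch below checks the decomposition numerically for a single data point. It uses a second-order Taylor form of the decomposition: ensemble loss ≈ weighted average expert loss − ½ · ℓ''(f̄, y) · Σ w_i (f_i − f̄)². This form is our assumption and not necessarily the paper's exact statement. The helper names (`gad_terms`, `squared_loss_dd`, `logistic_loss_dd`) are hypothetical. The squared loss serves as a case where the decomposition is exact. The logistic loss with labels in {−1, +1} serves as a case where it is only approximate.

```python
import math


def squared_loss(f, y):
    return (y - f) ** 2


def squared_loss_dd(f, y):
    # Second derivative of (y - f)^2 with respect to f is the constant 2.
    return 2.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(f, y):
    # Labels y in {-1, +1}; loss is log(1 + exp(-y * f)).
    return math.log1p(math.exp(-y * f))


def logistic_loss_dd(f, y):
    # d^2/df^2 log(1 + exp(-y f)) = y^2 * sigmoid(y f) * sigmoid(-y f).
    return y * y * sigmoid(y * f) * sigmoid(-y * f)


def gad_terms(loss, loss_dd, weights, outputs, y):
    """Return (ensemble loss, weighted average expert loss, diversity).

    Assumes a convex ensemble: non-negative weights summing to one.
    Diversity is the hypothetical second-order term
    0.5 * loss''(f_bar, y) * sum_i w_i * (f_i - f_bar)^2.
    """
    assert all(w >= 0 for w in weights) and math.isclose(sum(weights), 1.0)
    f_bar = sum(w * f for w, f in zip(weights, outputs))
    ensemble = loss(f_bar, y)
    avg_expert = sum(w * loss(f, y) for w, f in zip(weights, outputs))
    spread = sum(w * (f - f_bar) ** 2 for w, f in zip(weights, outputs))
    diversity = 0.5 * loss_dd(f_bar, y) * spread
    return ensemble, avg_expert, diversity


if __name__ == "__main__":
    weights = [0.3, 0.3, 0.4]
    cases = [
        ("squared", squared_loss, squared_loss_dd, [1.2, 0.7, 1.5], 1.0),
        ("logistic", logistic_loss, logistic_loss_dd, [0.9, 1.1, 1.0], 1),
    ]
    for name, loss, loss_dd, outputs, y in cases:
        ens, avg, div = gad_terms(loss, loss_dd, weights, outputs, y)
        print(f"{name}: ensemble={ens:.6f}  avg - diversity={avg - div:.6f}")
```

For the squared loss, the two printed numbers agree up to floating-point rounding. For the logistic loss, they agree only approximately, and the gap grows as the expert outputs spread further from f̄. Because both losses are convex in f, the diversity term is non-negative. The ensemble loss therefore never exceeds the average expert loss, which is the "diversity helps" effect described in the abstract.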