LG · Nov 26, 2025

Ensemble Performance Through the Lens of Linear Independence of Classifier Votes in Data Streams

arXiv:2511.21465v1 · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency and accuracy trade-offs in ensemble learning for data stream applications, representing an incremental advancement with theoretical insights.

The paper tackles the problem of choosing ensemble size in data streams by analyzing the linear independence of classifier votes. It shows that a theoretical estimate identifies the point of performance saturation, validated on real-world and synthetic datasets with ensemble methods such as OzaBagging, with reported accuracy gains of up to 5%.

Ensemble learning improves classification performance by combining multiple base classifiers. While increasing the number of classifiers generally enhances accuracy, excessively large ensembles can lead to computational inefficiency and diminishing returns. This paper investigates the relationship between ensemble size and performance through the lens of linear independence among classifier votes in data streams. We propose that ensembles composed of linearly independent classifiers maximize representational capacity, particularly under a geometric model. We then generalize the importance of linear independence to the weighted majority voting problem. By modeling the probability of achieving linear independence among classifier outputs, we derive a theoretical framework that explains the trade-off between ensemble size and accuracy. Our analysis leads to a theoretical estimate of the ensemble size required to achieve a user-specified probability of linear independence. We validate our theory through experiments on both real-world and synthetic datasets using two ensemble methods, OzaBagging and GOOWE. Our results confirm that this theoretical estimate effectively identifies the point of performance saturation for robust ensembles like OzaBagging. Conversely, for complex weighting schemes like GOOWE, our framework reveals that high theoretical diversity can trigger algorithmic instability. Our implementation is publicly available to support reproducibility and future research.
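To make the central notion concrete, the sketch below checks whether a set of classifier votes is linearly independent. It is a minimal illustration, not the paper's estimator or implementation: it assumes each classifier emits a score vector over the class labels, and the function names `matrix_rank` and `votes_linearly_independent` are hypothetical helpers introduced here. The rank test is ordinary Gaussian elimination with exact rational arithmetic.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix given as a list of equal-length rows (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def votes_linearly_independent(votes):
    """True if no vote vector is a linear combination of the others.

    Assumption: `votes` holds one score vector per classifier,
    each with one entry per class label.
    """
    return matrix_rank(votes) == len(votes)


# Three classifiers, three classes: the third vote is the sum of the first two,
# so it adds no new direction in the class space.
redundant = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
diverse = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(votes_linearly_independent(redundant))
print(votes_linearly_independent(diverse))
```

Under this assumption, an ensemble with more members than class labels can never have fully independent votes, since the rank of the vote matrix cannot exceed the number of classes. That is one simple way to see why adding classifiers eventually stops adding representational capacity.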

