Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction
This work addresses a theoretical gap in machine learning that is of interest to researchers, but its contribution is incremental, since it builds on existing comparisons of mixture of experts and Bayesian methods.
The paper asks why mixture of experts models often outperform Bayesian methods despite having weaker inductive guarantees. It argues that the advantage comes from greater functional capacity, proves this in a limiting case, and supports the claim with experiments on non-limiting cases.
Mixture of experts is a machine learning method that aggregates the predictions of specialized experts. It often outperforms Bayesian methods, even though Bayesian methods have stronger inductive guarantees. We argue that this is due to the greater functional capacity of mixture of experts. We prove that, in a limiting case, mixture of experts has greater capacity than equivalent Bayesian methods, and we corroborate this through experiments on non-limiting cases. Finally, we conclude that mixture of experts is a form of abductive reasoning in the Peircean sense of hypothesis construction.
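As a toy illustration of what "greater functional capacity" can mean, the sketch below contrasts two ways of combining the same two experts. It assumes that the "equivalent Bayesian method" can be read as Bayesian model averaging with a fixed posterior weight, and that the mixture of experts uses an input-dependent sigmoid gate. The experts (`expert_pos`, `expert_neg`), the gate, its `sharpness` parameter, and the target |x| are all hypothetical choices for illustration; this is not the paper's limiting-case construction or proof.

```python
import math

# Hypothetical linear experts on a 1-D input (illustration only).
def expert_pos(x: float) -> float:
    return x

def expert_neg(x: float) -> float:
    return -x

def bayesian_average(x: float, w: float) -> float:
    """Assumed Bayesian model averaging: the weight w on expert_pos is a
    fixed posterior probability and does not depend on the input x."""
    return w * expert_pos(x) + (1.0 - w) * expert_neg(x)

def gate(x: float, sharpness: float) -> float:
    """Hypothetical sigmoid gate: weight given to expert_pos at input x."""
    return 1.0 / (1.0 + math.exp(-sharpness * x))

def mixture_of_experts(x: float, sharpness: float = 50.0) -> float:
    """Mixture of experts: the mixing weight is a function of the input x."""
    g = gate(x, sharpness)
    return g * expert_pos(x) + (1.0 - g) * expert_neg(x)

def max_error(predict, xs):
    """Worst-case absolute error against the target |x| on the grid xs."""
    return max(abs(predict(x) - abs(x)) for x in xs)

xs = [i / 10 for i in range(-20, 21)]  # evaluation grid on [-2, 2]

# Best fixed weight found by scanning w over [0, 1] in steps of 0.01.
bma_err = min(
    max_error(lambda x, w=w: bayesian_average(x, w), xs)
    for w in [j / 100 for j in range(101)]
)
moe_err = max_error(mixture_of_experts, xs)

print(f"best fixed-weight error: {bma_err:.3f}")
print(f"gated mixture error:     {moe_err:.3f}")
```

With a fixed weight, the combination equals (2w - 1)x, which is linear in x, so the best it can do against |x| on this grid is a worst-case error of 2.0 (at w = 0.5). The gated mixture computes (2g(x) - 1)x and tracks |x| closely because its weight switches experts with the sign of x. The toy only shows how input-dependent weighting can enlarge the class of representable functions; it says nothing about the paper's experiments.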