NE | AI | Jan 4, 2024

k-Winners-Take-All Ensemble Neural Network

arXiv:2401.02092v1 | 1 citation | h-index: 11 | ICONIP
Originality: Incremental advance
AI Analysis

This work addresses performance enhancement in neural network ensembles for machine learning practitioners, but it is incremental as it builds on existing ensemble and mixture-of-experts approaches.

The paper tackles the problem of improving neural network performance by proposing a kWTA ensemble neural network (kWTA-ENN) that uses a k-Winners-Take-All activation function to combine sub-networks, resulting in better test accuracies, such as 98.34% on MNIST and 88.06% on Fashion-MNIST, compared to baseline models like cooperative ensemble and mixture-of-experts.

Ensembling improves the performance of a neural network by combining a number of independent neural networks, usually by averaging or summing their individual outputs. We modify this ensembling approach by training the sub-networks concurrently instead of independently. This concurrent training leads the sub-networks to cooperate with each other, and we refer to the resulting model as a "cooperative ensemble". Meanwhile, the mixture-of-experts approach improves a neural network's performance by dividing a given dataset among its sub-networks, using a gating network that assigns a specialization to each sub-network, called an "expert". We improve on these ways of combining a group of neural networks by using a k-Winners-Take-All (kWTA) activation function that acts as the combination method for the outputs of each sub-network in the ensemble. We refer to this proposed model as "kWTA ensemble neural networks" (kWTA-ENN). With the kWTA activation function, the losing neurons of the sub-networks are inhibited while the winning neurons are retained. This results in sub-networks that have some form of specialization but also share knowledge with one another. We compare our approach with the cooperative ensemble and mixture-of-experts, using a feed-forward neural network with one hidden layer of 100 neurons as the sub-network architecture. Our approach outperforms the baseline models, reaching the following test accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST, 91.56% on KMNIST, and 95.97% on WDBC.
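To make the combination step concrete, here is a minimal NumPy sketch of a kWTA activation used to merge sub-network outputs, next to the plain averaging used in ordinary ensembling. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the sub-network outputs are arranged before kWTA is applied, so the sketch assumes they are placed side by side per example, the k largest activations are kept, the rest are set to zero, and the survivors are summed per class. The function names `kwta`, `combine_kwta`, and `combine_average` are hypothetical.

```python
import numpy as np


def kwta(x, k):
    """Keep the k largest entries along the last axis and zero out the rest."""
    x = np.asarray(x, dtype=float)
    if not 1 <= k <= x.shape[-1]:
        raise ValueError("k must be between 1 and the size of the last axis")
    # Indices of the k largest entries per row (their order does not matter).
    top_idx = np.argpartition(x, -k, axis=-1)[..., -k:]
    mask = np.zeros(x.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=-1)
    # Winning neurons are retained, losing neurons are inhibited (set to 0).
    return np.where(mask, x, 0.0)


def combine_kwta(sub_outputs, k):
    """Combine sub-network outputs of shape (n_subnets, batch, n_classes).

    Assumption (not from the paper): all sub-network neurons for one example
    compete together, and surviving activations are summed per class.
    """
    sub_outputs = np.asarray(sub_outputs, dtype=float)
    n_subnets, batch, n_classes = sub_outputs.shape
    # Lay out every sub-network's neurons side by side for each example.
    flat = np.transpose(sub_outputs, (1, 0, 2)).reshape(batch, -1)
    winners = kwta(flat, k)
    return winners.reshape(batch, n_subnets, n_classes).sum(axis=1)


def combine_average(sub_outputs):
    """Baseline ensembling: average the sub-network outputs."""
    return np.asarray(sub_outputs, dtype=float).mean(axis=0)


# Two sub-networks, one example, three classes.
outputs = np.array([[[0.1, 0.9, 0.2]],
                    [[0.8, 0.3, 0.05]]])
combined = combine_kwta(outputs, k=2)
averaged = combine_average(outputs)
```

With k=2, only the two strongest activations across both sub-networks survive, so each sub-network contributes only where it is most confident. Averaging instead blends every output regardless of strength.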

Code Implementations: 1 repo
