Optimal N-ary ECOC Matrices for Ensemble Classification
This work addresses the need for more effective coding matrices in ensemble classification, offering a deterministic approach that can enhance accuracy for multi-class problems, though it is incremental as it builds on existing ECOC methods.
The paper tackles the problem of constructing optimal N-ary error-correcting output code (ECOC) matrices for ensemble classification by presenting a new recursive method that generalizes binary Hadamard matrices to prime N, yielding matrices with optimal minimum Hamming distance. Experimental results on six datasets show that using these deterministic matrices achieves comparable or higher accuracy than random matrices, especially when N is adaptively chosen to match dataset classes, reducing distance loss and improving performance.
A new recursive construction of $N$-ary error-correcting output code (ECOC) matrices for ensemble classification methods is presented, generalizing the classic doubling construction for binary Hadamard matrices. Given any prime integer $N$, this deterministic construction generates base-$N$ symmetric square matrices $M$ of prime-power dimension having optimal minimum Hamming distance between any two of its rows and columns. Experimental results for six datasets demonstrate that using these deterministic coding matrices for $N$-ary ECOC classification yields comparable and in many cases higher accuracy compared to using randomly generated coding matrices. This is particular true when $N$ is adaptively chosen so that the dimension of $M$ matches closely with the number of classes in a dataset, which reduces the loss in minimum Hamming distance when $M$ is truncated to fit the dataset. This is verified through a distance formula for $M$ which shows that these adaptive matrices have significantly higher minimum Hamming distance in comparison to randomly generated ones.