On a Generalization of the Average Distance Classifier
This work addresses a specific limitation in classification for high-dimensional data, offering an incremental improvement over existing methods.
The average distance classifier performs poorly in high-dimensional, low-sample-size settings when the populations differ in aspects other than location and scale. The authors propose simple transformations of the classifier that improve discrimination and perform well even when the populations share the same location and scale.
In high dimension, low sample size (HDLSS) settings, the simple average distance classifier based on the Euclidean distance performs poorly if differences between the locations get masked by the scale differences. To rectify this issue, modifications to the average distance classifier were proposed by Chan and Hall (2009). However, the existing classifiers cannot discriminate when the populations differ in aspects other than location and scale. In this article, we propose some simple transformations of the average distance classifier to tackle this issue. The resulting classifiers perform quite well even when the underlying populations have the same location and scale. The high-dimensional behaviour of the proposed classifiers is studied theoretically. Numerical experiments with a variety of simulated as well as real data sets demonstrate the usefulness of the proposed methodology.
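
To make the masking effect concrete, the following is a minimal sketch of the basic average distance classifier only. It does not implement the Chan and Hall (2009) modification or the transformations proposed in the article. The function name `average_distance_classify`, the Gaussian populations, and the choices of dimension, sample size, means, and standard deviations are all illustrative assumptions, not taken from the article.

```python
import numpy as np


def average_distance_classify(z, class_samples):
    """Return the index of the class whose training points have the
    smallest mean Euclidean distance to the query point z.

    Hypothetical helper for illustration; class_samples is a list of
    (n_j, d) arrays, one per class.
    """
    avg_dist = [np.linalg.norm(X - z, axis=1).mean() for X in class_samples]
    return int(np.argmin(avg_dist))


rng = np.random.default_rng(0)
d, n = 500, 20  # assumed HDLSS-style setup: dimension far exceeds sample size

# Case 1: classes differ in location only (same scale).
X0 = rng.normal(0.0, 1.0, size=(n, d))
X1 = rng.normal(1.0, 1.0, size=(n, d))
z = rng.normal(1.0, 1.0, size=d)  # query drawn from class 1
label_loc = average_distance_classify(z, [X0, X1])

# Case 2: same location shift, but class 1 also has a larger scale.
Y0 = rng.normal(0.0, 1.0, size=(n, d))
Y1 = rng.normal(1.0, 2.0, size=(n, d))
w = rng.normal(1.0, 2.0, size=d)  # query drawn from class 1
label_masked = average_distance_classify(w, [Y0, Y1])

print(label_loc, label_masked)
```

Under these assumed parameters, the first query is assigned to its true class, while the second is assigned to class 0 even though it was drawn from class 1. The reason is that in high dimension the squared distance from the query to a class-1 point concentrates near 2 · 4 · d, whereas the squared distance to a class-0 point concentrates near (4 + 1 + 1) · d, so the larger spread of class 1 outweighs the location difference.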