LG ML · May 14, 2020

A Weighted Mutual k-Nearest Neighbour for Classification Mining

Joydip Dhar, Ashaya Shukla, Mukul Kumar, Prashant Gupta

arXiv:2005.08640v1 · 3.38 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses noise reduction in classification for large-scale databases, but it is incremental as it builds on existing kNN methods with minor modifications.

The paper tackles the problem of noise and pseudo neighbors in kNN classification by proposing a new algorithm that performs anomaly detection and removal, and uses distance-weighted voting to minimize the effect of distant neighbors. The result is a refined dataset that provides comparatively better classification performance, as measured by a certainty measure in experiments.
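The distance-weighted voting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the inverse-distance weight `1 / (dist + eps)`, the function name `weighted_knn_vote`, and the definition of certainty as the winning class's share of the total vote weight are all assumptions, since the summary does not specify them.

```python
import math
from collections import defaultdict


def weighted_knn_vote(query, points, labels, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted voting over its k nearest points.

    Returns (predicted_label, certainty). Certainty here is the winning
    class's share of the total vote weight (an assumed definition, not
    necessarily the paper's certainty measure).
    """
    # Pair each training point's distance to the query with its label,
    # then sort so the nearest points come first.
    ranked = sorted(
        ((math.dist(query, p), lbl) for p, lbl in zip(points, labels)),
        key=lambda pair: pair[0],
    )

    votes = defaultdict(float)
    for dist, lbl in ranked[:k]:
        # Assumed weighting: closer neighbours contribute larger votes.
        votes[lbl] += 1.0 / (dist + eps)

    winner = max(votes, key=votes.get)
    return winner, votes[winner] / sum(votes.values())
```

With this weighting, a single very close neighbour can outvote two distant ones, which is the behaviour plain majority voting cannot provide.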

kNN is a very effective instance-based learning method, and it is easy to implement. Because data are heterogeneous, noise from many possible sources is widespread, especially in large-scale databases. To eliminate noise and the effect of pseudo neighbours, we propose in this paper a new learning algorithm that detects anomalies and removes pseudo neighbours from the dataset so as to provide comparatively better results. The algorithm also tries to minimize the effect of distant neighbours. A certainty measure is also introduced for the experimental results. The advantage of combining mutual neighbours with distance-weighted voting is that the dataset is refined once anomalies are removed, and the weighting gives more consideration to closer neighbours. Finally, the performance of the proposed algorithm is evaluated.
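The mutual-neighbour refinement mentioned in the abstract can be illustrated with a short sketch. Two points are mutual k-nearest neighbours when each appears in the other's k-nearest list. The abstract does not state the exact removal rule, so this sketch assumes one: a point with no mutual neighbours at all is treated as an anomaly and dropped. The function names `k_nearest`, `mutual_neighbours`, and `refine` are hypothetical.

```python
import math


def k_nearest(i, points, k):
    """Indices of the k points nearest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def mutual_neighbours(points, k):
    """Map each index to the set of its mutual k-nearest neighbours."""
    knn = [k_nearest(i, points, k) for i in range(len(points))]
    # j is a mutual neighbour of i only if each lists the other.
    return {i: {j for j in knn[i] if i in knn[j]} for i in range(len(points))}


def refine(points, labels, k):
    """Drop points with no mutual neighbours (assumed anomaly rule)."""
    mutual = mutual_neighbours(points, k)
    keep = [i for i in range(len(points)) if mutual[i]]
    return [points[i] for i in keep], [labels[i] for i in keep]
```

For example, with three clustered points and one far-away point, the distant point still has k nearest neighbours, but none of them lists it in return, so `refine` removes it. This brute-force version computes all pairwise distances and is only meant for small datasets.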

