Centroid estimation based on symmetric KL divergence for Multinomial text classification problem
This work targets text classification accuracy, which matters to both researchers and practitioners, but it appears incremental because it builds on existing centroid-based methods.
The paper tackles centroid estimation for multinomial text classification by introducing a method based on symmetric KL-divergence, and it reports substantial improvements over traditional classifiers on standard datasets.
We define a new method for estimating class centroids in text classification, based on the symmetric KL-divergence between the word distributions of the training documents and their class centroids. Experiments on several standard data sets indicate that the new method achieves substantial improvements over traditional classifiers.
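
To make the quantities in the abstract concrete, here is a minimal sketch of the symmetric KL-divergence between a document's word distribution and a class centroid, together with nearest-centroid classification under that divergence. The abstract does not spell out the estimation procedure, so the centroid here is simply the mean of the smoothed training distributions, a stand-in baseline and not the paper's estimator. All names (`word_distribution`, `symmetric_kl`, `mean_centroid`, `classify`), the smoothing constant `alpha`, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab, alpha=1.0):
    """Laplace-smoothed multinomial over `vocab` (alpha is an assumed constant).

    Smoothing keeps every probability positive, so the logarithms in
    symmetric_kl are always defined.
    """
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return [(counts[w] + alpha) / total for w in vocab]


def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p), which simplifies to sum_i (p_i - q_i) * log(p_i / q_i)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def mean_centroid(distributions):
    """Baseline centroid: component-wise mean (NOT the paper's estimator)."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def classify(doc_dist, centroids):
    """Return the label whose centroid is closest in symmetric KL-divergence."""
    return min(centroids, key=lambda label: symmetric_kl(doc_dist, centroids[label]))


# Toy training data, invented purely for illustration.
train = {
    "sports": [["ball", "goal", "team"], ["team", "win", "goal"]],
    "tech": [["code", "bug", "software"], ["software", "code", "release"]],
}
vocab = sorted({w for docs in train.values() for doc in docs for w in doc})
centroids = {
    label: mean_centroid([word_distribution(doc, vocab) for doc in docs])
    for label, docs in train.items()
}
predicted = classify(word_distribution(["goal", "team", "ball"], vocab), centroids)
print(predicted)
```

A method along the lines the abstract describes would replace `mean_centroid` with an estimator chosen to reduce the symmetric KL-divergence between the training documents and their class centroids; the classification step could stay the same.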