CL, LG | Aug 21, 2020

Turkish Text Classification: From Lexicon Analysis to Bidirectional Transformer

arXiv:2104.11642v1 | 2 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of text classification for Turkish language users, providing a domain-independent solution, but it is incremental as it applies existing methods to a new language.

The paper tackled Turkish text classification by evaluating lexicon analysis, SVMs, and XGBoost, and proposed a pretrained transformer classifier that outperformed previous methods for this task, achieving superior results in sentiment analysis and classification.

Text classification is increasingly used in both academic and industry settings. Although rule-based methods have been fairly successful, supervised machine learning has proven the most successful approach for most languages, with the bulk of the research done on English. In this article, the success of lexicon analysis, support vector machines, and extreme gradient boosting for text classification and sentiment analysis is evaluated in Turkish, and a pretrained transformer-based classifier is proposed that outperforms previous methods for Turkish text classification. In the context of text classification, all machine learning models proposed in the article are domain-independent and do not require any task-specific modifications.
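
To make the lexicon analysis baseline concrete, here is a minimal Python sketch of lexicon-based sentiment scoring. It is an illustration under stated assumptions, not the paper's implementation: the five-word `LEXICON`, its polarity values, and the helper names `tokenize`, `lexicon_score`, and `classify` are all hypothetical.

```python
import re

# Hypothetical toy lexicon (word -> polarity); not taken from the paper.
LEXICON = {
    "güzel": 1,    # nice, beautiful
    "harika": 1,   # great
    "iyi": 1,      # good
    "kötü": -1,    # bad
    "berbat": -1,  # terrible
}


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: str.lower() is a naive choice here; it does not apply the
    Turkish dotted/dotless I casing rules.
    """
    return re.findall(r"\w+", text.lower())


def lexicon_score(text):
    """Sum the polarity of every token found in the lexicon."""
    return sum(LEXICON.get(token, 0) for token in tokenize(text))


def classify(text):
    """Map the summed polarity to a sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Bu film harika"))
print(classify("Yemek berbat ve kötü"))
```

Because Turkish is agglutinative, a surface-form lookup like this misses inflected words such as "güzeldi", which is one reason a rule-based baseline of this kind would likely need stemming or morphological analysis in practice.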
