Text classification using machine learning methods
This work addresses product classification for e-commerce or inventory management, but it is incremental as it applies existing methods without introducing new ones.
The paper tackles product classification by applying several word embedding and machine learning methods. It reports high accuracy for Support Vector Machines, Logistic Regression, and Random Forests, and identifies FASTTEXT as the best embedding technique.
In this paper we present the results of an experiment that uses machine learning methods to obtain models for the automatic classification of products. To apply automatic classification methods, we transformed the product names from a text representation into numeric vectors, a process called word embedding. We used several embedding methods: Count Vectorization, TF-IDF, Word2Vec, FASTTEXT, and GloVe. With the product names in the form of numeric vectors, we applied a set of machine learning methods for automatic classification: Logistic Regression, Multinomial Naive Bayes, kNN, Artificial Neural Networks, Support Vector Machines, and Decision Trees with several variants. The results show high classification accuracy for Support Vector Machines, Logistic Regression, and Random Forests. Among the word embedding methods, the best results were obtained with the FASTTEXT technique.
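The two-stage pipeline described above (product names to numeric vectors, then a classifier on those vectors) can be illustrated with a minimal sketch. It pairs TF-IDF with kNN, two of the methods named in the abstract, and uses only the Python standard library. The product names, category labels, function names, the smoothed IDF formula, and the choice of k = 3 are all assumptions made for illustration. They are not the paper's data, code, or settings.

```python
import math
from collections import Counter

# Hypothetical toy data: names and categories are invented for illustration.
TRAIN = [
    ("stainless steel kitchen knife", "kitchen"),
    ("non-stick frying pan", "kitchen"),
    ("ceramic kitchen knife set", "kitchen"),
    ("wireless optical mouse", "electronics"),
    ("usb wireless keyboard", "electronics"),
    ("bluetooth wireless headphones", "electronics"),
]


def tokenize(name):
    """Lowercase a product name and split it on whitespace."""
    return name.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(name, idf):
    """Return a sparse, L2-normalised TF-IDF vector; unseen terms are ignored."""
    counts = Counter(t for t in tokenize(name) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(name, train_vectors, idf, k=3):
    """Predict a category by majority vote among the k most similar names."""
    query = tfidf_vector(name, idf)
    ranked = sorted(train_vectors,
                    key=lambda item: cosine(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Stage 1: turn product names into numeric vectors.
idf = fit_idf([name for name, _ in TRAIN])
train_vectors = [(tfidf_vector(name, idf), label) for name, label in TRAIN]

# Stage 2: classify an unseen product name.
predicted = knn_predict("wireless gaming mouse", train_vectors, idf)
```

On this toy data, `predicted` is `"electronics"`, because only the electronics names share a term with the query. A sparse dictionary per name keeps the sketch short. Learned embeddings such as Word2Vec, FASTTEXT, or GloVe would replace `tfidf_vector` with dense vectors from a trained model, and the other classifiers listed above would replace `knn_predict`.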