CL, LG | Feb 27, 2025

Text classification using machine learning methods

arXiv:2502.19801v1 | 2 citations | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses product classification for e-commerce or inventory management, but it is incremental as it applies existing methods without introducing new ones.

The paper tackled product classification by applying various word embedding and machine learning methods, achieving high accuracy with Support Vector Machines, Logistic Regression, and Random Forests, and identifying FASTTEXT as the best embedding technique.

In this paper we present the results of an experiment that uses machine learning methods to build models for the automatic classification of products. To apply automatic classification methods, we transformed the product names from a text representation into numeric vectors, a process called word embedding. We used several embedding methods: Count Vectorization, TF-IDF, Word2Vec, FASTTEXT, and GloVe. With the product names in the form of numeric vectors, we applied a set of machine learning methods for automatic classification: Logistic Regression, Multinomial Naive Bayes, kNN, Artificial Neural Networks, Support Vector Machines, and Decision Trees in several variants. The results show high classification accuracy for Support Vector Machines, Logistic Regression, and Random Forests. Among the word embedding methods, FASTTEXT gave the best results.
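The two-step pipeline described in the abstract (turn product names into numeric vectors, then classify the vectors) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, it picks TF-IDF and kNN from the methods the abstract lists, and it assumes a common smoothed IDF formula and cosine similarity. The product names, categories, and the helper names `fit_idf`, `tfidf_vector`, and `knn_predict` are hypothetical and invented for illustration.

```python
import math
from collections import Counter

# Toy product names and categories, invented purely for illustration.
TRAIN = [
    ("whole milk 1l", "dairy"),
    ("skimmed milk 2l", "dairy"),
    ("cheddar cheese block", "dairy"),
    ("usb cable 2m", "electronics"),
    ("wireless usb mouse", "electronics"),
    ("hdmi cable 1m", "electronics"),
]


def tokenize(name):
    """Lowercase a product name and split it on whitespace."""
    return name.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency per token (assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf_vector(name, idf):
    """Sparse, L2-normalised TF-IDF vector; tokens unseen in training are ignored."""
    tf = Counter(tok for tok in tokenize(name) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def knn_predict(name, train_vectors, idf, k=3):
    """Majority vote among the k training names most similar to the query."""
    query = tfidf_vector(name, idf)
    ranked = sorted(train_vectors, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


idf = fit_idf([name for name, _ in TRAIN])
train_vectors = [(tfidf_vector(name, idf), label) for name, label in TRAIN]

print(knn_predict("semi skimmed milk 1l", train_vectors, idf))
print(knn_predict("usb cable 1m", train_vectors, idf))
```

Swapping the vectorizer for a dense embedding such as Word2Vec or FASTTEXT, or the classifier for Logistic Regression or a Support Vector Machine, would keep the same two-step shape; only the functions that build and consume the vectors would change.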

