State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
This work addresses text classification for less-resourced South Slavic languages and gives researchers and practitioners a practical comparison of approaches. Its contribution is incremental, since it applies existing methods to new data.
The paper compares fine-tuned BERT-like models with large language models (LLMs) in zero-shot setups on text classification for South Slavic languages. It finds that LLMs often match or surpass the fine-tuned models in performance, but have drawbacks such as slower inference and higher costs.
Until recently, fine-tuned BERT-like models provided state-of-the-art performance on text classification tasks. With the rise of instruction-tuned decoder-only models, commonly known as large language models (LLMs), the field has increasingly moved toward zero-shot and few-shot prompting. However, the performance of LLMs on text classification, particularly on less-resourced languages, remains under-explored. In this paper, we evaluate the performance of current language models on text classification tasks across several South Slavic languages. We compare openly available fine-tuned BERT-like models with a selection of open-source and closed-source LLMs across three tasks in three domains: sentiment classification in parliamentary speeches, topic classification in news articles and parliamentary speeches, and genre identification in web texts. Our results show that LLMs demonstrate strong zero-shot performance, often matching or surpassing fine-tuned BERT-like models. Moreover, when used in a zero-shot setup, LLMs perform comparably in South Slavic languages and English. However, we also point out key drawbacks of LLMs, including less predictable outputs, significantly slower inference, and higher computational costs. Due to these limitations, fine-tuned BERT-like models remain a more practical choice for large-scale automatic text annotation.
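To illustrate what "less predictable outputs" can mean in practice, the following is a minimal sketch of a zero-shot prompt and a parser that maps a free-form model reply onto a closed label set. It is not the authors' pipeline: the label names, the prompt wording, and the functions `build_prompt` and `parse_label` are all assumptions made for illustration, and no actual model is called.

```python
import re

# Hypothetical label set for sentiment classification (an assumption;
# the abstract does not list the labels used in the paper).
LABELS = ("negative", "neutral", "positive")


def build_prompt(text, labels=LABELS):
    """Build a zero-shot prompt that asks for exactly one label."""
    options = ", ".join(labels)
    return (
        f"Classify the sentiment of the following text as one of: {options}.\n"
        "Answer with the label only.\n\n"
        f"Text: {text}\nLabel:"
    )


def parse_label(raw_output, labels=LABELS, fallback=None):
    """Map a free-form model reply onto the closed label set.

    Returns `fallback` when the reply contains no known label or
    more than one distinct label, so ambiguous replies are not guessed.
    """
    lowered = raw_output.lower()
    found = {
        label
        for label in labels
        if re.search(rf"\b{re.escape(label)}\b", lowered)
    }
    return found.pop() if len(found) == 1 else fallback
```

A fine-tuned classifier with a fixed output layer always returns one of its labels, whereas a prompted LLM may reply with extra words, different casing, or several candidate labels. A parser of this kind makes such cases explicit by returning the fallback value, which can then be counted or re-queried.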