RusLICA: A Russian-Language Platform for Automated Linguistic Inquiry and Category Analysis
This work addresses the need for psycholinguistic analysis tools tailored to Russian and its grammatical and cultural specificities. The contribution is incremental, since it adapts an existing methodology to a new language.
The researchers adapted the Linguistic Inquiry and Word Count (LIWC) methodology to Russian-language texts. The resulting platform, RusLICA, includes 96 categories and integrates features such as syntactic analysis and predictions from pre-trained language models for automated linguistic inquiry.
Identifying psycholinguistic characteristics in written texts is a task attracting increasing attention from researchers. One of the most widely used tools in the field is Linguistic Inquiry and Word Count (LIWC), which was originally developed to analyze English texts and has since been translated into multiple languages. We adapt the LIWC methodology to the Russian language, taking into account its grammatical and cultural specificities. The proposed approach comprises 96 categories that integrate syntactic, morphological, lexical, and general statistical features, as well as predictions obtained from pre-trained language models (LMs). Rather than directly translating existing thesauri, we built the dictionary specifically for Russian, drawing on several lexicographic resources, semantic dictionaries, and corpora. The paper describes the process of mapping lemmas to 42 psycholinguistic categories and the implementation of the analyzer as part of the RusLICA web service.
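To illustrate the general idea of dictionary-based category analysis, the following is a minimal sketch of LIWC-style counting over lemmas. It is not the RusLICA implementation: the lemma table, the category names, and the function names are hypothetical and chosen only for illustration, and a real analyzer would presumably rely on a full morphological analyzer and a much larger dictionary.

```python
import re
from collections import Counter

# Hypothetical toy resources: these entries and category names are
# illustrative assumptions, not taken from the RusLICA dictionary.
LEMMAS = {
    "радости": "радость",
    "друзья": "друг",
    "грустил": "грустить",
}
CATEGORIES = {
    "радость": {"positive_emotion"},
    "грустить": {"negative_emotion"},
    "друг": {"social"},
}


def tokenize(text):
    """Lowercase the text and extract runs of Cyrillic letters."""
    return re.findall(r"[а-яё]+", text.lower())


def category_shares(text):
    """Return each matched category's share of all tokens, in percent."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        # Fall back to the surface form when the toy table has no lemma.
        lemma = LEMMAS.get(token, token)
        for category in CATEGORIES.get(lemma, ()):
            counts[category] += 1
    return {cat: 100.0 * n / len(tokens) for cat, n in counts.items()}
```

Calling `category_shares("Друг грустил без радости")` returns one percentage per matched category, computed over all four tokens. Normalizing by the total token count follows the usual LIWC convention of reporting categories as shares of the text; whether RusLICA normalizes in the same way is an assumption here.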