CL, Oct 31, 2020

Method of the coherence evaluation of Ukrainian text

arXiv:2011.00310v1, 3 citations

AI Analysis

This paper addresses automated text quality analysis for Ukrainian SEO applications and represents an incremental improvement to existing methods.

The paper tackles automated coherence assessment of Ukrainian text by analyzing existing methods and proposing an improved semantic similarity graph approach using neural network pre-training for sentence vectors. Experimental results on Ukrainian scientific articles identified the most effective method-parameter combination for coherence measurement tasks.

Due to the growing role of SEO technologies, automated analysis of article quality has become necessary. Such an approach helps both to return the most intelligible pages for a user's query and to raise a website's position toward the top of the query results. Automated assessment of coherence is one part of the complex analysis of a text. In this article, the main methods for measuring text coherence for the Ukrainian language are analyzed. The expediency of using the semantic similarity graph method in comparison with other methods is explained. An improvement of that method is suggested: pre-training the neural network that produces vector representations of sentences. An experimental examination of the original method and its modifications is carried out. Training and evaluation are performed on a corpus of Ukrainian texts retrieved from the abstracts and full texts of Ukrainian scientific articles. Testing is implemented through two typical tasks for text coherence assessment: the document discrimination task and the insertion task. Based on the analysis, the most effective combination of the method's modification and its parameter for measuring text coherence is determined.
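To make the two evaluation tasks named in the abstract concrete, here is a minimal Python sketch. It rests on several assumptions: sentence vectors are given as plain lists of floats (in the paper they come from a neural network, which is not reproduced here), and coherence is scored with a simplified graph variant that links each sentence only to its immediate predecessor by cosine similarity. The function names `coherence_adjacent`, `discrimination_accuracy`, and `insertion_position` are hypothetical, and this is not necessarily the exact formulation or parameterization evaluated by the authors.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_adjacent(sentence_vectors):
    """Mean edge weight of a graph linking each sentence to the one before it.

    Simplifying assumption: only adjacent sentences are connected.
    """
    if len(sentence_vectors) < 2:
        return 0.0
    weights = [
        cosine(sentence_vectors[i - 1], sentence_vectors[i])
        for i in range(1, len(sentence_vectors))
    ]
    return sum(weights) / len(weights)


def discrimination_accuracy(documents, n_permutations=5, seed=0):
    """Document discrimination: how often the original order outscores a shuffle."""
    rng = random.Random(seed)
    wins = 0
    total = 0
    for doc in documents:
        original_score = coherence_adjacent(doc)
        for _ in range(n_permutations):
            shuffled = list(doc)
            rng.shuffle(shuffled)
            if shuffled == list(doc):
                continue  # an identical ordering is not a valid comparison
            total += 1
            if original_score > coherence_adjacent(shuffled):
                wins += 1
    return wins / total if total else 0.0


def insertion_position(doc, index):
    """Insertion task: remove sentence `index`, return the best-scoring slot for it."""
    removed = doc[index]
    rest = doc[:index] + doc[index + 1:]

    def score_at(position):
        candidate = rest[:position] + [removed] + rest[position:]
        return coherence_adjacent(candidate)

    # max() returns the first position among ties
    return max(range(len(rest) + 1), key=score_at)


# Toy two-dimensional "sentence vectors" whose topic drifts gradually.
toy_doc = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
predicted = insertion_position(toy_doc, 1)
accuracy = discrimination_accuracy([toy_doc], n_permutations=10)
```

In the insertion task a prediction counts as correct when the returned position equals the sentence's original index. In the discrimination task the reported figure is the fraction of original/shuffled pairs in which the original document scores higher.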
