Viacheslav Shibaev

CL
3 papers
2,058 citations
Novelty 30%
AI Score 22

3 Papers

CL · Apr 10, 2020
Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric

Ivan P. Yamshchikov, Viacheslav Shibaev, Nikolay Khlebnikov et al.

The rapid development of natural language processing tasks such as style transfer, paraphrase, and machine translation often calls for semantic similarity metrics. In recent years, many methods for measuring the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis of more than a dozen such methods. Using a new dataset of fourteen thousand sentence pairs human-labeled according to their semantic similarity, we demonstrate that none of the metrics widely used in the literature is close enough to human judgment in these tasks. A number of recently proposed metrics provide comparable results, yet Word Mover's Distance is shown to be the most reasonable solution for measuring semantic similarity in reformulated texts at the moment.

CL · Sep 26, 2019
Decomposing Textual Information For Style Transfer

Ivan P. Yamshchikov, Viacheslav Shibaev, Aleksander Nagaev et al.

This paper focuses on latent representations that could effectively decompose different aspects of textual information. Using a framework of style transfer for texts, we propose several empirical methods to assess information decomposition quality. We validate these methods with several state-of-the-art textual style transfer methods. Higher quality of information decomposition corresponds to higher performance in terms of bilingual evaluation understudy (BLEU) between output and human-written reformulations.

CL · Aug 19, 2019
Style Transfer for Texts: Retrain, Report Errors, Compare with Rewrites

Alexey Tikhonov, Viacheslav Shibaev, Aleksander Nagaev et al.

This paper shows that the standard assessment methodology for style transfer has several significant problems. First, the standard metrics for style accuracy and semantics preservation vary significantly across re-runs. Therefore, one has to report error margins for the obtained results. Second, beyond certain values of bilingual evaluation understudy (BLEU) between input and output and of sentiment transfer accuracy, the optimization of these two standard metrics diverges from the intuitive goal of the style transfer task. Finally, due to the nature of the task itself, there is a specific dependence between these two metrics that can be easily manipulated. Under these circumstances, we suggest taking BLEU between input and human-written reformulations into consideration for benchmarks. We also propose three new architectures that outperform the state of the art in terms of this metric.
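
The recommendation above to report error margins across re-runs can be illustrated with a minimal sketch. It assumes the margin is reported as the mean plus or minus the sample standard deviation over independently retrained runs; the abstract does not specify the exact statistic, so this choice is an assumption. The function name `summarize_runs` and all numbers are hypothetical and are not results from the paper.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of one metric across re-runs."""
    if len(scores) < 2:
        raise ValueError("need at least two re-runs to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical per-run scores, for illustration only (not from the paper).
accuracy_runs = [0.80, 0.84, 0.82, 0.86, 0.78]  # style-transfer accuracy
bleu_runs = [20.0, 22.0, 21.0, 19.0, 23.0]      # BLEU per retrained model

acc_mean, acc_std = summarize_runs(accuracy_runs)
bleu_mean, bleu_std = summarize_runs(bleu_runs)

# Report each metric together with its spread instead of a single number.
print(f"accuracy: {acc_mean:.3f} ± {acc_std:.3f}")
print(f"BLEU:     {bleu_mean:.2f} ± {bleu_std:.2f}")
```

Reporting both numbers makes it possible to judge whether a difference between two systems exceeds the run-to-run variation of either one.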