CL · May 13, 2024

Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers

arXiv:2405.07886v1 · 2.72 citations · h-index: 3 · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the lack of multimodal datasets for Russian-language scientific paper summarization. The contribution is incremental: it applies existing methods to new data.

The authors built a multimodal dataset of 420 Russian-language scientific papers, including texts, tables, and figures, and tested two existing language models, Gigachat and YandexGPT, on automatic summarization. The abstract reports no concrete performance numbers.

The paper describes the creation of a multimodal dataset of Russian-language scientific papers and the testing of existing language models on the task of automatic text summarization. A distinguishing feature of the dataset is that it is multimodal: it includes texts, tables, and figures. The paper presents the results of experiments with two language models: Gigachat from SBER and YandexGPT from Yandex. The dataset consists of 420 papers and is publicly available at https://github.com/iis-research-team/summarization-dataset.
