CL, Dec 4, 2021

A Russian Jeopardy! Data Set for Question-Answering Systems

arXiv:2112.02325v229.4584 citations, Has Code

Originality: Synthesis-oriented

AI Analysis

The data set is a domain-specific resource for NLP researchers and practitioners working on Russian QA. The contribution is incremental, since it adapts an existing concept to a new language.

The authors address the lack of a large-scale Russian question-answering data set by assembling 379,284 Jeopardy!-like questions, 29,375 of them from a Russian analogue of the show, to support QA system development.

Question answering (QA) is one of the most common NLP tasks and is related to named entity recognition, fact extraction, semantic search and several other fields. In industry, it is widely used in chatbots and corporate information systems. It is also a challenging task that drew the attention of a very general audience through the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions, of which 29,375 come from the Russian analogue of Jeopardy!, "Own Game". We examine its linguistic features and the related QA task. We conclude with the prospects for a QA competition based on the data set collected from this database.
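To put the two counts in the abstract in proportion, the short sketch below computes the share of "Own Game" questions in the full data set. The two figures are taken from the abstract; the constant and function names are illustrative assumptions and are not part of the data set or its code.

```python
# Counts quoted in the abstract; the names below are hypothetical.
TOTAL_QUESTIONS = 379_284      # all quiz-like questions in the data set
OWN_GAME_QUESTIONS = 29_375    # questions from "Own Game"


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


own_game_share = share_percent(OWN_GAME_QUESTIONS, TOTAL_QUESTIONS)
other_questions = TOTAL_QUESTIONS - OWN_GAME_QUESTIONS

print(f"Own Game share: {own_game_share:.2f}%")
print(f"Remaining questions: {other_questions}")
```

Under these figures, "Own Game" accounts for a little under 8% of the collection, so the large majority of the questions come from other parts of the Chgk database.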
