CL May 27, 2023

A Practical Toolkit for Multilingual Question and Answer Generation

Asahi Ushio, Fernando Alva-Manchego, Jose Camacho-Collados

arXiv:2305.17416v126.7229 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a practical toolkit for educators and developers who need multilingual question and answer generation (QAG), but the contribution is incremental because it builds on existing pre-trained models.

The authors address the shortage of publicly accessible models for multilingual question and answer generation by introducing AutoQG, an online service, and lmqg, a Python package. Both are backed by models fine-tuned in eight languages, so practitioners can generate structured question-answer pairs from text.

Generating questions along with associated answers from a text has applications in several domains, such as creating reading comprehension tests for students, or improving document search by providing auxiliary questions and answers based on the query. Training models for question and answer generation (QAG) is not straightforward due to the expected structured output (i.e. a list of question and answer pairs), as it requires more than generating a single sentence. This results in a small number of publicly accessible QAG models. In this paper, we introduce AutoQG, an online service for multilingual QAG, along with lmqg, an all-in-one Python package for model fine-tuning, generation, and evaluation. We also release QAG models in eight languages fine-tuned on a few variants of pre-trained encoder-decoder language models, which can be used online via AutoQG or locally via lmqg. With these resources, practitioners of any level can benefit from a toolkit that includes a web interface for end users, and easy-to-use code for developers who require custom models or fine-grained controls for generation.
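The abstract notes that QAG is harder than single-sentence generation because the output must be a list of question and answer pairs. One way to picture this is a sequence-to-sequence model that emits a single flat string, which then has to be parsed back into pairs. The sketch below illustrates that parsing step only. The `question: ..., answer: ...` template, the ` | ` separator, and the helper name `parse_qa_pairs` are assumptions made for illustration; they are not taken from the abstract, and this code does not call the lmqg API.

```python
from typing import List, Tuple

# Assumed flattening convention (hypothetical, for illustration only):
#   "question: Q1, answer: A1 | question: Q2, answer: A2"
PAIR_SEPARATOR = " | "
QUESTION_PREFIX = "question: "
ANSWER_MARKER = ", answer: "


def parse_qa_pairs(flat_output: str) -> List[Tuple[str, str]]:
    """Turn one flat generated string into a list of (question, answer) pairs.

    Chunks that do not follow the assumed template are skipped, since a
    generative model is not guaranteed to produce well-formed output.
    """
    pairs: List[Tuple[str, str]] = []
    for chunk in flat_output.split(PAIR_SEPARATOR):
        chunk = chunk.strip()
        if not chunk.startswith(QUESTION_PREFIX) or ANSWER_MARKER not in chunk:
            continue  # malformed chunk: drop it rather than guess
        body = chunk[len(QUESTION_PREFIX):]
        # Split on the first answer marker only, so the answer text is kept whole.
        question, answer = body.split(ANSWER_MARKER, 1)
        question, answer = question.strip(), answer.strip()
        if question and answer:
            pairs.append((question, answer))
    return pairs


example = (
    "question: Who wrote Hamlet?, answer: William Shakespeare"
    " | question: When was it written?, answer: around 1600"
    " | broken chunk"
)
qa_pairs = parse_qa_pairs(example)
for question, answer in qa_pairs:
    print(f"Q: {question}\nA: {answer}")
```

Dropping malformed chunks instead of raising an error is a design choice of this sketch: it keeps the well-formed pairs when a model produces one bad segment, at the cost of silently losing that segment.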
