CL · Aug 26, 2022

AutoQGS: Auto-Prompt for Low-Resource Knowledge-based Question Generation from SPARQL

Guanming Xiong, Junwei Bao, Wen Zhao, Youzheng Wu, Xiaodong He

arXiv:2208.12461v12.313 citations · h-index: 72 · Has Code

Originality: Incremental advance

AI Analysis

AutoQGS addresses a bottleneck in question generation for complex knowledge-graph queries, particularly in low-resource scenarios, with incremental improvements over existing methods.

This study tackles knowledge-based question generation from SPARQL queries under low-resource conditions. It proposes AutoQGS, an auto-prompt approach that rephrases SPARQL into natural language so that pre-trained language models can be applied. The approach achieves state-of-the-art performance on benchmarks such as WebQuestionsSP, and the authors also generate a corpus of 330k question-SPARQL pairs.

This study investigates the task of knowledge-based question generation (KBQG). Conventional KBQG work generates questions from fact triples in the knowledge graph, which cannot express complex SPARQL operations such as aggregation and comparison. Moreover, because annotating large-scale SPARQL-question pairs is costly, KBQG from SPARQL in low-resource scenarios urgently needs to be explored. Generative pre-trained language models (PLMs) such as T5 and BART, which are typically trained in a natural language (NL)-to-NL paradigm, have recently proven effective for low-resource generation, but using them effectively to generate NL questions from non-NL SPARQL remains challenging. To address these challenges, we propose AutoQGS, an auto-prompt approach for low-resource KBQG from SPARQL. First, we propose generating questions directly from SPARQL for the KBQG task in order to handle complex operations. Second, we propose an auto-prompter, trained on large-scale unsupervised data, that rephrases SPARQL into an NL description, smoothing the low-resource transformation from non-NL SPARQL to NL questions with PLMs. Experimental results on WebQuestionsSP, ComplexWebQuestions 1.1, and PathQuestions show that our model achieves state-of-the-art performance, especially in low-resource settings. Furthermore, a corpus of 330k factoid complex question-SPARQL pairs is generated for further KBQG research.
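To make the "rephrase SPARQL into an NL description" step concrete, here is a minimal sketch in Python. It is not the paper's auto-prompter, which is a trained model; it is a rule-based stand-in that only illustrates the shape of the data flow. The example query, the entity ID `m.0abc`, the `LABELS` mapping, the helper names, and the `description: ... sparql: ...` input format are all assumptions made for illustration.

```python
import re

# Hypothetical Freebase-style query; the entity ID is a made-up placeholder.
SPARQL = """
SELECT DISTINCT ?x WHERE {
  ns:m.0abc ns:location.country.capital ?x .
}
"""

# Hypothetical entity-ID-to-label mapping (a real system would look this up).
LABELS = {"m.0abc": "France"}

# Matches a simple "subject predicate object ." triple pattern on one line.
TRIPLE = re.compile(r"(\S+)\s+(\S+)\s+(\S+)\s*\.\s*$")


def verbalize_term(term, labels):
    """Turn one SPARQL term into a rough natural-language fragment."""
    if term.startswith("?"):
        return "what"  # variables become a question placeholder
    name = term.split(":", 1)[-1]  # drop the namespace prefix
    if name in labels:
        return labels[name]
    # For predicates like location.country.capital, keep the last segment.
    return name.split(".")[-1].replace("_", " ")


def describe(sparql, labels):
    """Rule-based stand-in for an auto-prompter: SPARQL -> NL-like description."""
    parts = []
    for line in sparql.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            s, p, o = (verbalize_term(t, labels) for t in match.groups())
            parts.append(f"{s} {p} {o}")
    return " and ".join(parts)


def build_model_input(sparql, labels):
    """Concatenate the description with the flattened SPARQL (assumed format)."""
    flat = " ".join(sparql.split())
    return f"description: {describe(sparql, labels)} sparql: {flat}"


if __name__ == "__main__":
    print(build_model_input(SPARQL, LABELS))
```

In a full pipeline, a string like this would be passed to a seq2seq PLM such as T5 or BART fine-tuned to emit the question. The sketch only handles single-line triple patterns, so filters, aggregation, and comparison are out of its scope.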
