CL · Feb 26, 2023

Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension

Chen Zhang, Yuxuan Lai, Yansong Feng, Xingyu Shen, Haowei Du, Dongyan Zhao

Peking U

arXiv:2302.13241v128.0267 citations · h-index: 52 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the incomplete support for non-English languages in nominally multilingual knowledge bases, though its contribution is incremental: it applies existing techniques to a specific domain.

The paper tackles the problem of cross-lingual question answering over knowledge bases (xKBQA) by converting KB subgraphs into passages to leverage multilingual pre-trained language models and cross-lingual machine reading comprehension, achieving strong few-shot and zero-shot performance on datasets in 12 languages.

Although many large-scale knowledge bases (KBs) claim to contain multilingual information, their support for many non-English languages is often incomplete. This incompleteness gives birth to the task of cross-lingual question answering over knowledge base (xKBQA), which aims to answer questions in languages different from that of the provided KB. One of the major challenges facing xKBQA is the high cost of data annotation, leading to limited resources available for further exploration. Another challenge is mapping KB schemas and natural language expressions in the questions under cross-lingual settings. In this paper, we propose a novel approach for xKBQA in a reading comprehension paradigm. We convert KB subgraphs into passages to narrow the gap between KB schemas and questions, which enables our model to benefit from recent advances in multilingual pre-trained language models (MPLMs) and cross-lingual machine reading comprehension (xMRC). Specifically, we use MPLMs, with considerable knowledge of cross-lingual mappings, for cross-lingual reading comprehension. Existing high-quality xMRC datasets can be further utilized to finetune our model, greatly alleviating the data scarcity issue in xKBQA. Extensive experiments on two xKBQA datasets in 12 languages show that our approach outperforms various baselines and achieves strong few-shot and zero-shot performance. Our dataset and code are released for further research.
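To make the idea of converting KB subgraphs into passages concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not describe the conversion procedure, so the template-based verbalization, the dotted relation names, and the helpers `verbalize_relation`, `subgraph_to_passage`, and `locate_answer` are all assumptions made for illustration. The sketch only shows how a set of triples around a topic entity might be linearized into text, so that the answer entity becomes a span a reading comprehension model could extract.

```python
from typing import List, Tuple

# A triple is (subject label, relation name, object label).
# The dotted relation names used below are hypothetical examples.
Triple = Tuple[str, str, str]


def verbalize_relation(relation: str) -> str:
    """Turn a schema-style relation name into plain words.

    Assumed heuristic: keep the last dotted segment and replace
    underscores with spaces.
    """
    last_segment = relation.split(".")[-1]
    return last_segment.replace("_", " ")


def subgraph_to_passage(triples: List[Triple]) -> str:
    """Linearize a KB subgraph into a passage, one sentence per triple."""
    sentences = [
        f"{subj} {verbalize_relation(rel)} {obj}." for subj, rel, obj in triples
    ]
    return " ".join(sentences)


def locate_answer(passage: str, answer: str) -> Tuple[int, int]:
    """Return the character span of the first occurrence of the answer."""
    start = passage.find(answer)
    if start == -1:
        raise ValueError(f"answer {answer!r} does not occur in the passage")
    return start, start + len(answer)


if __name__ == "__main__":
    subgraph = [
        ("Barack Obama", "people.person.place_of_birth", "Honolulu"),
        ("Honolulu", "location.location.containedby", "Hawaii"),
    ]
    passage = subgraph_to_passage(subgraph)
    start, end = locate_answer(passage, "Honolulu")
    print(passage)
    print(passage[start:end])
```

In a pipeline of the kind the abstract describes, a passage like this would be paired with a question in another language and passed to a multilingual reader that predicts the answer span. That step is omitted here because the abstract does not specify which model or library is used.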
