CLIR · Feb 12, 2022

A multi-task semi-supervised framework for Text2Graph & Graph2Text

arXiv:2202.06041v2 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses information ingestion and retrieval problems for knowledge bases in AI applications, offering a domain-adaptable solution with incremental improvements in consistency.

The paper tackles the dual challenges of extracting graphs from text and generating text from graphs by proposing a multi-task semi-supervised framework based on a T5 architecture, which outperforms unsupervised state-of-the-art methods on the WebNLG dataset and shows greater consistency across domains than supervised models.

The Artificial Intelligence industry regularly develops applications that rely mostly on Knowledge Bases: data repositories about specific or general domains, usually represented as graphs. Like other databases, they face two main challenges, information ingestion and information retrieval. We approach these challenges by jointly learning graph extraction from text and text generation from graphs. The proposed solution, a T5 architecture, is trained in a multi-task semi-supervised environment on our collected non-parallel data, following a cycle training regime. Experiments on the WebNLG dataset show that our approach surpasses unsupervised state-of-the-art results in both text-to-graph and graph-to-text. More importantly, our framework is more consistent across seen and unseen domains than supervised models. The resulting model can be easily trained in any new domain with non-parallel data by simply adding text and graphs about that domain to our cycle framework.
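As a rough illustration of how a cycle regime can turn non-parallel text and graphs into training pairs for a single multi-task sequence-to-sequence model, here is a minimal sketch. It is not the authors' implementation: the task prefixes, the `<S>`/`<P>`/`<O>` separator tokens, and the helper names `linearize`, `parse`, and `cycle_pairs` are all assumptions made for this example, and `model` stands in for any callable that maps an input string to an output string (for instance, a wrapper around a T5 model's generate step).

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical task prefixes; the paper's actual prompts may differ.
T2G_PREFIX = "text to graph: "
G2T_PREFIX = "graph to text: "


def linearize(triples: List[Triple]) -> str:
    """Flatten a graph into a token sequence a seq2seq model can read or emit."""
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)


def parse(sequence: str) -> List[Triple]:
    """Inverse of linearize; chunks that do not fit the pattern are skipped."""
    triples: List[Triple] = []
    for chunk in sequence.split("<S>")[1:]:
        if "<P>" not in chunk:
            continue
        subj, rest = chunk.split("<P>", 1)
        if "<O>" not in rest:
            continue
        pred, obj = rest.split("<O>", 1)
        triples.append((subj.strip(), pred.strip(), obj.strip()))
    return triples


def cycle_pairs(
    model: Callable[[str], str],
    texts: List[str],
    graphs: List[List[Triple]],
) -> List[Tuple[str, str]]:
    """Build (input, target) pairs from non-parallel texts and graphs.

    Each real sample becomes the *target*; the model's own prediction in the
    opposite direction becomes the *input*, so the model learns to
    reconstruct real data from its intermediate output.
    """
    pairs: List[Tuple[str, str]] = []
    for text in texts:
        # text -> predicted graph, then learn graph -> original text
        pseudo_graph = model(T2G_PREFIX + text)
        pairs.append((G2T_PREFIX + pseudo_graph, text))
    for graph in graphs:
        # graph -> predicted text, then learn text -> original graph
        gold = linearize(graph)
        pseudo_text = model(G2T_PREFIX + gold)
        pairs.append((T2G_PREFIX + pseudo_text, gold))
    return pairs
```

Because both directions share one model and differ only in the input prefix, adding a new domain in this sketch amounts to appending that domain's texts and graphs to the two input lists; no aligned text-graph pairs are needed. Predicted graph strings can be passed through `parse` to recover triples for evaluation.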

Code Implementations: 1 repo