Endowing Language Models with Multimodal Knowledge Graph Representations
This work addresses the parameter efficiency of natural language understanding models, but it is incremental, as it builds on existing knowledge graph and retrieval methods.
The paper tackles the problem of making natural language understanding models more parameter-efficient by storing knowledge in an external multimodal knowledge graph and retrieving entities from it to improve downstream task performance. This yields improvements of 0.3%–0.7% F1 on multilingual named entity recognition and of up to 2.5% accuracy on visual sense disambiguation.
We propose a method to make natural language understanding models more parameter-efficient by storing knowledge in an external knowledge graph (KG) and retrieving from this KG using a dense index. Given (possibly multilingual) downstream task data, e.g., sentences in German, we retrieve entities from the KG and use their multimodal representations to improve downstream task performance. We use the recently released VisualSem KG, which covers a subset of Wikipedia and WordNet entities, as our external knowledge repository, and compare a mix of tuple-based and graph-based algorithms for learning entity and relation representations grounded in the KG's multimodal information. We demonstrate the usefulness of the learned entity representations on two downstream tasks: they improve performance on multilingual named entity recognition by $0.3\%$--$0.7\%$ F1 and yield up to a $2.5\%$ improvement in accuracy on visual sense disambiguation. All our code and data are available at: \url{https://github.com/iacercalixto/visualsem-kg}.
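The retrieve-then-augment step described above can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: sentence and entity vectors are taken to share one d-dimensional space, a brute-force cosine-similarity scan stands in for a real dense index, and the top-k entity vectors are fused with the sentence vector by mean pooling followed by concatenation. `DenseEntityIndex`, `augment`, and the toy entity vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class DenseEntityIndex:
    """Hypothetical brute-force dense index over KG entity embeddings.

    Scores are cosine similarities. A real system would more likely use an
    approximate nearest-neighbour library than this linear scan.
    """

    def __init__(self, entity_vectors: Dict[str, Sequence[float]]):
        self.ids = list(entity_vectors)
        self.vectors = [l2_normalize(entity_vectors[i]) for i in self.ids]

    def search(self, query: Sequence[float], k: int = 2) -> List[Tuple[str, float]]:
        q = l2_normalize(query)
        scored = [
            (eid, sum(a * b for a, b in zip(q, vec)))
            for eid, vec in zip(self.ids, self.vectors)
        ]
        # Highest cosine similarity first; keep the k best entities.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def augment(
    sentence_vec: Sequence[float],
    index: DenseEntityIndex,
    entity_vectors: Dict[str, Sequence[float]],
    k: int = 2,
) -> Vector:
    """Concatenate a sentence vector with the mean of its top-k retrieved entity vectors.

    Mean pooling plus concatenation is an assumed fusion choice for illustration.
    """
    hits = index.search(sentence_vec, k)
    dim = len(next(iter(entity_vectors.values())))
    pooled = [0.0] * dim
    for eid, _score in hits:
        for j, x in enumerate(entity_vectors[eid]):
            pooled[j] += x / len(hits)
    return list(sentence_vec) + pooled


# Toy usage: three hypothetical entities and one sentence vector.
entities = {
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.8, 0.6, 0.0],
    "car": [0.0, 0.0, 1.0],
}
index = DenseEntityIndex(entities)
hits = index.search([1.0, 0.1, 0.0], k=2)
features = augment([1.0, 0.1, 0.0], index, entities, k=2)
```

In this toy setting the query is closest to "dog" and then "cat", so the augmented feature vector is the original sentence vector followed by the mean of those two entity vectors. Normalizing both sides before the dot product makes the inner product equal to cosine similarity, which is a common choice for dense retrieval.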