CL · Nov 13, 2018

Discourse in Multimedia: A Case Study in Information Extraction

arXiv:1811.05546v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of extracting structured information from multimedia text for domain-specific applications such as educational tools. It is an incremental contribution that applies existing discourse theories to a new type of data.

The paper examines how multimedia discourse features, such as text formatting and layout, can improve information extraction systems. It shows that these features complement lexical semantic information, and it uses them to harvest structured geometry knowledge from textbooks. The harvested knowledge makes an existing geometry problem solver more accurate and more explainable.

To ensure readability, text is often written and presented with appropriate formatting. These text formatting devices help the writer to convey the narrative effectively. At the same time, they help readers pick up the structure of the discourse and comprehend the conveyed information. There have been a number of linguistic theories on the discourse structure of text. However, these theories only consider unformatted text. Multimedia text contains rich formatting features which can be leveraged for various NLP tasks. In this paper, we study some of these discourse features in multimedia text and the communicative functions they fulfil in context. We examine how these multimedia discourse features can be used to improve an information extraction system. We show that the discourse and text layout features provide information that is complementary to the lexical semantic information commonly used for information extraction. As a case study, we use these features to harvest structured subject knowledge of geometry from textbooks. We show that the harvested structured knowledge can be used to improve an existing solver for geometry problems, making it more accurate as well as more explainable.
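
The idea that layout features complement lexical ones can be illustrated with a minimal Python sketch. It is not the paper's feature set or model: the `Span` fields, the cue-word list, and every function name are hypothetical, chosen only to show how formatting signals (bold, boxed, indented, larger font) could be merged with word-based signals into one feature dictionary for a downstream extractor.

```python
from dataclasses import dataclass

# Hypothetical cue words for axiom-like statements; not taken from the paper.
AXIOM_CUES = ("theorem", "postulate", "corollary", "if", "then")


@dataclass
class Span:
    """A piece of textbook text plus assumed formatting metadata."""
    text: str
    bold: bool = False
    boxed: bool = False       # set inside a highlighted box
    indent: int = 0           # indentation level in the page layout
    font_size: float = 10.0


def lexical_features(span):
    """Binary indicators for cue words found in the span's text."""
    words = span.text.lower().replace(",", " ").replace(".", " ").split()
    return {f"has_{cue}": float(cue in words) for cue in AXIOM_CUES}


def layout_features(span, body_font_size=10.0):
    """Binary indicators derived from formatting only, not from words."""
    return {
        "is_bold": float(span.bold),
        "is_boxed": float(span.boxed),
        "is_indented": float(span.indent > 0),
        "larger_than_body": float(span.font_size > body_font_size),
    }


def combined_features(span):
    """Merge both views; a classifier could consume this dictionary."""
    feats = lexical_features(span)
    feats.update(layout_features(span))
    return feats


theorem = Span(
    "Theorem 4.1: If two angles are vertical, then they are congruent.",
    bold=True,
    boxed=True,
)
plain = Span("The figure shows two lines.")
theorem_feats = combined_features(theorem)
plain_feats = combined_features(plain)
```

In this toy setup the boxed, bold theorem statement activates both lexical and layout indicators, while the plain sentence activates none. A real system would learn weights over such features rather than rely on hand-picked cues.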
