Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings
The work addresses the challenge of summarizing tabular data for users who need quick insights, but its contribution is incremental, since it builds on existing embedding and ontology methods.
The paper tackles the problem of generating abstractive summaries for tabular datasets. It uses knowledge base semantic embeddings to recommend descriptive types from an ontology and aggregates those recommendations into a summary. Experiments on open data from OpenML, CKAN, and data.world are presented to illustrate the effectiveness of the approach.
This paper describes an abstractive summarization method for tabular data which employs a knowledge base semantic embedding to generate the summary. Assuming the dataset contains descriptive text in headers, columns and/or some augmenting metadata, the system employs the embedding to recommend a subject/type for each text segment. Recommendations are aggregated into a small collection of super types considered to be descriptive of the dataset by exploiting the hierarchy of types in a pre-specified ontology. Using February 2015 Wikipedia as the knowledge base, and a corresponding DBpedia ontology as types, we present experimental results on open data taken from several sources (OpenML, CKAN and data.world) to illustrate the effectiveness of the approach.
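The two steps described above (recommend a type per text segment, then aggregate recommendations through the type hierarchy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology `PARENT`, the 2-d vectors in `TYPE_VECS`, the cosine-similarity recommender, and the vote-counting aggregation rule are hypothetical stand-ins, not the paper's actual embedding, ontology, or aggregation procedure.

```python
import math
from collections import Counter

# Hypothetical toy ontology: type -> parent type (None marks the root).
PARENT = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Athlete": "Person",
    "Place": "Thing",
    "City": "Place",
}

# Hypothetical 2-d embeddings for a few types (real embeddings would be
# learned from the knowledge base and have far more dimensions).
TYPE_VECS = {
    "Athlete": (1.0, 0.0),
    "Person": (0.8, 0.2),
    "City": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_type(segment_vec):
    """Return the type whose embedding is closest to the segment embedding."""
    return max(TYPE_VECS, key=lambda t: cosine(segment_vec, TYPE_VECS[t]))


def ancestors(type_name):
    """Return the type followed by all of its super types, up to the root."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def summarize(segment_vecs, top_k=2, skip=("Thing",)):
    """Aggregate per-segment recommendations into a few super types.

    Assumed rule: each recommendation votes for itself and every ancestor;
    the uninformative root is skipped and the top_k most-voted types win.
    """
    votes = Counter()
    for vec in segment_vecs:
        for t in ancestors(recommend_type(vec)):
            if t not in skip:
                votes[t] += 1
    return [t for t, _ in votes.most_common(top_k)]


if __name__ == "__main__":
    # Hypothetical embeddings of three text segments (e.g. column headers).
    segments = [(0.9, 0.1), (0.7, 0.3), (0.1, 0.9)]
    print(summarize(segments))
```

Voting along the ancestor chain is only one plausible way to exploit the hierarchy: types shared by many segments accumulate votes, so the summary drifts toward super types that cover most of the dataset. Other aggregation rules, such as depth-weighted votes or lowest common ancestors, would fit the same outline.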