DB, AI, HC | Jun 13, 2018

Towards Semantically Enhanced Data Understanding

arXiv:1806.04952v1 | 1 citation
Originality: Synthesis-oriented
AI Analysis

The paper addresses the lookup overhead that data scientists face when documentation is kept apart from the data in unstructured form. The contribution appears incremental because it builds on existing semantic modeling concepts.

The paper tackles the problem of data understanding in machine learning by proposing a methodology that uses a single semantic model to interlink data with its documentation, enabling direct lookup and browsing of connected information, and demonstrates an early prototype.

In the field of machine learning, data understanding is the practice of gaining initial insights into unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which data scientists need in order to grasp the meaning of the data. Usually, documentation is kept separate from the data in various external documents, diagrams, spreadsheets and tools, which causes considerable lookup overhead. Moreover, other supporting applications are not able to consume and utilize such unstructured data. That is why we propose a methodology that uses a single semantic model that interlinks data with its documentation. Hence, data scientists are able to directly look up the connected information about the data by simply following links. Equally, they can browse the documentation, which always refers back to the data. Furthermore, the model can be used by other approaches providing additional support, like searching, comparing, integrating or visualizing data. To showcase our approach, we also demonstrate an early prototype.
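
The idea of interlinking data with its documentation in one model can be illustrated with a small sketch. It is not the authors' prototype: it assumes the semantic model can be approximated by subject-predicate-object triples, and every identifier in it (`data:sales.csv#unit_price`, `doc:describedBy`, `doc:mentions`, the helpers `build_index` and `follow`) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical triples; none of these names come from the paper.
TRIPLES = [
    ("data:sales.csv#unit_price", "doc:describedBy", "doc:pricing-glossary"),
    ("data:sales.csv#unit_price", "doc:unit", "EUR"),
    ("doc:pricing-glossary", "doc:mentions", "data:sales.csv#unit_price"),
    ("doc:pricing-glossary", "doc:text", "Unit price is the net price per item."),
]


def build_index(triples):
    """Index triples by subject so outgoing links can be found in one lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def follow(index, node, predicate):
    """Return every object reachable from `node` via `predicate`."""
    return [obj for pred, obj in index.get(node, []) if pred == predicate]


INDEX = build_index(TRIPLES)

# From a data column to its documentation ...
docs = follow(INDEX, "data:sales.csv#unit_price", "doc:describedBy")
# ... and from that documentation back to the data it refers to.
columns = follow(INDEX, docs[0], "doc:mentions")
```

Because data and documentation live in the same structure, both directions of lookup use the same `follow` call. Under this assumption, other tools such as search or visualization could consume the same triples rather than parsing separate documents.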

