Query-Driven Knowledge Transfer to Support Data Analysts
The work addresses inefficiencies in the data analysis workflows of data scientists in larger organizations, though the contribution appears incremental: it extends existing data management systems and leaves established workflows unchanged.
The paper tackles the problem that data scientists need deep knowledge of heterogeneous data sources to write analytical queries. It introduces a knowledge-sharing approach that extracts and formalizes collective knowledge from query logs to support data source discovery and incremental data integration.
In larger organizations, multiple teams of data scientists have to integrate data from heterogeneous data sources in preparation for data analysis tasks. Writing effective analytical queries requires data scientists to have in-depth knowledge of the existence, semantics, and usage context of data sources. Once gathered, such knowledge is shared informally within a specific team of data scientists, but it is usually neither formalized nor shared with other teams. As a result, potential synergies remain unused. We therefore introduce a novel approach that extends data management systems with additional knowledge-sharing capabilities to facilitate user collaboration without altering established data analysis workflows. Relevant collective knowledge is extracted from the query log to support data source discovery and incremental data integration. The extracted knowledge is formalized and provided at query time.
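
The abstract does not say how knowledge is extracted from the query log. As one possible illustration, the sketch below assumes the log is a list of SQL strings and treats "tables that appear together in the same query" as the shared knowledge. The table names, the regex-based parsing, and the functions `tables_in`, `co_usage`, and `suggest` are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical query log; table and column names are invented for illustration.
QUERY_LOG = [
    "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT c.name FROM customers c JOIN orders o ON c.id = o.customer_id "
    "WHERE o.total > 100",
    "SELECT * FROM orders o JOIN shipments s ON o.id = s.order_id",
]

# Naive assumption: a table name is the identifier right after FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def tables_in(query):
    """Return the set of table names that follow FROM or JOIN in a query."""
    return {name.lower() for name in TABLE_RE.findall(query)}


def co_usage(log):
    """Count how often each pair of tables appears in the same query."""
    counts = Counter()
    for query in log:
        for pair in combinations(sorted(tables_in(query)), 2):
            counts[pair] += 1
    return counts


def suggest(table, log):
    """Rank tables that were queried together with `table`, most frequent first."""
    ranked = Counter()
    for (a, b), n in co_usage(log).items():
        if a == table:
            ranked[b] += n
        elif b == table:
            ranked[a] += n
    return [name for name, _ in ranked.most_common()]
```

Under these assumptions, a call such as `suggest("orders", QUERY_LOG)` could be issued at query time to point an analyst who is working with one table toward other sources that colleagues have combined with it. A real system would need a proper SQL parser and would likely record join predicates and usage context as well.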