Federated Data Science to Break Down Silos [Vision]
This vision paper addresses the challenge of breaking down data silos for the data science community, though its contribution is incremental because it builds on existing sharing initiatives.
The paper tackles the problem of finding and combining semantically related data science artifacts across platforms by proposing KEK, an open federated data science platform that enables efficient search and integration of pipelines and metadata.
Similar to Open Data initiatives, the data science community has launched initiatives for sharing not only data but also entire pipelines, derivatives, artifacts, etc. (Open Data Science). However, the few existing efforts focus on the technical aspects of how to facilitate sharing, conversion, etc. This vision paper goes a step further and proposes KEK, an open federated data science platform that not only allows for sharing data science pipelines and their (meta)data but also provides methods for efficient search and, in the ideal case, even allows for combining and defining pipelines across platforms in a federated manner. In doing so, KEK addresses the so-far neglected challenge of actually finding artifacts that are semantically related and that can be combined to achieve a certain goal.