SEDBMay 19

OpenHealth Lake: Designing and testing a data lakehouse platform for health applications

arXiv:2605.1992246.4Has Code
AI Analysis

For health and bioinformatics organizations lacking unified data management, this prototype offers a flexible, open-source solution, but the contribution is incremental as it applies existing lakehouse architecture to a new domain.

OpenHealth Lake introduces a data lakehouse platform for health data management, demonstrating through a user study that the prototype is usable and useful for organizations with varying technical backgrounds.

Data management can be a complex challenge in fields such as bioinformatics and health sciences, which continuously generate extensive heterogeneous datasets. In the context of collaborative global health initiatives, secure storage and sharing of data are crucial to support impactful research. However, the absence of a unified data management platform complicates efficient data exchange and governance within these initiatives. In this paper, we introduce the design process of OpenHealth Lake, a data management prototype platform based on a data lakehouse architecture, data federation, and the FAIR principles. The platform is designed using open-source tools, guided by system requirements identified in previously published studies and complemented by insights from the existing literature. The current prototype platform comprises a user-friendly website, an open API, Python and R packages, allowing users to interact with the platform in multiple ways. Through a user study that included participants with varying technical backgrounds, we showed that our proposed data management prototype is both usable and useful. Our prototype design showcases the adaptability, scalability, and reproducibility of a lakehouse system that can be used by any organisation. It is designed as a flexible and complementary approach that allows organisations to customise data management systems to their specific requirements and resources, including cloud-based or self-hosted storage choices.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes