Aug 2, 2022

Smart caching in a Data Lake for High Energy Physics analysis

arXiv:2208.06437v1 · 4 citations · h-index: 100
Originality: Synthesis-oriented
AI Analysis

This work addresses data management challenges faced by High Energy Physics researchers in distributed environments, but it is an incremental application of existing methods to a specific domain.

The authors tackle the problem of data access and management in a distributed High Energy Physics Data Lake by proposing an autonomous, reinforcement learning-based caching method intended to improve the user experience and contain the infrastructure's maintenance costs.

The continuous growth of data production in almost all scientific areas raises new problems in data access and management, especially when the end-users, as well as the resources they can access, are distributed worldwide. This work focuses on the management of data caching in a Data Lake infrastructure in the context of High Energy Physics. We propose an autonomous method, based on Reinforcement Learning techniques, to improve the user experience and to contain the maintenance costs of the infrastructure.
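To make the idea of an RL-driven cache concrete, here is a minimal sketch of a learned cache-admission policy. The abstract does not specify the state, actions, or reward used by the authors, so everything below is a hypothetical illustration, not the paper's method: an LRU cache counted in files, a per-miss choice between ADMIT (1) and SKIP (0), a state made of the file's bucketed past-request count, and rewards of +1 when an admitted file is reused and -1 when an admitted file is evicted unused or a skipped file is requested again. The update is a one-step, undiscounted (bandit-style) simplification of Q-learning.

```python
import random
from collections import OrderedDict, defaultdict


class LRUCache:
    """Fixed-capacity LRU cache keyed by file name (capacity counted in files)."""

    def __init__(self, capacity):
        assert capacity >= 1, "capacity must be at least one file"
        self.capacity = capacity
        self.store = OrderedDict()

    def __contains__(self, name):
        return name in self.store

    def touch(self, name):
        # Mark the file as most recently used.
        self.store.move_to_end(name)

    def add(self, name):
        """Insert a file; return the evicted (least recently used) file name, or None."""
        evicted = None
        if len(self.store) >= self.capacity:
            evicted, _ = self.store.popitem(last=False)
        self.store[name] = True
        return evicted


class AdmissionAgent:
    """Hypothetical epsilon-greedy agent choosing ADMIT (1) or SKIP (0) on a miss.

    Values are updated as an incremental average of observed rewards per
    (state, action), i.e. Q-learning with discount factor 0.
    """

    def __init__(self, epsilon=0.1, alpha=0.1, n_buckets=4, seed=0):
        self.q = defaultdict(float)
        self.epsilon, self.alpha, self.n_buckets = epsilon, alpha, n_buckets
        self.rng = random.Random(seed)

    def state(self, count):
        # Bucket the number of past requests for the file.
        return min(count, self.n_buckets - 1)

    def act(self, s):
        if self.rng.random() < self.epsilon:
            return self.rng.randint(0, 1)  # explore
        return 1 if self.q[(s, 1)] >= self.q[(s, 0)] else 0  # exploit (ties admit)

    def update(self, s, a, reward):
        key = (s, a)
        self.q[key] += self.alpha * (reward - self.q[key])


def simulate(requests, capacity, agent):
    """Replay a request trace; return (hit_rate, agent)."""
    cache = LRUCache(capacity)
    counts = defaultdict(int)  # past requests per file
    pending = {}  # file -> (state, action) still awaiting a reward
    hits = 0
    for name in requests:
        if name in cache:
            hits += 1
            cache.touch(name)
            if name in pending:  # admitted file was reused: reward the admission
                s, a = pending.pop(name)
                agent.update(s, a, +1.0)
        else:
            if name in pending:  # skipped file came back: penalise the skip
                s, a = pending.pop(name)
                agent.update(s, a, -1.0)
            s = agent.state(counts[name])
            a = agent.act(s)
            pending[name] = (s, a)
            if a == 1:
                evicted = cache.add(name)
                if evicted is not None and evicted in pending:
                    es, ea = pending.pop(evicted)  # evicted before any reuse
                    agent.update(es, ea, -1.0)
        counts[name] += 1
    return hits / len(requests), agent


# Usage on a synthetic trace (hypothetical file names): a few "hot" files
# requested often, mixed with many rarely requested ones.
rng = random.Random(42)
hot = [f"hot_{i}" for i in range(5)]
cold = [f"cold_{i}" for i in range(200)]
trace = [rng.choice(hot) if rng.random() < 0.7 else rng.choice(cold) for _ in range(5000)]
hit_rate, trained = simulate(trace, capacity=8, agent=AdmissionAgent())
print(f"hit rate: {hit_rate:.2f}")
```

The admission decision is the only thing learned here; eviction stays plain LRU. Separating the two keeps the sketch small, and it reflects a common design choice in learned caches, where filtering out one-off requests protects the cache from pollution without changing the replacement policy.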

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
