IN2P3 Computing Center 2024 Workload Dataset
This dataset provides a recent, large-scale, and diverse workload that researchers can use to evaluate scheduling algorithms on real workloads over an extended period.
The paper presents a 2024 workload dataset from the IN2P3 Computing Center with 44M jobs from 1k users, including memory usage and a full-year span, enabling realistic scheduling simulations that capture seasonal and weekly patterns.
This paper provides and analyzes a dataset describing the characteristics and execution data of all jobs submitted in 2024 to the IN2P3 Computing Center (Villeurbanne, France), a national research and support unit of the CNRS. Compared with previously available datasets, its main added value lies in the combination of an extended time span, the inclusion of memory usage data, and its recency, in addition to broadening the diversity of dataset provenance. This allows researchers to simulate and evaluate scheduling algorithms on a real workload over a long time window, so that seasonal, monthly, and weekly patterns in user behavior can be taken into account, which is not possible with smaller or synthetic datasets. The dataset comprises 44M jobs submitted by 1k users, running on a cluster of up to 312 machines that support 46k concurrent threads and provide 105 TB of RAM.
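To illustrate how weekly and seasonal submission patterns could be extracted from a year-long trace like this one, here is a minimal Python sketch. It assumes a hypothetical job record holding a Unix submission timestamp; the `Job` class and its fields (`job_id`, `submit_time`) are illustrative and are not the dataset's actual schema. Timestamps are bucketed in UTC for simplicity; local time (Europe/Paris) may better reflect user behavior.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical job record: the real dataset's schema may differ.
@dataclass
class Job:
    job_id: int
    submit_time: int  # Unix timestamp in seconds (assumed field)


def weekly_and_monthly_profile(jobs):
    """Count submissions per weekday (0 = Monday) and per month (1-12), in UTC."""
    by_weekday = Counter()
    by_month = Counter()
    for job in jobs:
        t = datetime.fromtimestamp(job.submit_time, tz=timezone.utc)
        by_weekday[t.weekday()] += 1
        by_month[t.month] += 1
    return by_weekday, by_month


# Usage with three synthetic submissions (not taken from the dataset).
jobs = [
    Job(1, int(datetime(2024, 1, 1, 9, tzinfo=timezone.utc).timestamp())),   # Monday
    Job(2, int(datetime(2024, 1, 6, 14, tzinfo=timezone.utc).timestamp())),  # Saturday
    Job(3, int(datetime(2024, 7, 1, 10, tzinfo=timezone.utc).timestamp())),  # Monday
]
weekday, month = weekly_and_monthly_profile(jobs)
print(weekday[0], weekday[5], month[1], month[7])  # → 2 1 2 1
```

Over a full year of submissions, the per-weekday and per-month counts would expose the kind of weekly and seasonal variation that a short or synthetic trace cannot reproduce.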