HEP-EXIMCLDATA-AN | Feb 20, 2023

Awkward to RDataFrame and back

arXiv:2302.09860v1 | 2 citations | h-index: 52
Originality: Synthesis-oriented
AI Analysis

This work addresses interoperability for users in high-energy physics and data science who need to combine different analysis tools. It is incremental, since it builds on existing frameworks.

The paper tackles the problem of converting between Awkward Arrays and RDataFrame for scalable data analysis. The result is a pair of zero-copy conversion functions, ak.to_rdataframe and ak.from_rdataframe, that let users mix packages and languages without duplicating data.
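To make "zero-copy" concrete, here is a minimal standard-library sketch of a per-entry reader over a jagged column stored as a flat content buffer plus offsets. It is a toy illustration only, not how Awkward Array or ROOT implement their readers (the paper describes column readers generated with JIT techniques); the names `make_reader` and `read_x` are hypothetical.

```python
from array import array

# Toy jagged column: entry i spans content[offsets[i]:offsets[i + 1]].
content = array("d", [1.1, 2.2, 3.3, 4.4, 5.5])
offsets = array("q", [0, 3, 3, 5])


def make_reader(content, offsets):
    """Return a per-entry reader that hands out views, not copies."""
    view = memoryview(content)  # shares the underlying buffer with `content`

    def read(entry):
        # Slicing a memoryview yields another view onto the same memory.
        return view[offsets[entry]:offsets[entry + 1]]

    return read


read_x = make_reader(content, offsets)
first = read_x(0)

content[0] = 9.9          # mutate the original buffer in place
assert first[0] == 9.9    # the view sees the change, so no copy was made
```

Because each entry is a view, the cost of exposing the column does not grow with the amount of data, which is the property the conversion functions aim for.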

Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both, with greater flexibility in mixing different packages and languages in their analysis. In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source. This view is generated on demand and the data are not copied. The column readers are generated based on the run-time type of the views and are passed to a generated source derived from ROOT::RDF::RDataSource. The ak.from_rdataframe function converts the selected columns to native Awkward Arrays. The details of the implementation, which exploits JIT techniques, are discussed. Examples of analyzing data stored in Awkward Arrays through the high-level RDataFrame interface are presented, including defining columns, applying user-defined filters written in C++, and plotting or extracting the columnar data as Awkward Arrays. Current limitations and future plans are discussed.
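The round trip described in the abstract might look roughly like the sketch below. It assumes Awkward Array version 2 and a ROOT installation with Python bindings. The function name `roundtrip_example`, the column names "x" and "n", and the sample values are hypothetical choices for illustration.

```python
def roundtrip_example():
    """Awkward Array -> RDataFrame -> Awkward Array (illustrative sketch)."""
    # Imported here so that this module still loads where the packages are
    # missing; calling the function requires awkward >= 2 and ROOT (PyROOT).
    import awkward as ak

    array = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

    # Present the array as an RDataFrame column named "x".
    df = ak.to_rdataframe({"x": array})

    # Define a new column and apply a filter, both written as C++ expressions.
    df = df.Define("n", "static_cast<int>(x.size())").Filter("n > 0")

    # Extract the selected columns as an Awkward Array of records.
    return ak.from_rdataframe(df, columns=("x", "n"))
```

Calling `roundtrip_example()` returns an array with fields "x" and "n" that contains only the entries passing the filter.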

