DC, AI | Aug 13, 2021

HPTMT Parallel Operators for High Performance Data Science & Data Engineering

arXiv:2108.06001v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of fragmented implementations in data-intensive fields such as data engineering and deep learning, though the contribution appears incremental because it builds on existing concepts.

The paper tackles the lack of standardized data structures and operators in data-intensive applications by proposing the HPTMT architecture, which integrates data engineering and data science efficiently, as demonstrated through an end-to-end application with deep learning and data engineering components.

Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often, the lack of a clear definition of data structures and operators in the field has led to separate implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that links all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application with deep learning and data engineering parts working together.

