DB | AI | Jun 8, 2023

Learned spatial data partitioning

arXiv:2306.04846v2 | 4 citations | h-index: 21
AI Analysis

This paper addresses the challenge of efficiently processing increasingly large spatial datasets for applications such as geographic analysis, though it is an incremental improvement over existing partitioning methods.

The paper tackles the problem of partitioning large spatial data for distributed processing by developing a novel deep reinforcement learning algorithm that optimizes data assignment to computers based on location. The method reduces workload run time by up to 59.4% for distance join queries in experiments using Apache Sedona and real-world data.

Because spatial datasets have grown significantly in size, distributed parallel processing systems are essential for analyzing them efficiently. In this paper, we are the first to study learned spatial data partitioning, which uses machine learning techniques to assign groups of big spatial data to computers based on the locations of the data. We formalize spatial data partitioning in the context of reinforcement learning and develop a novel deep reinforcement learning algorithm. Our learning algorithm leverages features of spatial data partitioning and prunes ineffective learning processes to find optimal partitions efficiently. Our experimental study, which uses Apache Sedona and real-world spatial data, demonstrates that our method efficiently finds partitions that accelerate distance join queries and reduces the workload run time by up to 59.4%.
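To make the partitioning objective concrete, here is a minimal Python sketch of why partition boundaries matter for a distance join. It is not the paper's algorithm: the abstract does not describe the state, action, or reward design, so the sketch assumes a toy cost model (vertical strips, with points within a join distance `eps` of a boundary replicated to the neighbouring strip) and replaces the reinforcement learning agent with an exhaustive search over candidate split positions. The names `strip_loads`, `makespan`, and `best_single_split` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def strip_loads(points: Sequence[Point], splits: Sequence[float], eps: float) -> List[int]:
    """Count points per vertical strip, including replicated copies.

    Assumed toy model: strip i covers x in [splits[i-1], splits[i]).
    A point is also copied to every other strip that lies within eps
    of it, so that a distance join can be evaluated locally per strip.
    """
    splits = sorted(splits)
    loads = [0] * (len(splits) + 1)
    for x, _y in points:
        home = bisect_right(splits, x)        # strip that owns the point
        lo = bisect_right(splits, x - eps)    # leftmost strip within eps
        hi = bisect_right(splits, x + eps)    # rightmost strip within eps
        for strip in range(lo, hi + 1):
            loads[strip] += 1                 # home copy plus replicas
        assert lo <= home <= hi
    return loads


def makespan(points: Sequence[Point], splits: Sequence[float], eps: float) -> int:
    """Load of the busiest strip; a stand-in for workload run time."""
    return max(strip_loads(points, splits, eps))


def best_single_split(points: Sequence[Point], candidates: Sequence[float], eps: float) -> float:
    """Exhaustive search standing in for the learned policy (assumption)."""
    return min(candidates, key=lambda c: makespan(points, [c], eps))
```

In this model, a split placed through a dense cluster raises the load of the busiest machine because nearby points must be replicated across the boundary, while a split through empty space does not. A learned partitioner would search a far larger space of partitions against measured query run times rather than this hand-written cost.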

Code Implementations: 1 repo