AIDCMay 8, 2017

Block-Parallel IDA* for GPUs (Extended Manuscript)

arXiv:1705.02843v11 citations
Originality Incremental advance
AI Analysis

This work addresses performance bottlenecks in GPU-based search algorithms for researchers and practitioners in parallel computing, but it is incremental as it builds on existing IDA* methods.

The paper tackled the problem of poor GPU parallelization of Iterative-Deepening A* (IDA*) due to warp divergence and load imbalance, and proposed Block-Parallel IDA* (BPIDA*), which achieved a speedup of 4.98 compared to a sequential implementation on the 15-puzzle.

We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We show that straightforward thread-based parallelization techniques which were previously proposed for massively parallel SIMD processors perform poorly due to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to fast shared memory) rather than a thread. On the 15-puzzle, BPIDA* on a NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 compared to a highly optimized sequential IDA* implementation on a Xeon E5-2670 core.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes