Locality-Aware Laplacian Mesh Smoothing
For practitioners of mesh smoothing, this simple reordering technique significantly improves performance on multicore systems by minimizing cache misses.
The paper proposes a vertex reordering scheme for Laplacian mesh smoothing that reduces cache misses. On 32 cores it achieves a 75x speedup over single-core execution without reordering and a 32% improvement over state-of-the-art reordering.
In this paper, we propose a novel reordering scheme to improve the performance of Laplacian Mesh Smoothing (LMS). Although the Laplacian smoothing algorithm is well studied and optimized, we show how a simple reordering of the mesh vertices can greatly reduce the execution time of the smoothing algorithm. Our reordering is based on (i) the postulate that cache misses account for a large share of the execution time of LMS, and (ii) a study of the reuse distance patterns of various executions of the LMS algorithm. The reordering algorithm is very simple yet yields large performance improvements. On a Westmere-EX platform, we obtained a speedup of 75 on 32 cores compared to single-core execution without reordering, and a 32% gain in execution time on 32 cores compared to state-of-the-art reordering. Finally, we show that little room is left for a better ordering, since our reordering reduces L2 and L3 cache misses to a bare minimum.
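The abstract does not spell out the proposed ordering, so the sketch below is not the authors' method. It is a minimal, hypothetical illustration of the pieces the abstract relies on. The mesh is stored as a CSR vertex-adjacency graph (`offsets`, `neighbors`). Smoothing uses a Jacobi-style update, which is an assumption since in-place Gauss-Seidel updates are also common. A plain breadth-first order serves as a generic stand-in for a locality-improving reordering. Finally, element-level reuse distances of an access trace give the metric behind observation (ii). All function names are hypothetical.

```python
from collections import deque


def laplacian_smooth(coords, offsets, neighbors, iterations=1, fixed=frozenset()):
    """Jacobi-style Laplacian smoothing on a CSR vertex-adjacency graph.

    Each free vertex moves to the mean of its neighbours' positions from the
    previous sweep; vertices listed in `fixed` (e.g. boundary) do not move.
    """
    for _ in range(iterations):
        new = []
        for v, p in enumerate(coords):
            nbrs = neighbors[offsets[v]:offsets[v + 1]]
            if v in fixed or not nbrs:
                new.append(p)
                continue
            new.append(tuple(sum(coords[u][d] for u in nbrs) / len(nbrs)
                             for d in range(len(p))))
        coords = new
    return coords


def bfs_order(offsets, neighbors, start=0):
    """Plain breadth-first vertex order (a stand-in, not the paper's scheme;
    Cuthill-McKee would additionally visit neighbours by increasing degree).

    Returns `order` with order[new_id] == old_id; covers disconnected meshes.
    """
    n = len(offsets) - 1
    seen = [False] * n
    order = []
    for root in [start] + list(range(n)):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in neighbors[offsets[v]:offsets[v + 1]]:
                if not seen[u]:
                    seen[u] = True
                    queue.append(u)
    return order


def apply_order(order, coords, offsets, neighbors):
    """Relabel vertices so old vertex order[i] becomes vertex i, and rebuild
    the CSR arrays so that adjacent vertices are stored close in memory."""
    new_id = [0] * len(order)
    for new, old in enumerate(order):
        new_id[old] = new
    new_coords = [coords[old] for old in order]
    new_offsets, new_neighbors = [0], []
    for old in order:
        nbrs = neighbors[offsets[old]:offsets[old + 1]]
        new_neighbors.extend(sorted(new_id[u] for u in nbrs))
        new_offsets.append(len(new_neighbors))
    return new_coords, new_offsets, new_neighbors, new_id


def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct items touched
    since the previous access to the same item (None for a cold access).
    Quadratic-time sketch, adequate only for small traces."""
    last, dists = {}, []
    for i, x in enumerate(trace):
        dists.append(len(set(trace[last[x] + 1:i])) if x in last else None)
        last[x] = i
    return dists


if __name__ == "__main__":
    # Hypothetical 6-vertex chain whose labels are scrambled relative to geometry.
    offsets = [0, 2, 4, 5, 6, 8, 10]
    neighbors = [3, 5, 4, 5, 4, 0, 1, 2, 0, 1]
    coords = [(1.0, 0.5), (3.0, 0.0), (5.0, 0.0), (0.0, 0.0), (4.0, -0.5), (2.0, 0.0)]
    order = bfs_order(offsets, neighbors, start=3)
    c2, o2, n2, new_id = apply_order(order, coords, offsets, neighbors)
    print(order)  # [3, 0, 5, 1, 4, 2]
    print(n2)     # [1, 0, 2, 1, 3, 2, 4, 3, 5, 4]
```

With this Jacobi-style update, relabeling only permutes storage, so smoothing the reordered mesh yields the same positions as the original up to the permutation `new_id`. What changes is the memory access pattern. In this CSR layout, one sweep reads vertex coordinates in the order of the `neighbors` array, so passing that array to `reuse_distances` before and after reordering is one way to compare orderings on a real mesh. Note that element-level distances ignore cache-line granularity.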