Acceleration of multi-component multiple-precision arithmetic with branch-free algorithms and SIMD vectorization
This work addresses performance bottlenecks in high-precision computing for scientific and engineering applications. Its contribution appears incremental, however, because it builds on existing branch-free algorithms and SIMD techniques.
The paper addresses the acceleration of multi-component multiple-precision arithmetic and reports benchmark results on x86 and ARM CPU platforms that quantify the speedups in linear computations and polynomial evaluation.
Branch-free multiple-precision floating-point algorithms can significantly accelerate multi-component arithmetic implemented by combining hardware-based binary64 and binary32 values, particularly for triple- and quadruple-precision computations. In this study, we report benchmark results on x86 and ARM CPU platforms that quantify the acceleration achieved in linear computations and polynomial evaluation by integrating these algorithms.
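To make the idea of branch-free multi-component arithmetic concrete, the following is a minimal sketch of the two-component (double-double) case built on binary64. It uses the classical TwoSum and Fast2Sum error-free transformations and is not taken from the paper: the function names `two_sum`, `quick_two_sum`, and `dd_add` are hypothetical, and the triple- and quadruple-precision algorithms the paper benchmarks are likely more elaborate.

```python
# Illustrative sketch only: classical error-free transformations, not the
# paper's own code. Python floats are IEEE 754 binary64.

def two_sum(a, b):
    """Branch-free TwoSum: return (s, e) with s = fl(a + b) and
    a + b == s + e exactly (barring overflow)."""
    s = a + b
    bb = s - a                      # the part of b that was absorbed into s
    e = (a - (s - bb)) + (b - bb)   # rounding error of the addition
    return s, e


def quick_two_sum(a, b):
    """Fast2Sum: same result as two_sum, but assumes |a| >= |b|.
    Contains no branch; the precondition is the caller's responsibility."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two double-double values given as (hi, lo) pairs.
    This is the simple ("sloppy") variant, chosen for brevity."""
    s, e = two_sum(x[0], y[0])      # exact sum of the high parts
    e += x[1] + y[1]                # fold in the low parts
    return quick_two_sum(s, e)      # renormalize into (hi, lo)


if __name__ == "__main__":
    x = (1.0, 2.0 ** -60)           # represents 1 + 2^-60
    print(dd_add(x, x))             # high and low parts of the sum
```

Because none of these routines contains a data-dependent branch, the same instruction sequence can in principle be applied lane by lane to packed operands, which is the property that makes such algorithms natural candidates for SIMD vectorization.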