Leveraging SIMD for Accelerating Large-number Arithmetic
This work presents a method for accelerating large-number arithmetic in scientific computing and cryptography that outperforms existing SIMD approaches on addition, subtraction, and multiplication.
DigitsOnTurbo (DoT) restructures large-number arithmetic to exploit SIMD parallelism, achieving up to 1.85x speedup for addition/subtraction and 2.3x for multiplication over prior SIMD implementations, and up to 19.3% throughput gains in scientific computing and 7.9% latency improvements in cryptography.
Large-number arithmetic, widely used in scientific computing and cryptography, has made limited use of single instruction, multiple data (SIMD) parallelism on modern CPUs because of the inherent dependencies in traditional algorithms. We present DigitsOnTurbo (DoT), which, rather than vectorizing the standard algorithms, restructures the computation around independent, data-parallel operations that SIMD can exploit. Over prior SIMD implementations, DoT achieves speedups of up to 1.85x for addition and subtraction and up to 2.3x for multiplication. When integrated into state-of-the-art libraries, DoT yields speedups of up to 4x for addition and subtraction and up to 2x for multiplication. These gains cascade into end-to-end throughput improvements of up to 19.3% for scientific computations, and into latency improvements of up to 7.9% and throughput improvements of up to 5.9% for cryptographic implementations.
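To make the dependency problem concrete, here is a minimal sketch of one well-known example: the carry chain in multi-limb addition. It is a generic textbook illustration and is not DoT's algorithm, which the abstract does not describe. The function names, the 64-bit limb size, and the generate/propagate split are all assumptions chosen for illustration.

```python
MASK = (1 << 64) - 1  # assumed limb width: 64 bits, least significant limb first


def to_limbs(n, count):
    """Split a non-negative integer into `count` 64-bit limbs."""
    return [(n >> (64 * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def add_sequential(a, b):
    """Schoolbook addition: every limb waits on the carry of the previous limb."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry


def add_two_phase(a, b):
    """Illustrative restructuring: independent per-limb work, then carry resolution."""
    # Phase 1 (data-parallel): no limb reads another limb's result.
    sums = [(x + y) & MASK for x, y in zip(a, b)]
    generate = [s < x for s, x in zip(sums, a)]  # limb overflowed on its own
    propagate = [s == MASK for s in sums]        # limb forwards an incoming carry
    # Phase 2: resolve carries using only the per-limb flags.
    out, carry = [], 0
    for s, g, p in zip(sums, generate, propagate):
        out.append((s + carry) & MASK)
        carry = 1 if (g or (p and carry)) else 0
    return out, carry
```

In this sketch, phase 1 corresponds to what a vector add and a vector compare could compute across many limbs at once, while phase 2 is still written as a plain loop for clarity. How DoT itself removes or shortens the dependent part is not stated in the abstract.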