TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems
This work addresses the limited ability of PIM systems to handle complex operations that are crucial for modern workloads such as machine learning, and offers a practical library for improved efficiency.
The paper tackles the problem of executing transcendental functions on hardware-constrained processing-in-memory (PIM) systems by introducing TransPimLib, a library that provides CORDIC-based and LUT-based methods, achieving up to 10.8x speedup and 99.9% accuracy in evaluations on workloads such as Blackscholes and Sigmoid.
Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than that of conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. To provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present TransPimLib, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at https://github.com/CMU-SAFARI/transpimlib.
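To make the two method families named above concrete, here is a minimal C sketch. It is not TransPimLib's code or API. Every name (q16_t, cordic_sincos, sincos_q16, build_sigmoid_lut, sigmoid_lookup, host_exp) is hypothetical, and the Q16.16 fixed-point format, 16 CORDIC iterations, and 257-entry table over [-8, 8] are illustrative assumptions. The CORDIC part computes sine and cosine using only integer adds, shifts, and a small arctangent table, in the spirit of PIM cores with limited arithmetic support. The LUT part fills a sigmoid table on the host and evaluates it on the PIM side by linear interpolation. host_exp is a hypothetical stand-in for the host's libm exp, so the sketch builds without extra libraries.

```c
#include <assert.h>
#include <stdint.h>

/* ---- CORDIC: sine/cosine with integer adds and shifts only ---- */

/* Q16.16 fixed point (assumed format): real value = raw / 65536. */
typedef int32_t q16_t;
#define Q16_ONE      65536
#define Q16_PI       205887 /* round(pi * 65536) */
#define Q16_HALF_PI  102944 /* round(pi / 2 * 65536) */
#define CORDIC_ITERS 16
#define CORDIC_K     39797  /* round(0.6072529 * 65536): CORDIC gain correction */

/* atan(2^-i) in Q16.16 for i = 0..15, precomputed (e.g. on the host). */
static const q16_t cordic_atan[CORDIC_ITERS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256,   128,   64,    32,   16,   8,    4,    2};

static q16_t to_q16(double v) { return (q16_t)(v * Q16_ONE + (v >= 0 ? 0.5 : -0.5)); }
static double from_q16(q16_t v) { return (double)v / Q16_ONE; }

/* Rotation-mode CORDIC: start from (K, 0) and rotate by +/-atan(2^-i) until the
 * residual angle z is ~0. Converges for |angle| <= pi/2. Assumes >> on a
 * negative int is an arithmetic shift, as on GCC and Clang. */
static void cordic_sincos(q16_t angle, q16_t *s, q16_t *c) {
    q16_t x = CORDIC_K, y = 0, z = angle;
    for (int i = 0; i < CORDIC_ITERS; i++) {
        q16_t dx = y >> i, dy = x >> i; /* both use x, y from the previous step */
        if (z >= 0) { x -= dx; y += dy; z -= cordic_atan[i]; }
        else        { x += dx; y -= dy; z += cordic_atan[i]; }
    }
    *c = x;
    *s = y;
}

/* Range reduction to [-pi/2, pi/2]: sin(a) = -sin(a -/+ pi), cos(a) = -cos(a -/+ pi). */
static void sincos_q16(q16_t angle, q16_t *s, q16_t *c) {
    assert(angle >= -Q16_PI && angle <= Q16_PI); /* this sketch handles [-pi, pi] only */
    int flip = 0;
    if (angle > Q16_HALF_PI)       { angle -= Q16_PI; flip = 1; }
    else if (angle < -Q16_HALF_PI) { angle += Q16_PI; flip = 1; }
    cordic_sincos(angle, s, c);
    if (flip) { *s = -*s; *c = -*c; }
}

/* ---- LUT: sigmoid from a host-filled table with linear interpolation ---- */

#define LUT_SIZE 257   /* entries spanning [LUT_MIN, LUT_MAX], step 1/16 */
#define LUT_MIN  (-8.0)
#define LUT_MAX  8.0
static float sigmoid_lut[LUT_SIZE];

/* Hypothetical host-side exp without libm: exp(x) = exp(x/256)^256, using a
 * 4th-order Taylor series for the small argument. Only used to fill the table. */
static double host_exp(double x) {
    double y = x / 256.0;
    double e = 1.0 + y * (1.0 + y * (0.5 + y * (1.0 / 6.0 + y / 24.0)));
    for (int i = 0; i < 8; i++) e *= e; /* square 8 times: ^256 */
    return e;
}

/* Runs once on the host; the filled table would then be copied to PIM memory. */
static void build_sigmoid_lut(void) {
    double step = (LUT_MAX - LUT_MIN) / (LUT_SIZE - 1);
    for (int i = 0; i < LUT_SIZE; i++)
        sigmoid_lut[i] = (float)(1.0 / (1.0 + host_exp(-(LUT_MIN + i * step))));
}

/* PIM-side evaluation: one index computation, two loads, one interpolation.
 * Inputs outside [LUT_MIN, LUT_MAX] are clamped to the end entries. */
static float sigmoid_lookup(float x) {
    if (x <= (float)LUT_MIN) return sigmoid_lut[0];
    if (x >= (float)LUT_MAX) return sigmoid_lut[LUT_SIZE - 1];
    float pos = (x - (float)LUT_MIN) * (LUT_SIZE - 1) / (float)(LUT_MAX - LUT_MIN);
    int i = (int)pos;
    if (i > LUT_SIZE - 2) i = LUT_SIZE - 2; /* guard the i + 1 access */
    float frac = pos - (float)i;
    return sigmoid_lut[i] + frac * (sigmoid_lut[i + 1] - sigmoid_lut[i]);
}
```

For example, after build_sigmoid_lut(), sigmoid_lookup(2.0f) returns the stored value of sigmoid(2), about 0.8808. Similarly, sincos_q16(to_q16(2.5), &s, &c) yields values close to sin(2.5) ≈ 0.598 and cos(2.5) ≈ -0.801. The two approaches trade off differently. CORDIC needs almost no memory but spends one iteration per bit of precision. A LUT answers in a handful of operations at the cost of table storage and interpolation error that depends on the table spacing.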