Softmax Optimizations for Intel Xeon Processor-based Platforms
This work addresses computational efficiency for deep learning practitioners on CPU-based platforms. Its contribution is incremental, since it optimizes an existing function rather than proposing a new one.
The authors target the softmax function, a performance bottleneck in deep learning models such as Transformer and BERT running on Intel Xeon CPUs, and describe an optimization methodology that they expect could be applied to other frameworks.
Softmax is a popular normalization method used in machine learning. Deep learning models such as Transformer and BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our optimization methodology and the results of applying it to softmax. By presenting this methodology, we hope to increase interest in deep learning optimizations for CPUs. We believe that the optimization process presented here could be transferred to other deep learning frameworks such as TensorFlow or PyTorch.
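For reference, softmax maps a vector x of length n to softmax(x)_i = exp(x_i) / Σ_j exp(x_j), so the outputs are positive and sum to 1. The sketch below is a generic scalar baseline, not the authors' optimized Xeon implementation: the function name `softmax` and its pointer-plus-length signature are hypothetical choices for illustration. It uses the common trick of subtracting the row maximum before exponentiating, which leaves the result unchanged mathematically but prevents overflow, and it splits the work into three simple passes over contiguous memory.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical baseline: numerically stable softmax over a contiguous
// row of n floats, writing results to y (x and y may not overlap).
// softmax(x)_i = exp(x_i - max) / sum_j exp(x_j - max)
void softmax(const float* x, float* y, std::size_t n) {
    if (n == 0) return;

    // Pass 1: find the row maximum so that every exponent is <= 0,
    // which keeps std::exp from overflowing for large inputs.
    const float max_val = *std::max_element(x, x + n);

    // Pass 2: exponentiate the shifted values and accumulate their sum.
    // The loop body is branch-free with contiguous access, the kind of
    // structure that SIMD vectorization typically starts from.
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = std::exp(x[i] - max_val);
        sum += y[i];
    }

    // Pass 3: normalize by multiplying with the reciprocal of the sum,
    // replacing n divisions with one division and n multiplications.
    const float inv_sum = 1.0f / sum;
    for (std::size_t i = 0; i < n; ++i) {
        y[i] *= inv_sum;
    }
}
```

Each pass reads the whole row, so on a CPU the cost is dominated by memory traffic and by the exponential; those are the natural places an optimization effort like the one described here would focus.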