Michal Gallus

2 Papers

DC Sep 23, 2020
Applying the Roofline model for Deep Learning performance optimizations

Jacek Czaja, Michal Gallus, Joanna Wozna et al.

In this paper, we present a methodology for automatically creating Roofline models for Non-Uniform Memory Access (NUMA) systems, using Intel Xeon as an example. We also present an evaluation of highly efficient deep learning primitives as implemented in the Intel oneDNN library.
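
For readers unfamiliar with the Roofline model, its core bound can be sketched in a few lines: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). This is a minimal illustration of the general model, not the paper's methodology; the function name and the platform numbers are hypothetical.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Classic Roofline bound (illustrative helper, not from the paper).

    peak_gflops: peak compute rate of the platform, in GFLOP/s
    bandwidth_gbs: peak memory bandwidth, in GB/s
    arithmetic_intensity: FLOPs performed per byte of memory traffic
    """
    # A kernel is limited either by compute or by memory traffic,
    # whichever ceiling is lower at its arithmetic intensity.
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


# Hypothetical platform: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
memory_bound = attainable_gflops(1000.0, 100.0, 2.0)    # limited by bandwidth
compute_bound = attainable_gflops(1000.0, 100.0, 50.0)  # limited by peak compute
```

On a NUMA system, one would presumably evaluate such a bound with the bandwidth and peak figures that match the chosen thread and memory placement, since those figures differ between local and remote memory access.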

MS Apr 28, 2019
Softmax Optimizations for Intel Xeon Processor-based Platforms

Jacek Czaja, Michal Gallus, Tomasz Patejko et al.

Softmax is a popular normalization method used in machine learning. Deep learning solutions like Transformer or BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our optimization methodology and the results of applying it to softmax. By presenting this methodology, we hope to increase interest in deep learning optimizations for CPUs. We believe that the optimization process presented here could be transferred to other deep learning frameworks such as TensorFlow or PyTorch.
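
As a point of reference, the softmax function itself can be written as a short, numerically stable routine. The sketch below is a plain reference version in pure Python, not the optimized implementation the article describes; the function name is an assumption made for illustration.

```python
import math


def softmax(values):
    """Reference softmax over a list of floats (illustrative, unoptimized)."""
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps exp() from overflowing on large inputs.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    # Normalize so the outputs are positive and sum to 1.
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
```

The three passes visible here (maximum, exponentiation, normalization) are the usual structure of the computation that a CPU-oriented optimization would have to speed up.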