Distributed Transfer Learning with 4th Gen Intel Xeon Processors
This work addresses the dependence on GPUs for training deep learning models, a practical constraint for researchers and practitioners, though it appears incremental because it applies existing methods to new hardware.
The paper tackles the problem of training deep learning models efficiently without relying primarily on GPUs by using 4th Gen Intel Xeon processors with AMX and distributed training, achieving near state-of-the-art accuracy for image classification on a TensorFlow dataset.
In this paper, we explore how transfer learning, coupled with Intel Xeon processors, specifically the 4th Gen Intel Xeon Scalable processor, defies the conventional belief that training is primarily GPU-dependent. We present a case study in which we achieved near state-of-the-art accuracy for image classification on a publicly available Image Classification TensorFlow dataset using Intel Advanced Matrix Extensions (AMX) and distributed training with Horovod.
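To make the combination of transfer learning and distributed training concrete, the following is a minimal pure-Python sketch of synchronous data-parallel training, assuming a frozen backbone whose output features are already computed and a small linear head that is the only trainable part. It does not use Horovod's API or the paper's model or dataset: the function names (`shard_gradient`, `allreduce_mean`, `train`), the toy features, and the hyperparameters are all hypothetical, and the averaging step only imitates what an allreduce-mean across workers would compute.

```python
# Sketch only: simulates workers in one process; not Horovod, not the paper's setup.

def shard_gradient(weights, features, labels):
    """Mean-squared-error gradient of a linear head on one worker's shard."""
    grad = [0.0] * len(weights)
    for x, y in zip(features, labels):
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / len(features)
    return grad


def allreduce_mean(grads):
    """Average per-worker gradients element-wise (stand-in for an allreduce)."""
    n = len(grads)
    return [sum(g[j] for g in grads) / n for j in range(len(grads[0]))]


def train(features, labels, num_workers=2, lr=0.1, steps=200):
    """Each step, every worker computes a gradient on its own shard, then all
    workers apply the same averaged gradient to identical copies of the head."""
    weights = [0.0] * len(features[0])
    # Strided sharding: worker r sees samples r, r + num_workers, ...
    shards = [(features[r::num_workers], labels[r::num_workers])
              for r in range(num_workers)]
    for _ in range(steps):
        grads = [shard_gradient(weights, fx, fy) for fx, fy in shards]
        avg = allreduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, avg)]
    return weights


# Hypothetical frozen-backbone features (2-D) with labels from y = 3*x0 - 2*x1.
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
LABELS = [3.0, -2.0, 1.0, 2.5]

if __name__ == "__main__":
    print(train(FEATURES, LABELS, num_workers=2))
```

When the shards have equal size, the averaged gradient equals the full-batch gradient, so training with two simulated workers follows the same trajectory as training with one. This is the property that lets synchronous data-parallel training add workers without changing the optimization problem, apart from the larger effective batch.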