Scaling TensorFlow to 300 million predictions per second
This work addresses high-throughput, low-latency model serving for large-scale online advertising systems; it represents an incremental improvement in deployment efficiency.
The paper tackles the challenge of scaling machine learning models in an online advertising ecosystem by moving them to TensorFlow, reaching 300 million predictions per second at low latency through serving optimizations.
We present the process of transitioning machine learning models to the TensorFlow framework at large scale in an online advertising ecosystem. In this talk, we address the key challenges we faced and describe how we tackled them, notably implementing the models in TensorFlow and serving them efficiently at low latency using various optimization techniques.