LG PF Sep 20, 2021

Scaling TensorFlow to 300 million predictions per second

arXiv:2109.09541v1, 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of high-throughput, low-latency model serving for large-scale online advertising systems, representing an incremental improvement in deployment efficiency.

The paper describes scaling machine learning models in an online advertising ecosystem by moving them to TensorFlow, reaching 300 million predictions per second at low latency through serving optimizations.

We present the process of transitioning machine learning models to the TensorFlow framework at a large scale in an online advertising ecosystem. In this talk we address the key challenges we faced and describe how we successfully tackled them; notably, implementing the models in TF and serving them efficiently with low latency using various optimization techniques.

