It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing
This addresses the bottleneck of manual model selection in platforms like TensorFlow Hub, offering an incremental improvement for researchers and practitioners in machine learning.
The paper tackles the problem of automatically finding reusable deep learning models for a target dataset by proposing a method based on dataset similarity, using Jensen-Shannon divergence and locality-sensitive hashing to achieve efficient and effective model searching.
In recent, deep learning has become the most popular direction in machine learning and artificial intelligence. However, preparation of training data is often a bottleneck in the lifecycle of deploying a deep learning model for production or research. Reusing models for inferencing a dataset can greatly save the human costs required for training data creation. Although there exist a number of model sharing platform such as TensorFlow Hub, PyTorch Hub, DLHub, most of these systems require model uploaders to manually specify the details of each model and model downloaders to screen keyword search results for selecting a model. They are in lack of an automatic model searching tool. This paper proposes an end-to-end process of searching related models for serving based on the similarity of the target dataset and the training datasets of the available models. While there exist many similarity measurements, we study how to efficiently apply these metrics without pair-wise comparison and compare the effectiveness of these metrics. We find that our proposed adaptivity measurement which is based on Jensen-Shannon (JS) divergence, is an effective measurement, and its computation can be significantly accelerated by using the technique of locality sensitive hashing.