Model Callers for Transforming Predictive and Generative AI Applications
This paper addresses the challenge of managing and integrating AI models for developers and teams, though the contribution appears incremental, as it extends existing model-serving frameworks.
The paper tackles the complexity of AI model deployment by introducing a 'model caller' abstraction that the authors say improves accuracy, reduces latency, and streamlines system architectures; a prototype Python library has been released alongside it.
We introduce a novel software abstraction termed the "model caller," which acts as an intermediary for calling AI and ML models, and we argue that its utility extends beyond that of existing model-serving frameworks. This abstraction offers multiple advantages: enhanced accuracy and reduced latency in model predictions, superior monitoring and observability of models, more streamlined AI system architectures, simplified AI development and management processes, and improved collaboration and accountability across AI/ML/Data Science, software, data, and operations teams. Model callers are valuable to both creators and users of models, in predictive and generative AI applications alike. Additionally, we have developed and released a prototype Python library for model callers, which can be installed via pip or downloaded from GitHub.
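To make the idea of an intermediary concrete, the following is a minimal illustrative sketch, not the API of the released library. The names `ModelCaller`, `CallRecord`, `call`, and `toy_sentiment_model` are hypothetical and chosen only for this example. It assumes that a model caller wraps a model function and records each call with its latency, which is one simple way the monitoring and observability benefit described above could be realized.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class CallRecord:
    """One logged model call (hypothetical structure, for illustration only)."""
    model_name: str
    inputs: Any
    output: Any
    latency_s: float


@dataclass
class ModelCaller:
    """Hypothetical intermediary between application code and a model."""
    model_name: str
    model_fn: Callable[[Any], Any]
    log: List[CallRecord] = field(default_factory=list)

    def call(self, inputs: Any) -> Any:
        # Time the underlying model call and keep a record for later inspection.
        start = time.perf_counter()
        output = self.model_fn(inputs)
        latency = time.perf_counter() - start
        self.log.append(CallRecord(self.model_name, inputs, output, latency))
        return output


def toy_sentiment_model(text: str) -> str:
    """Stand-in for a real predictive model (assumption for this sketch)."""
    return "positive" if "good" in text.lower() else "negative"


caller = ModelCaller("toy-sentiment", toy_sentiment_model)
result = caller.call("This library looks good")
```

Because application code calls `caller.call(...)` rather than the model directly, the model behind it could in principle be swapped or instrumented without changing the calling code; whether the released library takes this exact approach is not stated in the abstract.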