HC · AI · LG · Apr 3, 2024

Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

Fred Hohman, Chaoqun Wang, Jinmook Lee, Jochen Görtler, Dominik Moritz, Jeffrey P Bigham, Zhile Ren, Cecile Foret, Qi Shan, Xiaoyi Zhang

Apple, CMU

arXiv:2404.03085v1 · 12.010 citations · h-index: 27 · CHI

Originality: Incremental advance

AI Analysis

The work addresses a practical problem for practitioners who must balance hardware constraints such as model size and latency in on-device ML, but the advance is incremental: it builds a new interactive system on top of existing optimization tools.

The paper tackles the challenge of optimizing machine learning models for efficient on-device inference by introducing Talaria, a system that lets practitioners interactively visualize model statistics and simulate optimizations. Since its internal deployment, more than 800 practitioners have used it to submit over 3,600 models.

On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.
