HiQ -- A Declarative, Non-intrusive, Dynamic and Transparent Observability and Optimization System
This work addresses the need for efficient runtime monitoring and optimization in Python-based systems, particularly those running large deep neural networks, although the contribution appears incremental because it builds on existing observability concepts.
The paper tackles the problem of tracking Python program runtime information without degrading performance. It proposes HiQ, a non-intrusive, declarative, dynamic, and transparent observability and optimization system, which the authors have implemented, open-sourced, and adopted in their deep learning model life cycle management system.
This paper proposes `HiQ`, a non-intrusive, declarative, dynamic, and transparent system that tracks Python program runtime information without compromising run-time performance or losing insight. HiQ can be used for monolithic and distributed systems, and for offline and online applications. HiQ was developed while we were optimizing our large deep neural network (DNN) models, which are written in Python, but it can be generalized to any Python program or distributed system, and even to other languages such as Java. We have implemented the system and adopted it in our deep learning model life cycle management system to catch bottlenecks while keeping our production code clean and highly performant. The implementation is open-sourced at: [https://github.com/oracle/hiq](https://github.com/oracle/hiq).
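To make the idea of non-intrusive, declarative tracking more concrete, the following is a minimal sketch in plain Python. It does not use HiQ's actual API, which is not described in this abstract. The names `Model`, `TARGETS`, and `attach_timer` are hypothetical and exist only for illustration. The sketch assumes that "non-intrusive" means the target code carries no tracing logic, "declarative" means the targets are listed as data, and "dynamic" means the tracing can be attached and removed at run time.

```python
import functools
import time


class Model:
    """Hypothetical stand-in for production code; it contains no tracing logic."""

    def predict(self, n):
        return sum(i * i for i in range(n))


def attach_timer(owner, name, records):
    """Swap owner.<name> for a timing wrapper and return a callable that undoes it."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the elapsed wall-clock time even if the call raises.
            records.append((name, time.perf_counter() - start))

    setattr(owner, name, wrapper)

    def detach():
        # Restore the untouched original, removing all tracing overhead.
        setattr(owner, name, original)

    return detach


# Declarative part (an assumption of this sketch): targets are listed as data.
TARGETS = [(Model, "predict")]

records = []
detachers = [attach_timer(owner, name, records) for owner, name in TARGETS]

result = Model().predict(1000)  # traced call; Model's source is unchanged

for detach in detachers:
    detach()  # tracing removed at run time

untraced = Model().predict(1000)  # no new record is appended for this call
```

After the run, `records` holds one `(name, seconds)` entry for the traced call, and the second call goes straight to the original method. A real system would need to handle more than this sketch does, for example thread safety, nested call trees, and metrics other than latency.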