Interpretable Categorization of Heterogeneous Time Series Data
This work addresses the challenge of interpretable modeling for heterogeneous time series data in domains such as smart homes and aviation. It offers a novel method, though its improvements over existing decision tree approaches are incremental.
The authors tackled the problem of learning interpretable models for heterogeneous multivariate time series data. They proposed grammar-based decision trees (GBDTs), which extend decision trees with a grammar framework to support diverse data types while maintaining interpretability. They applied GBDTs to the Australian Sign Language dataset and to near mid-air collision (NMAC) data, showing their effectiveness in both classification and categorization tasks.
Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X).
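The core idea, branching on logical expressions drawn from a grammar instead of on attribute thresholds, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' learning algorithm. The tiny grammar (only `G` "always" and `F` "eventually" over single-attribute threshold predicates), the brute-force enumeration of every expression it can derive, the Gini-impurity split criterion, and all helper names (`enumerate_exprs`, `fit`, `predict`, `explain`) are assumptions chosen for brevity. Each leaf's explanation is the conjunction of the expressions on its root-to-leaf path. If the labels were cluster assignments from a separate clustering step (also an assumption here), those explanations would play the role of per-cluster descriptions in categorization.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Series = Dict[str, List[float]]               # attribute name -> values over time
Expr = Tuple[str, Callable[[Series], bool]]   # (readable text, evaluator)


def enumerate_exprs(attrs: Sequence[str], consts: Sequence[float]) -> List[Expr]:
    """Expand a tiny HYPOTHETICAL temporal-logic grammar:
    Expr ::= G(Pred) | F(Pred)    Pred ::= attr < c | attr > c
    """
    exprs: List[Expr] = []
    for attr, c, op in product(attrs, consts, ("<", ">")):
        def pred(v: float, c=c, op=op) -> bool:
            return v < c if op == "<" else v > c
        # G ("always"): holds at every time step; F ("eventually"): at some step.
        exprs.append((f"G({attr} {op} {c})",
                      lambda s, a=attr, p=pred: all(p(v) for v in s[a])))
        exprs.append((f"F({attr} {op} {c})",
                      lambda s, a=attr, p=pred: any(p(v) for v in s[a])))
    return exprs


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


@dataclass
class Node:
    label: Optional[str] = None           # set only on leaves
    expr: Optional[Expr] = None           # set only on internal nodes
    true_branch: Optional["Node"] = None  # followed when expr holds
    false_branch: Optional["Node"] = None


def split(data, labels, expr, keep: bool):
    """Keep the examples whose expression value equals `keep`."""
    rows = [(s, y) for s, y in zip(data, labels) if expr[1](s) == keep]
    return [s for s, _ in rows], [y for _, y in rows]


def fit(data: List[Series], labels: List[str], exprs: List[Expr], depth: int = 2) -> Node:
    """Greedy top-down induction: branch on the grammar expression that most
    reduces weighted Gini impurity (ASSUMED brute force over all of `exprs`)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return Node(label=majority)
    best, best_score = None, gini(labels)
    for expr in exprs:
        yes = split(data, labels, expr, True)[1]
        no = split(data, labels, expr, False)[1]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:  # strict: a split with an empty side never wins
            best, best_score = expr, score
    if best is None:
        return Node(label=majority)
    return Node(expr=best,
                true_branch=fit(*split(data, labels, best, True), exprs, depth - 1),
                false_branch=fit(*split(data, labels, best, False), exprs, depth - 1))


def predict(node: Node, s: Series) -> str:
    """Walk from the root, evaluating each node's expression on the series."""
    while node.label is None:
        node = node.true_branch if node.expr[1](s) else node.false_branch
    return node.label


def explain(node: Node, path: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Pair each leaf label with the conjunction of expressions on its path."""
    if node.label is not None:
        return [(node.label, " and ".join(path) or "true")]
    text = node.expr[0]
    return (explain(node.true_branch, path + (text,))
            + explain(node.false_branch, path + (f"not {text}",)))


# Toy usage: series labeled "A" are those whose x ever exceeds 5.
data = [{"x": [1, 2, 7], "y": [0, 0, 0]}, {"x": [0, 6, 1], "y": [1, 1, 1]},
        {"x": [1, 1, 1], "y": [0, 1, 0]}, {"x": [2, 3, 2], "y": [1, 0, 1]}]
labels = ["A", "A", "B", "B"]
tree = fit(data, labels, enumerate_exprs(["x", "y"], [1, 5]))
for label, why in explain(tree):
    print(label, "<-", why)
```

On this toy data the root branches on `G(x < 5)`, so the two leaves read "B <- G(x < 5)" and "A <- not G(x < 5)". Exhaustive enumeration is only feasible for a grammar this small; a realistic learner needs a search strategy over the grammar's derivations, and the paper's own algorithm should be consulted for how that search is actually performed.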