Preference-based performance measures for Time-Domain Global Similarity method
This work addresses a domain-specific issue for data cleaning practitioners by providing incremental improvements to performance evaluation in TDGS applications.
The paper tackles the mismatch between standard performance measures and practical preferences in data cleaning tasks using the Time-Domain Global Similarity (TDGS) method, deriving preference-based measures through probability theory to better align with task-specific correctness of data sequences.
The Time-Domain Global Similarity (TDGS) method transforms the data cleaning problem into a binary classification problem about the physical similarity between channels. Directly adopting common performance measures for this method therefore guarantees performance only with respect to physical similarity. Practical data cleaning tasks, however, have preferences regarding the correctness of the original data sequences. To obtain general expressions for performance measures based on task preferences, this paper uses probability theory to investigate the mapping relations between the performance of the TDGS method on physical similarity and the correctness of data sequences. Performance measures for the TDGS method are established for several common data cleaning tasks, and the cases in which these preference-based performance measures can be simplified are introduced.
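To make the mismatch concrete, the following minimal Python sketch computes common pair-level measures (accuracy, precision, recall) for a similarity classifier on toy data. It is an illustration only, not the paper's derivation. The labeling rule (a channel pair counts as physically similar only when both of its sequences are correct), the function names `pair_labels` and `pair_measures`, and the four-channel data are all assumptions introduced here.

```python
from itertools import combinations


def pair_labels(correct):
    """Label every channel pair as similar (True) or dissimilar (False).

    Assumption for this sketch: a pair is physically similar only if
    both of its data sequences are correct.
    """
    return {
        (i, j): correct[i] and correct[j]
        for i, j in combinations(range(len(correct)), 2)
    }


def pair_measures(truth, pred):
    """Common binary-classification measures over channel pairs."""
    tp = sum(truth[p] and pred[p] for p in truth)
    fp = sum((not truth[p]) and pred[p] for p in truth)
    fn = sum(truth[p] and (not pred[p]) for p in truth)
    tn = sum((not truth[p]) and (not pred[p]) for p in truth)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: four channels, the last sequence is incorrect.
correct = [True, True, True, False]
truth = pair_labels(correct)

# Hypothetical classifier output: one similar pair is misjudged as dissimilar.
pred = dict(truth)
pred[(0, 1)] = False

measures = pair_measures(truth, pred)
print(measures)
```

These numbers describe only how well channel pairs are classified. They do not say how often an individual data sequence ends up judged correct or incorrect, which is the quantity a cleaning task cares about and the gap that the preference-based measures are meant to close.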