SEAnet: A Deep Learning Architecture for Data Series Similarity Search
This work addresses a key bottleneck in data series analysis for applications like time-series databases, offering a novel deep learning-based solution to improve similarity search performance on challenging datasets.
The paper tackles similarity search in massive data series collections, where existing SAX-based indexes underperform on datasets that are high-frequency, noisy, or weakly correlated. It proposes SEAnet, a deep learning architecture that learns Deep Embedding Approximations (DEA) to provide high-quality summarizations and similarity search results, as verified by comprehensive experiments on 7 diverse datasets.
A key operation in the analysis of massive data series collections is similarity search. According to recent studies, SAX-based indexes offer state-of-the-art performance for similarity search tasks. However, their performance lags on datasets that are high-frequency, weakly correlated, or excessively noisy, or that have other dataset-specific properties. In this work, we propose Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks. Moreover, we describe SEAnet, a novel architecture especially designed for learning DEA, that introduces the Sum of Squares preservation property into the deep network design. We further enhance SEAnet with the SEAtrans encoder. Finally, we propose novel sampling strategies, SEAsam and SEAsamE, that allow SEAnet to train effectively on massive datasets. Comprehensive experiments on 7 diverse synthetic and real datasets verify the advantages of DEA learned using SEAnet in providing high-quality data series summarizations and similarity search results.
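For context, the kind of summarization that SAX-based indexes build on can be sketched as below. This is a minimal illustration of classic PAA and SAX (z-normalize, average over equal-width segments, discretize with Gaussian breakpoints), not of DEA or SEAnet. The function names, the segment count, and the alphabet size are assumptions chosen for the example.

```python
import bisect
import math
from statistics import NormalDist, fmean, pstdev


def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = n // segments
    return [fmean(series[i * width:(i + 1) * width]) for i in range(segments)]


def sax_word(paa_values, alphabet_size):
    """Map each PAA value to a symbol index using equiprobable N(0, 1) breakpoints."""
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    return [bisect.bisect_right(breakpoints, v) for v in paa_values]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(paa_a, paa_b, series_length):
    """Distance between PAA summaries, scaled so it never exceeds the true distance."""
    scale = series_length / len(paa_a)
    return math.sqrt(scale * sum((x - y) ** 2 for x, y in zip(paa_a, paa_b)))


if __name__ == "__main__":
    # Two made-up series of length 16, summarized with 4 segments and 4 symbols.
    a = z_normalize([math.sin(0.3 * i) for i in range(16)])
    b = z_normalize([math.cos(0.5 * i) + 0.1 * i for i in range(16)])
    print("SAX word for a:", sax_word(paa(a, 4), 4))
    print("lower bound:", paa_lower_bound(paa(a, 4), paa(b, 4), 16))
    print("true distance:", euclidean(a, b))
```

The lower bound is what lets an index prune candidates without reading the raw series. Averaging over segments smooths out rapid fluctuations, which is one intuition for why such summaries can lose information on high-frequency or noisy data.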