Self-Organizing Maps with Variable Input Length for Motif Discovery and Word Segmentation
This work addresses motif discovery and word segmentation, which are important for fields such as early language acquisition, but it appears incremental because it builds on existing Self-Organizing Maps, modified to handle variable input length.
The authors address time series motif discovery and word segmentation by proposing VILMAP, a model based on Self-Organizing Maps that identifies motifs of different lengths. The results show that VILMAP performs well on a standard motif discovery dataset, avoids catastrophic forgetting when trained on datasets with increasing input size, and achieves results similar or superior to other methods in word segmentation.
Time Series Motif Discovery (TSMD) is defined as the search for previously unknown patterns that appear with a given frequency in a time series. A problem closely related to TSMD is Word Segmentation, which has received much attention from the community that studies early language acquisition in babies and toddlers. The development of biologically plausible models for word segmentation could greatly advance this field. Therefore, in this article, we propose the Variable Input Length Map (VILMAP) for Motif Discovery and Word Segmentation. The model is based on Self-Organizing Maps and can identify motifs of different lengths in time series. In our experiments, we show that VILMAP achieves good results in finding motifs in a standard motif discovery dataset and can avoid catastrophic forgetting when trained on datasets with increasing input size. We also show that VILMAP achieves results similar or superior to those of other methods in the literature developed for the task of word segmentation.
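To make the TSMD definition above concrete, the following is a minimal sketch of a brute-force baseline for the simplest version of the problem: finding the closest pair of non-overlapping, fixed-length subsequences in a series. It is not VILMAP and does not use Self-Organizing Maps; the function names (`znorm`, `closest_pair_motif`), the z-normalized Euclidean distance, and the synthetic series with a planted pattern are all assumptions chosen for illustration only.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D window; constant windows map to zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def closest_pair_motif(series, m):
    """Brute-force search for the two closest non-overlapping windows.

    Returns (i, j, dist), where i < j are the start indices of the two
    length-m windows with the smallest z-normalized Euclidean distance.
    This is an illustrative baseline, not the method proposed in the paper.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - m + 1  # number of length-m windows
    windows = np.array([znorm(series[k:k + m]) for k in range(n)])
    best = (-1, -1, np.inf)
    for i in range(n):
        for j in range(i + m, n):  # j >= i + m skips overlapping windows
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best[2]:
                best = (i, j, d)
    return best

# Hypothetical toy data: Gaussian noise with the same pattern planted twice.
rng = np.random.default_rng(0)
pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
series = rng.normal(0.0, 1.0, 200)
series[30:50] = pattern
series[120:140] = pattern

i, j, d = closest_pair_motif(series, m=20)
print(i, j, d)
```

This baseline requires the motif length `m` to be fixed in advance and compares every pair of windows, so its cost grows quadratically with the series length. The abstract states that VILMAP instead identifies motifs of different lengths.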