ML | LG | Aug 11, 2016

Sequence Graph Transform (SGT): A Feature Embedding Function for Sequence Data Mining

arXiv:1608.03533v1529 citations
AI Analysis

This paper addresses the problem of efficient sequence feature embedding for data mining and offers a novel solution with broad applicability.

The paper proposes Sequence Graph Transform (SGT), an embedding for sequence data that extracts short- to long-term dependencies without increasing computation. The authors report significantly higher accuracy and lower computation in clustering and classification than existing methods such as LSTM and sequence kernels.

Sequence feature embedding is a challenging task because sequences are unstructured, i.e., arbitrary strings of arbitrary length. Existing methods extract short-term dependencies efficiently but typically run into computational issues when extracting long-term ones. This paper proposes Sequence Graph Transform (SGT), a feature embedding function that can extract a varying amount of short- to long-term dependencies without increasing the computation. SGT's properties are analytically proved for interpretation under normal and uniform distribution assumptions. SGT features yield significantly superior results in sequence clustering and classification, with higher accuracy and lower computation than existing methods, including the state-of-the-art sequence/string kernels and LSTM.
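As a rough illustration of what a fixed-length embedding of arbitrary-length strings can look like, the toy sketch below scores every ordered symbol pair by how close its occurrences tend to be. It is not taken from this page: the exponential decay weight, the normalization by pair count, and the names `decayed_pair_embedding` and `kappa` are all assumptions made for illustration, and the naive double loop does not reproduce the computational efficiency the abstract claims.

```python
import math
from itertools import product


def decayed_pair_embedding(seq, alphabet, kappa=1.0):
    """Toy embedding: one feature per ordered symbol pair (u, v).

    Each feature is the average of exp(-kappa * distance) over all
    position pairs l < m with seq[l] == u and seq[m] == v.
    ASSUMPTION: the decay form and the normalization are illustrative
    choices, not confirmed by the abstract. Every symbol in `seq`
    must appear in `alphabet`.
    """
    weight = {pair: 0.0 for pair in product(alphabet, repeat=2)}
    count = {pair: 0 for pair in weight}

    # Naive O(len(seq)^2) scan over all forward position pairs.
    for l, u in enumerate(seq):
        for m in range(l + 1, len(seq)):
            pair = (u, seq[m])
            weight[pair] += math.exp(-kappa * (m - l))
            count[pair] += 1

    # Fixed output order, so sequences of any length map to the same layout.
    return [weight[p] / count[p] if count[p] else 0.0 for p in sorted(weight)]


short = decayed_pair_embedding("ABBA", "AB")
longer = decayed_pair_embedding("ABABABABAB", "AB")
print(len(short), len(longer))
```

In this sketch a larger `kappa` makes the weights decay faster, so the features emphasize nearby (short-term) pairs, while a smaller `kappa` lets distant (long-term) pairs contribute more. The vector length depends only on the alphabet size, which is what makes the output usable as input to standard clustering or classification methods.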

Code Implementations: 2 repos