LG · Oct 16, 2023

Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification

arXiv:2310.10321v2 · h-index: 9
Originality: Incremental advance
AI Analysis

The paper addresses challenges in discrete sequence classification for fields such as bioinformatics, though it appears incremental because it builds on existing pattern-based and CNN methods.

The paper tackles the problem that pattern-based sequence classification can miss discriminative feature combinations. It proposes Hamming Encoder, a binarized 1DCNN approach that mines discriminative k-mer sets using a Hamming distance-based measure, and it reports higher classification accuracy than state-of-the-art methods.

Sequence classification has numerous applications in various fields. Despite extensive study over the last decades, many challenges remain, particularly in pattern-based methods. Existing pattern-based methods measure the discriminative power of each feature individually during mining, so they can miss combinations of features that are discriminative only when taken together. Furthermore, it is difficult to ensure the overall discriminative performance after sequences are converted into feature vectors. To address these challenges, we propose a novel approach called Hamming Encoder, which uses a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets. In particular, we adopt a Hamming distance-based similarity measure to keep the feature mining and classification procedures consistent. Our method trains an interpretable CNN encoder for sequential data and performs a gradient-based search for discriminative k-mer combinations. Experiments show that Hamming Encoder outperforms existing state-of-the-art methods in terms of classification accuracy.
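To make the idea of a Hamming distance-based k-mer feature concrete, the sketch below turns a sequence into a binary feature vector by checking whether each k-mer in a given set occurs in the sequence within a mismatch budget. This is an illustration only, not the paper's implementation: the function names, the `max_mismatch` threshold, and the hand-picked k-mers are all assumptions, and the training and gradient-based search described in the abstract are not shown. The link to a 1D convolution is that, for one-hot encoded inputs and a one-hot filter of length k, the filter response on a window equals k minus the Hamming distance between the window and the k-mer.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_hamming_to_kmer(seq, kmer):
    """Smallest Hamming distance between kmer and any length-k window of seq."""
    k = len(kmer)
    if len(seq) < k:
        # No full window exists; treat the k-mer as maximally mismatched.
        return k
    return min(hamming(seq[i:i + k], kmer) for i in range(len(seq) - k + 1))


def encode(seq, kmers, max_mismatch=0):
    """Binary feature vector: 1 if a k-mer matches some window of seq
    with at most max_mismatch substitutions, else 0.
    (max_mismatch is a hypothetical parameter for this sketch.)"""
    return [1 if min_hamming_to_kmer(seq, km) <= max_mismatch else 0
            for km in kmers]


if __name__ == "__main__":
    kmers = ["ACG", "GGG"]  # hand-picked for illustration, not mined
    print(encode("TTACGTT", kmers))                  # exact matches only
    print(encode("TTACGTT", kmers, max_mismatch=2))  # tolerate 2 substitutions
```

Because the same distance is used both to decide whether a k-mer "fires" and to build the feature vector, a classifier trained on these vectors sees exactly the quantity that was used to select the k-mers, which is the kind of consistency the abstract refers to.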

