CL | Jul 17, 2024

Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

arXiv:2407.12342v2 | 1 citation | h-index: 12
AI Analysis

This work addresses resource efficiency in NLP for settings such as mobile edge devices, but it is incremental, since it builds on existing dimension reduction techniques.

The paper tackles the problem that high-dimensional word embeddings inflate model size and resource demands, particularly on mobile edge devices. It proposes WordFS, a weakly-supervised feature selection method that reduces embedding dimensions and outperforms other dimension reduction methods at lower computational cost.

As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which can lead to a vast model size. Storing and processing word vectors is resource-demanding, especially for applications on mobile edge devices. This paper explores word embedding dimension reduction. To balance computational cost and performance, we propose an efficient and effective weakly-supervised feature selection method named WordFS. It has two variants, each using novel feature selection criteria. Experiments on various tasks (e.g., word and sentence similarity, and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational cost. We have released the code for reproducibility along with the paper.
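To make the idea of dimension reduction by feature selection concrete, here is a minimal sketch. It keeps a subset of the original embedding coordinates rather than projecting onto new axes. The selection criterion used here (per-dimension variance across the vocabulary) is an illustrative stand-in chosen for simplicity; it is NOT one of WordFS's criteria, which the abstract does not specify. The helper names (`embedding_size_bytes`, `select_top_variance_dims`, `reduce_embeddings`), the toy vectors, and the 400,000 x 300 storage figure are all hypothetical.

```python
from typing import Dict, List


def embedding_size_bytes(vocab_size: int, dim: int, bytes_per_value: int = 4) -> int:
    """Storage for a dense vocab_size x dim matrix (4 bytes per float32 value)."""
    return vocab_size * dim * bytes_per_value


def select_top_variance_dims(vectors: Dict[str, List[float]], k: int) -> List[int]:
    """Return the indices of the k coordinates with the highest variance.

    Hypothetical stand-in criterion for illustration only; not WordFS.
    """
    rows = list(vectors.values())
    n, dim = len(rows), len(rows[0])
    variances = []
    for j in range(dim):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / n)
    # Rank dimensions by variance (highest first), keep k, restore original order.
    ranked = sorted(range(dim), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def reduce_embeddings(vectors: Dict[str, List[float]], keep: List[int]) -> Dict[str, List[float]]:
    """Keep only the selected coordinates of every word vector."""
    return {w: [v[j] for j in keep] for w, v in vectors.items()}


# Toy 3-dimensional embeddings (made-up values).
toy = {
    "cat": [0.1, 2.0, 0.5],
    "dog": [0.2, -2.0, 0.4],
    "car": [0.1, 1.5, -0.6],
}
keep = select_top_variance_dims(toy, k=2)   # dimension 0 barely varies, so it is dropped
reduced = reduce_embeddings(toy, keep)

# Illustrative storage arithmetic: halving 300 dims halves the matrix size.
full = embedding_size_bytes(400_000, 300)
half = embedding_size_bytes(400_000, 150)
print(keep, reduced["cat"], full, half)
```

Because selection keeps original coordinates, the reduced vectors need no projection matrix at inference time: a word's reduced vector is just a slice of its full vector. A projection method such as PCA would instead produce linear combinations of all coordinates.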

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically closely related papers build on it, not by global fame.
