Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection
This work addresses resource efficiency in NLP for applications such as mobile edge devices, but its contribution is incremental because it builds on existing dimension reduction techniques.
The paper tackles the problem that high-dimensional word embeddings inflate model size and resource demands, particularly on mobile edge devices. It proposes WordFS, a weakly-supervised feature selection method that reduces embedding dimensions and is reported to outperform other dimension reduction methods at lower computational cost.
As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which can lead to a very large model. Storing and processing word vectors are resource-demanding, especially for applications on mobile edge devices. This paper explores word embedding dimension reduction. To balance computational cost and performance, we propose an efficient and effective weakly-supervised feature selection method named WordFS. It has two variants, each using a novel criterion for feature selection. Experiments on various tasks (e.g., word and sentence similarity, and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational cost. We have released the code for reproducibility along with the paper.
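
To make the idea of feature selection on word embeddings concrete, the following is a minimal sketch of the general technique, not the WordFS criteria, which the abstract does not spell out. It assumes a small set of word pairs with similarity labels as the weak supervision, and it ranks each embedding dimension by the Pearson correlation between that dimension's contribution to cosine similarity and the labels. The function name `select_dimensions`, its parameters, and the correlation criterion are all assumptions made for illustration.

```python
import numpy as np

def select_dimensions(emb, pairs, scores, k):
    """Return the indices of k embedding dimensions to keep.

    Hypothetical helper, not the paper's algorithm.
    emb:    (V, D) array of word vectors (rows assumed non-zero)
    pairs:  (N, 2) integer array of row indices into emb
    scores: (N,) array of similarity labels for the pairs
    """
    # L2-normalize rows so elementwise products sum to cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Per-dimension contribution to each pair's cosine similarity: (N, D).
    contrib = unit[pairs[:, 0]] * unit[pairs[:, 1]]
    # Pearson correlation between each dimension's contribution and the labels.
    c = contrib - contrib.mean(axis=0)
    s = scores - scores.mean()
    denom = np.linalg.norm(c, axis=0) * np.linalg.norm(s)
    corr = (c.T @ s) / np.where(denom == 0, 1.0, denom)
    # Keep the k highest-scoring dimensions, returned in ascending index order.
    return np.sort(np.argsort(-corr)[:k])

# Usage on synthetic data (random vectors and labels, for shape illustration only).
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 300))
pairs = rng.integers(0, 1000, size=(200, 2))
scores = rng.uniform(0.0, 10.0, size=200)
keep = select_dimensions(emb, pairs, scores, k=50)
reduced = emb[:, keep]  # (1000, 50) embedding table
```

Unlike projection methods such as PCA, selection of this kind keeps a subset of the original coordinates, so reducing a vector at inference time is an index lookup rather than a matrix multiplication.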