Effect of Post-processing on Contextualized Word Representations
This work addresses the under-studied problem of optimizing contextualized embeddings for NLP applications, offering incremental improvements for researchers and practitioners.
The paper investigated whether post-processing techniques like normalization improve contextualized word embeddings from pre-trained language models, finding that methods such as z-score normalization enhance performance on lexical and sequence classification tasks.
Post-processing of static embeddings has been shown to improve their performance on both lexical and sequence-level tasks. However, post-processing for contextualized embeddings is an under-studied problem. In this work, we question the usefulness of post-processing for contextualized embeddings obtained from different layers of pre-trained language models. More specifically, we standardize individual neuron activations using z-score and min-max normalization, and we remove the top principal components using the all-but-the-top method. Additionally, we apply unit length normalization to word representations. On a diverse set of pre-trained models, we show that post-processing unwraps vital information present in the representations for both lexical tasks (such as word similarity and analogy) and sequence classification tasks. Our findings raise interesting points in relation to the research studies that use contextualized representations, and suggest z-score normalization as an essential step to consider when using them in an application.
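The four post-processing steps named above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `X` is assumed to be a matrix of shape (number of tokens, number of neurons) holding representations from a single layer, the function names are hypothetical, the number of removed components `d` is a free parameter chosen here for illustration, and the small `eps` term is added only to avoid division by zero.

```python
import numpy as np


def zscore(X, eps=1e-8):
    """Standardize each neuron (column) to zero mean and unit variance."""
    mu = X.mean(axis=0, keepdims=True)
    sigma = X.std(axis=0, keepdims=True)
    return (X - mu) / (sigma + eps)


def min_max(X, eps=1e-8):
    """Rescale each neuron (column) to the range [0, 1]."""
    lo = X.min(axis=0, keepdims=True)
    hi = X.max(axis=0, keepdims=True)
    return (X - lo) / (hi - lo + eps)


def all_but_the_top(X, d=1):
    """Center the data, then remove its projection onto the top-d
    principal components (d is an assumed, illustrative setting)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:d]                      # shape (d, n_neurons)
    return Xc - (Xc @ top.T) @ top


def unit_length(X, eps=1e-8):
    """Scale each word representation (row) to unit L2 norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / (norms + eps)
```

Note the difference in axis: z-score, min-max, and all-but-the-top operate on statistics computed per neuron across the whole set of tokens, whereas unit length normalization acts on each word representation independently.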