Word Embeddings for Chemical Patent Natural Language Processing
This work addresses the challenge of processing chemical patent text, a problem relevant to researchers and practitioners in computational chemistry and intellectual property. The contribution is incremental, since it builds on existing embedding methods.
The authors tackle natural language processing for chemical patents by evaluating domain-specific word embeddings. They show that chemical patent embeddings outperform biomedical embeddings both extrinsically and intrinsically, and that contextualized embeddings enable predictive models with reasonable performance on a relatively small gold standard dataset.
We evaluate chemical patent word embeddings against known biomedical embeddings and show that they outperform the latter extrinsically and intrinsically. We also show that using contextualized embeddings can induce predictive models of reasonable performance for this domain over a relatively small gold standard.