An Intellectual Property Entity Recognition Method Based on Transformer and Technological Word Information
This work addresses the extraction of intellectual property entities from patent texts, but the contribution appears incremental: it builds on existing Transformer and BERT methods and adds technical word features.
The paper tackles named entity recognition in patent texts by combining a Transformer encoder, BERT, and technical word information extracted by IDCNN, and reports improved accuracy on public and annotated patent datasets.
Patent texts contain a large amount of entity information. Named entity recognition can extract the intellectual property entities that carry the key information, helping researchers understand patent content faster. However, existing named entity extraction methods struggle to make full use of the word-level semantic information introduced by changes in professional vocabulary. This paper proposes a method for extracting intellectual property entities based on Transformer and technical word information, and uses the BERT language model to provide accurate word vector representations. During word vector generation, technical word information extracted by IDCNN is added to improve the ability to represent intellectual property entities. Finally, a Transformer encoder with relative position encoding learns the deep semantic information of the text from the sequence of word vectors and predicts the entity labels. Experimental results on public datasets and annotated patent datasets show that the method improves the accuracy of entity recognition.
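To make the IDCNN step more concrete, the following is a minimal sketch of stacked dilated 1D convolutions, the operation that IDCNN (commonly expanded as iterated dilated convolutional neural network) is built from. It is not the paper's implementation. The dilation schedule (1, 1, 2), the kernel width of 3, the ReLU activation, and the helper names `dilated_conv1d`, `idcnn_block`, `receptive_field`, and `concat_features` are all assumptions made for illustration, as is the choice to attach the technical word feature by concatenation, which the abstract does not specify.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D dilated convolution with zero padding (odd kernel width)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # input position this tap reads from
            if 0 <= j < len(x):            # positions outside the sequence count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def idcnn_block(x: Sequence[float], kernel: Sequence[float],
                dilations: Sequence[int] = (1, 1, 2)) -> List[float]:
    """Apply dilated convolutions in sequence, with ReLU after each layer.

    The schedule (1, 1, 2) is an assumed example, not taken from the paper.
    """
    h = list(x)
    for d in dilations:
        h = [max(0.0, v) for v in dilated_conv1d(h, kernel, d)]
    return h


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


def concat_features(word_vec: Sequence[float], tech_feat: float) -> List[float]:
    """Hypothetical fusion step: append a technical word feature to a word vector."""
    return list(word_vec) + [tech_feat]


# A single active token in the middle of a 15-token sequence.
impulse = [0.0] * 15
impulse[7] = 1.0
features = idcnn_block(impulse, kernel=[1.0, 1.0, 1.0])
print(receptive_field(3, (1, 1, 2)))        # 9
print(sum(1 for v in features if v > 0))    # 9 positions are reached by the impulse
```

With only three layers, each output position already sees nine input tokens, which suggests why dilated convolutions are a plausible way to capture multi-token technical terms without a deep stack. In a real model each position would hold a vector rather than a scalar, and the kernels would be learned.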