A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
This work addresses a gap in resources for supply chain monitoring and market research, but its contribution is incremental: it provides annotation guidelines and a dataset rather than novel methods.
The authors address the lack of annotated corpora for recognizing business products and their relations in text by conducting a corpus study and developing an annotation schema, which yields a preliminary annotated corpus of English web and social media documents.
Recognizing non-standard entity types and relations, such as B2B products, product classes and their producers, in news and forum texts is important in application areas such as supply chain monitoring and market research. However, there is a decided lack of annotated corpora and annotation guidelines in this domain. In this work, we present a corpus study, an annotation schema, and associated guidelines for annotating product entity and company-product relation mentions. We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity and the broad syntactic and semantic variety of their surface realizations. We also describe our ongoing annotation effort and present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
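
To make the boundary problem concrete, the following is a minimal sketch of how product mentions and a company-product relation could be stored as character-offset spans. It is not the schema proposed in this work: the label names (`COMPANY`, `PRODUCT`, `COMPANY_PROVIDES_PRODUCT`), the class and helper names, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMention:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical label, not taken from the paper's guidelines


@dataclass(frozen=True)
class RelationMention:
    head: EntityMention
    tail: EntityMention
    label: str  # hypothetical relation label


def find_span(text: str, phrase: str, label: str) -> EntityMention:
    """Locate the first occurrence of `phrase` and wrap it as a mention."""
    start = text.index(phrase)
    return EntityMention(start, start + len(phrase), label)


def surface(text: str, mention: EntityMention) -> str:
    """Return the surface string covered by a mention."""
    return text[mention.start:mention.end]


# Invented example sentence (assumption, not from the corpus).
text = "Acme Corp announced a new line of industrial laser cutters."

company = find_span(text, "Acme Corp", "COMPANY")

# Two plausible extents for the same product mention: annotators must
# decide whether the modifier "industrial" belongs inside the span.
narrow = find_span(text, "laser cutters", "PRODUCT")
wide = find_span(text, "industrial laser cutters", "PRODUCT")

# The relation links the company to whichever extent the guidelines select.
relation = RelationMention(company, wide, "COMPANY_PROVIDES_PRODUCT")

print(surface(text, relation.head), "->", surface(text, relation.tail))
```

Both candidate spans share the same end offset and differ only in where they start, which is the kind of disagreement that annotation guidelines have to resolve explicitly.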