Unsupervised domain-agnostic identification of product names in social media posts
This work addresses the need for scalable product name recognition that transfers across domains without retraining, which benefits customers, manufacturers, and online marketplaces.
The paper tackles the problem of identifying product names in unstructured social media text by developing a domain-agnostic, unsupervised algorithm based on Facebook posts. The algorithm first identifies candidate names using an off-the-shelf pretrained CRF model, part-of-speech tagging, and simple patterns, and then filters out spurious candidates using clustering and word embeddings.
Product name recognition is a significant practical problem, spurred by the greater availability of platforms for discussing products, such as social media and the product review features of online marketplaces. Customers, product manufacturers, and online marketplaces may want to identify product names in unstructured text to extract important insights, such as sentiment, surrounding a product. Much extant research on product name identification has been domain-specific (e.g., identifying mobile phone models) and has used supervised or semi-supervised methods. With massive numbers of new products released to the market every year, such methods may require retraining on updated labeled data to stay relevant, and may transfer poorly across domains. This research addresses this challenge and develops a domain-agnostic, unsupervised algorithm for identifying product names based on Facebook posts. The algorithm consists of two general steps: (a) candidate product name identification using an off-the-shelf pretrained conditional random fields (CRF) model, part-of-speech tagging, and a set of simple patterns; and (b) filtering of candidate names to remove spurious entries using clustering and word embeddings generated from the data.
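
The two-step structure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions. Step (a) is reduced to a single hypothetical regular expression (`CANDIDATE_RE`) that stands in for the CRF model, part-of-speech tagging, and patterns. Step (b) uses hand-written toy vectors in place of word embeddings generated from the data, a simple greedy clustering, and an assumed filtering rule that discards candidates falling into clusters smaller than `min_size`. All function names, thresholds, and example posts are illustrative.

```python
import math
import re

# Step (a), simplified: a hypothetical pattern standing in for the CRF model,
# POS tagging, and patterns. It matches one or more capitalized words,
# optionally followed by a model token containing a digit (e.g. "Galaxy S7").
CANDIDATE_RE = re.compile(
    r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*(?:\s+[A-Z]*\d+[A-Za-z0-9]*)?"
)


def extract_candidates(post):
    """Return candidate product names found in a single post."""
    return [m.group(0) for m in CANDIDATE_RE.finditer(post)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(embeddings, threshold=0.8):
    """Greedy clustering: a candidate joins the first cluster whose first
    member is at least `threshold` similar; otherwise it starts a new one."""
    clusters = []
    for name, vec in embeddings.items():
        for members in clusters:
            if cosine(vec, embeddings[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def filter_candidates(embeddings, threshold=0.8, min_size=2):
    """Step (b), simplified: keep only candidates in clusters with at least
    `min_size` members (an assumed rule, not taken from the paper)."""
    kept = []
    for members in cluster(embeddings, threshold):
        if len(members) >= min_size:
            kept.extend(members)
    return kept


posts = [
    "I love my new Galaxy S7 but the Pixel camera is better",
    "My old Galaxy S7 still works",
]
candidates = sorted({c for p in posts for c in extract_candidates(p)})

# Toy 2-d vectors standing in for embeddings learned from the posts.
toy_vectors = {"Galaxy S7": [0.9, 0.1], "Pixel": [0.8, 0.2], "My": [0.1, 0.9]}
embeddings = {c: toy_vectors[c] for c in candidates}

print(candidates)                     # includes the spurious candidate "My"
print(filter_candidates(embeddings))  # "My" is dropped as a singleton cluster
```

In this toy run the pattern over-generates (the sentence-initial "My" is picked up as a candidate), and the filtering step removes it because its vector is dissimilar to those of the other candidates. In practice the embeddings would be trained on the post corpus, and both the clustering method and the rule for deciding which clusters to discard would need to be chosen and validated on real data.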