Query Generation for Patent Retrieval with Keyword Extraction based on Syntactic Features
The work is an incremental improvement to patent retrieval systems, aimed at helping legal professionals with prior art searches.
The paper tackles the problem of retrieving similar patents by extracting keywords from patent claims, and finds that a method combining qualitative analysis of claim writing style with NLP parsing yields better search results than traditional tf-idf keyword extraction.
This paper describes a new method for extracting relevant keywords from patent claims, as part of the task of retrieving other patents with similar claims (prior art search). The method combines a qualitative analysis of the writing style of the claims with NLP parsing, in order to represent a legal text as a specialization arborescence (a rooted tree) of terms. In this setting, the set of extracted keywords yields better search results than keywords extracted with traditional methods such as tf-idf. Performance is measured on the search results of a query built from the extracted keywords.
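For readers unfamiliar with the baseline, the following is a minimal sketch of tf-idf keyword extraction followed by query construction. It illustrates only the traditional method the paper compares against, not the authors' syntactic approach. The function names, the tiny stop-word list, the smoothed idf formula, and the `OR`-joined query format are all assumptions made for illustration; the paper's actual baseline configuration and search engine syntax may differ.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_keywords(target, corpus, k=5):
    """Return the k terms of `target` with the highest tf-idf score.

    `corpus` is a list of reference documents used only for document
    frequencies. The idf is smoothed so unseen terms do not divide by zero.
    """
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)

    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    tf = Counter(tokenize(target))
    scores = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def build_query(keywords):
    """Join keywords into a simple disjunctive query string (assumed format)."""
    return " OR ".join(keywords)


corpus = ["a device comprising a sensor", "a method comprising a step"]
claim = "a device comprising a lens and a lens holder"
keywords = tfidf_keywords(claim, corpus, k=2)
query = build_query(keywords)
```

In this toy example, terms that are frequent in the claim but rare in the reference corpus rank highest. The paper's contribution replaces this purely statistical ranking with keywords chosen from the syntactic structure of the claim.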