IR · DL · Dec 27, 2020

PatentMatch: A Dataset for Matching Patent Claims & Prior Art

Julian Risch, Nicolas Alder, Christoph Hewel, Ralf Krestel

arXiv:2012.13919v1

Originality: Incremental advance

AI Analysis

This dataset addresses prior art search, a time-consuming and complex information retrieval task for patent examiners, with the aim of enabling computer-assisted search. It is an incremental contribution to the field of legal information retrieval.

This paper introduces PatentMatch, a dataset designed to train machine learning models for matching patent claims with prior art. It contains pairs of patent claims and corresponding text passages from cited patent documents, labeled by patent examiners for their degree of semantic correspondence. Preliminary experiments with a baseline system demonstrate its utility for training a binary text pair classifier.

Patent examiners face a complex information retrieval task when they assess the novelty and inventive step of the claims in a patent application. Given a claim, they search for prior art, which comprises all relevant publicly available information. This time-consuming task requires a deep understanding of the respective technical domain and of patent-specific language. For these reasons, we address computer-assisted prior art search by creating PatentMatch, a training dataset for supervised machine learning. It contains pairs of claims from patent applications and text passages from cited patent documents that correspond to the claims semantically to different degrees. Each pair has been labeled by technically skilled patent examiners from the European Patent Office. The label indicates the degree of semantic correspondence (matching), i.e., whether the text passage is prejudicial to the novelty of the claimed invention or not. Preliminary experiments using a baseline system show that PatentMatch can indeed be used to train a binary text pair classifier on this challenging information retrieval task. The dataset is available online: https://hpi.de/naumann/s/patentmatch.
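To make the binary text pair classification framing concrete, here is a minimal sketch, assuming each example is a (claim, passage, label) triple with label 1 if the passage is prejudicial to the novelty of the claim and 0 otherwise. It is not the authors' baseline system, which the summary above does not describe. It swaps in a deliberately simple stand-in: a token-overlap (Jaccard) score with a decision threshold fitted on training pairs. The names `ClaimPassagePair`, `jaccard`, `fit_threshold`, `predict`, and `accuracy` are hypothetical, and the two training pairs are invented toy examples, not records from PatentMatch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimPassagePair:
    """Hypothetical record layout: one claim, one cited passage, one binary label."""
    claim: str
    passage: str
    label: int  # 1 = passage is prejudicial to novelty, 0 = it is not


TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> set[str]:
    """Set of lowercased alphanumeric tokens in a text."""
    return set(TOKEN_RE.findall(text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; 0.0 when both texts have no tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def predict(pair: ClaimPassagePair, threshold: float) -> int:
    """Classify a pair as matching (1) if its overlap score reaches the threshold."""
    return int(jaccard(pair.claim, pair.passage) >= threshold)


def accuracy(pairs: list[ClaimPassagePair], threshold: float) -> float:
    """Fraction of pairs whose predicted label equals the gold label."""
    correct = sum(predict(p, threshold) == p.label for p in pairs)
    return correct / len(pairs)


def fit_threshold(pairs: list[ClaimPassagePair]) -> float:
    """Try each observed score as a threshold; keep the lowest one with the best accuracy."""
    candidates = sorted({jaccard(p.claim, p.passage) for p in pairs})
    best_t, best_acc = 0.0, -1.0
    for t in candidates:
        acc = accuracy(pairs, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Invented toy examples for illustration only (not PatentMatch data).
CLAIM = "A bicycle frame made of carbon fibre with an integrated battery housing."
train = [
    ClaimPassagePair(CLAIM, "The frame is made of carbon fibre and comprises an integrated battery housing.", 1),
    ClaimPassagePair(CLAIM, "Conventional steel frames are heavy and difficult to transport.", 0),
]

threshold = fit_threshold(train)
print(threshold, accuracy(train, threshold))  # prints 0.5625 1.0
```

The hand-crafted overlap score is only a placeholder: because it ignores paraphrase and patent-specific wording, a trained text pair classifier would replace `jaccard` with a learned scoring function while keeping the same pair-in, label-out interface.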

