CL · AI · Apr 22, 2025

IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

arXiv:2504.15524v2 · 2 citations · h-index: 34 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for better evaluation of LLMs in the specialized domain of intellectual property, which matters to legal and technical professionals. The advance is incremental, since it builds on existing benchmarking approaches.

The authors tackled the problem of evaluating large language models (LLMs) on intellectual property (IP) tasks by introducing IPBench, a comprehensive bilingual benchmark covering 8 IP mechanisms and 20 tasks. They found that even the top-performing model achieved only 75.8% accuracy, and that open-source IP-specific models lag behind closed-source general-purpose ones.

Intellectual Property (IP) is a highly specialized domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. Recent advancements in LLMs have demonstrated their potential to handle IP-related tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce IPBench, the first comprehensive IP task taxonomy and a large-scale bilingual benchmark encompassing 8 IP mechanisms and 20 distinct tasks, designed to evaluate LLMs in real-world IP scenarios. We benchmark 17 main LLMs, ranging from general purpose to domain-specific, including chat-oriented and reasoning-focused models, under zero-shot, few-shot, and chain-of-thought settings. Our results show that even the top-performing model, DeepSeek-V3, achieves only 75.8% accuracy, indicating significant room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. To foster future research, we publicly release IPBench, and will expand it with additional tasks to better reflect real-world complexities and support model advancements in the IP domain. We provide the data and code in the supplementary URLs.
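The three evaluation settings named in the abstract (zero-shot, few-shot, chain-of-thought) and the accuracy metric can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Item` schema, the prompt wording, and the helper names `build_prompt` and `accuracy` are hypothetical and are not taken from the IPBench code release, which may format prompts and score answers differently.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Item:
    """Hypothetical multiple-choice item; not IPBench's actual schema."""
    question: str
    choices: Dict[str, str]  # choice label -> choice text
    answer: str              # gold choice label, e.g. "A"


def format_item(item: Item, with_answer: bool = False) -> str:
    """Render one item as text, optionally including its gold answer."""
    lines = [f"Question: {item.question}"]
    lines += [f"{label}. {text}" for label, text in item.choices.items()]
    lines.append(f"Answer: {item.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(item: Item, setting: str, examples: Sequence[Item] = ()) -> str:
    """Build a prompt for one of the three settings (wording is assumed)."""
    if setting == "zero-shot":
        return format_item(item)
    if setting == "few-shot":
        # Prepend solved demonstrations before the unanswered target item.
        shots = [format_item(ex, with_answer=True) for ex in examples]
        return "\n\n".join(shots + [format_item(item)])
    if setting == "cot":
        # A common chain-of-thought trigger phrase, assumed here.
        return format_item(item) + " Let's think step by step."
    raise ValueError(f"unknown setting: {setting}")


def accuracy(predictions: Sequence[str], items: Sequence[Item]) -> float:
    """Fraction of predicted labels that match the gold label exactly."""
    if not items or len(predictions) != len(items):
        raise ValueError("predictions and items must be non-empty and aligned")
    correct = sum(
        pred.strip().upper() == item.answer
        for pred, item in zip(predictions, items)
    )
    return correct / len(items)
```

Under this sketch, a reported figure such as 75.8% would correspond to `accuracy(...)` returning 0.758 over the evaluated items. Extracting a final choice label from free-form or chain-of-thought output is a separate step that is not shown here.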

