Feb 26, 2024

MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property

arXiv:2402.16389v1 · 83 citations · h-index: 16 · LREC
Originality: Synthesis-oriented
AI Analysis

This work contributes a new benchmark for evaluating LLMs in the intellectual property domain. Its originality is incremental: it applies existing evaluation methods to a new data domain.

The authors address the lack of evaluation benchmarks for large language models in the intellectual property domain by introducing MoZIP, a multilingual benchmark with three tasks. Their fine-tuned model, MoZi, outperformed several LLMs but still fell short of ChatGPT, and all models showed substantial room for improvement: even ChatGPT did not reach the passing level.

Large language models (LLMs) have demonstrated impressive performance on a wide range of natural language processing (NLP) tasks. However, little is known about how well LLMs perform in specific domains (e.g., the intellectual property (IP) domain). In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for evaluating LLMs in the IP domain. The MoZIP benchmark includes three challenging tasks: IP multiple-choice quiz (IPQuiz), IP question answering (IPQA), and patent matching (PatentMatch). In addition, we develop a new IP-oriented multilingual large language model, MoZi, a BLOOMZ-based model obtained by supervised fine-tuning on multilingual IP-related text data. We evaluate MoZi and four well-known LLMs (BLOOMZ, BELLE, ChatGLM, and ChatGPT) on the MoZIP benchmark. Experimental results show that MoZi outperforms BLOOMZ, BELLE, and ChatGLM by a noticeable margin, while scoring lower than ChatGPT. Notably, current LLMs leave much room for improvement on the MoZIP benchmark: even the most powerful model, ChatGPT, does not reach the passing level. Our source code, data, and models are available at https://github.com/AI-for-Science/MoZi.
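The abstract reports IPQuiz results against a passing level but does not spell out the scoring rule. As a minimal sketch, assuming each quiz item has lettered options and a single gold letter, multiple-choice accuracy could be computed by pulling the first standalone option letter out of each free-form model reply. The `QuizItem` layout and the functions `extract_choice` and `quiz_accuracy` are hypothetical illustrations, not code from the MoZi repository.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QuizItem:
    # Hypothetical record layout for an IPQuiz-style question; the actual
    # MoZIP data files may use different field names and formats.
    question: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # gold option letter, e.g. "B"


def extract_choice(output: str, letters: List[str]) -> Optional[str]:
    """Return the first standalone option letter in a free-form model reply.

    Simple heuristic (an assumption, not the benchmark's rule): match an
    option letter surrounded by word boundaries, case-sensitively.
    """
    pattern = r"\b(" + "|".join(map(re.escape, letters)) + r")\b"
    match = re.search(pattern, output)
    return match.group(1) if match else None


def quiz_accuracy(items: List[QuizItem], outputs: List[str]) -> float:
    """Fraction of items whose extracted choice equals the gold answer."""
    if len(items) != len(outputs):
        raise ValueError("need exactly one model output per item")
    if not items:
        return 0.0
    correct = sum(
        extract_choice(out, sorted(item.options)) == item.answer
        for item, out in zip(items, outputs)
    )
    return correct / len(items)
```

For example, the replies `"B"`, `"The answer is C."` and `"I am not sure"` scored against gold answers B, C and A give an accuracy of 2/3, since the last reply contains no option letter. The regex is deliberately simple: a reply that opens with the article "A" would be misread as option A, so a real evaluation harness would likely constrain the answer format or parse it more carefully.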
