CV · AI · Jan 25, 2025

PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures

arXiv:2501.15074v1 · 6 citations · h-index: 3 · AAAI
Originality: Synthesis-oriented
AI Analysis

This work automates the generation of patent figure descriptions for patent professionals, enabling more efficient knowledge sharing and drafting. It is incremental, however, in that it adapts existing methods to a specific domain.

The authors automate description generation for patent figures by introducing PatentDesc-355K, a dataset of ~355K figures, and PatentLMM, a tailored multimodal model that significantly outperforms fine-tuned off-the-shelf multimodal models of similar size at generating coherent descriptions.

Writing comprehensive and accurate descriptions of technical drawings in patent documents is crucial to effective knowledge sharing and enabling the replication and protection of intellectual property. However, automation of this task has been largely overlooked by the research community. To this end, we introduce PatentDesc-355K, a novel large-scale dataset containing ~355K patent figures along with their brief and detailed textual descriptions extracted from more than 60K US patent documents. In addition, we propose PatentLMM, a novel multimodal large language model specifically tailored to generate high-quality descriptions of patent figures. Our proposed PatentLMM comprises two key components: (i) PatentMME, a specialized multimodal vision encoder that captures the unique structural elements of patent figures, and (ii) PatentLLaMA, a domain-adapted version of LLaMA fine-tuned on a large collection of patents. Extensive experiments demonstrate that training a vision encoder specifically designed for patent figures significantly boosts performance and yields more coherent descriptions than fine-tuning similar-sized off-the-shelf multimodal models. PatentDesc-355K and PatentLMM pave the way for automating the understanding of patent figures, enabling efficient knowledge sharing and faster drafting of patent documents. We make the code and data publicly available.
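The abstract describes a dataset of figures paired with brief and detailed descriptions, and a model built from a vision encoder followed by a language model. The sketch below illustrates only that overall shape. It is not the authors' code: the names `PatentFigureRecord`, `describe_figure`, `toy_encoder`, and `toy_decoder`, the field names, and the prompt wording are all hypothetical, and the toy encoder and decoder are stand-ins rather than PatentMME or PatentLLaMA.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PatentFigureRecord:
    """One dataset entry: a figure plus its two reference descriptions.

    Field names are assumptions; the abstract only says each figure comes
    with a brief and a detailed description taken from a US patent.
    """
    patent_id: str
    figure_path: str
    brief_description: str
    detailed_description: str


# Hypothetical interfaces for the two components named in the abstract.
VisionEncoder = Callable[[str], List[float]]       # figure path -> features
TextDecoder = Callable[[List[float], str], str]    # (features, prompt) -> text


def describe_figure(
    record: PatentFigureRecord,
    encoder: VisionEncoder,
    decoder: TextDecoder,
    style: str = "brief",
) -> str:
    """Encode the figure, then ask the decoder for a description."""
    if style not in ("brief", "detailed"):
        raise ValueError(f"unknown style: {style!r}")
    features = encoder(record.figure_path)
    prompt = f"Write a {style} description of this patent figure."
    return decoder(features, prompt)


def toy_encoder(figure_path: str) -> List[float]:
    """Stand-in encoder: returns one number derived from the path length."""
    return [float(len(figure_path))]


def toy_decoder(features: List[float], prompt: str) -> str:
    """Stand-in decoder: echoes the prompt with the feature count."""
    return f"[{len(features)} features] {prompt}"


if __name__ == "__main__":
    record = PatentFigureRecord(
        patent_id="EXAMPLE-0001",  # placeholder, not a real patent number
        figure_path="fig1.png",
        brief_description="FIG. 1 is a perspective view of the device.",
        detailed_description="Placeholder detailed description.",
    )
    print(describe_figure(record, toy_encoder, toy_decoder))
```

Keeping the encoder and decoder behind plain callables means a real vision encoder or language model could be swapped in without changing `describe_figure`; how the actual model connects the two components is not specified in the abstract.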

Code Implementations: 1 repo
