CL · Jul 9, 2025

Large Language Model for Extracting Complex Contract Information in Industrial Scenes

arXiv:2507.06539v2 · h-index: 3
Originality: Incremental advance
AI Analysis

This paper provides an incremental solution for industrial contract processing, aimed at improving efficiency in legal and business domains.

The paper tackles complex contract information extraction in industrial scenarios by constructing a high-quality dataset using GPT-4 and GPT-3.5 for annotations and data augmentation, then fine-tuning a large language model, achieving excellent overall performance with high recall and precision.

This paper proposes a method for constructing a high-quality dataset for complex contract information extraction in industrial scenarios, and fine-tunes a large language model on that dataset. First, cluster analysis is performed on industrial contract texts, and GPT-4 and GPT-3.5 are used to extract key information from the original contract data, yielding high-quality annotations. Second, the data is augmented by constructing new texts: GPT-3.5 generates unstructured contract texts from randomly combined keywords, which improves model robustness. Finally, the large language model is fine-tuned on the resulting dataset. Experimental results show that the model achieves excellent overall performance, maintaining high field recall and precision while taking parsing efficiency into account. LoRA, data balancing, and data augmentation each improve model accuracy and robustness. The proposed method provides a novel and efficient solution for industrial contract information extraction.
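The augmentation step described above (randomly combining keywords and asking GPT-3.5 to write an unstructured contract text around them) might be sketched as follows. This is only an illustration under stated assumptions: the field names, example values, prompt wording, and the helper `build_augmentation_sample` are hypothetical and do not come from the paper, and the actual call to a language model is left out.

```python
import json
import random

# Hypothetical field pools; the paper does not list its actual fields or values.
FIELD_POOLS = {
    "buyer": ["Acme Steel Co.", "Northwind Machinery Ltd."],
    "seller": ["Delta Components Inc.", "Orion Tooling GmbH"],
    "amount": ["USD 120,000", "EUR 48,500"],
    "delivery_date": ["2025-03-01", "2025-09-15"],
}


def build_augmentation_sample(rng: random.Random) -> dict:
    """Randomly pick one value per field and pair it with a generation prompt.

    The chosen values double as the ground-truth label for the text that a
    generator model would later produce from the prompt.
    """
    label = {field: rng.choice(values) for field, values in FIELD_POOLS.items()}
    prompt = (
        "Write a short, unstructured industrial contract paragraph that "
        "mentions every one of these facts exactly once:\n"
        + json.dumps(label, ensure_ascii=False, indent=2)
    )
    return {"prompt": prompt, "label": label}


# A seeded generator keeps the sampled combination reproducible.
sample = build_augmentation_sample(random.Random(0))
print(sample["prompt"])
```

Because the label is fixed before any text is generated, each synthetic contract arrives with its annotation already attached, which is one plausible reason this style of augmentation is cheap to scale.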
