CL · May 22, 2025

MPL: Multiple Programming Languages with Large Language Models for Information Extraction

Bo Li, Gexiang Fang, Wei Ye, Zhenghua Xu, Jinglei Zhang, Hao Cheng, Shikun Zhang

arXiv:2505.16107v18.33 citations · h-index: 26 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This work addresses the limitation of relying solely on Python for code-style inputs in information extraction, offering a more versatile approach for researchers and practitioners in natural language processing.

The paper approaches information extraction by using multiple programming languages (C++ and Java, in addition to Python) for code-style inputs during supervised fine-tuning, and reports improved performance across a wide range of datasets.

Recent research in information extraction (IE) focuses on utilizing code-style inputs to enhance structured output generation. The intuition is that programming languages (PLs) inherently exhibit greater structural organization than natural languages (NLs), which makes PLs particularly well suited to IE tasks. Nevertheless, existing research primarily uses Python for code-style simulation, overlooking the potential of other widely used PLs (e.g., C++ and Java) during the supervised fine-tuning (SFT) phase. In this research, we propose Multiple Programming Languages with large language models for information extraction (abbreviated as MPL), a novel framework that explores the potential of incorporating different PLs in the SFT phase. Additionally, we introduce function-prompt with virtual running to simulate code-style inputs more effectively and efficiently. Experimental results on a wide range of datasets demonstrate the effectiveness of MPL. Furthermore, we conduct extensive experiments to provide a comprehensive analysis. We have released our code for future research.
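To make the idea of code-style inputs in several PLs concrete, the sketch below renders one named-entity extraction instance as a prompt prefix in Python, C++, and Java. It is only an illustration under stated assumptions: the templates, the `Entity` type, and the names `render_python`, `render_cpp`, `render_java`, and `build_prompts` are hypothetical and are not taken from the paper, and the sketch does not attempt to reproduce the authors' function-prompt or virtual running.

```python
import json


def quote(text: str) -> str:
    """Return text as a double-quoted string literal.

    json.dumps escapes quotes and backslashes, which is enough for the
    simple ASCII sentences used in this illustration.
    """
    return json.dumps(text)


# NOTE: all three templates are hypothetical; the paper's real prompt
# format is not described in the abstract.
def render_python(text: str, entity_types: list) -> str:
    types = ", ".join(entity_types)
    return (
        "def extract_entities(text):\n"
        f"    # entity types: {types}\n"
        f"    text = {quote(text)}\n"
        "    entities = []\n"
    )


def render_cpp(text: str, entity_types: list) -> str:
    types = ", ".join(entity_types)
    return (
        "std::vector<Entity> extract_entities() {\n"
        f"    // entity types: {types}\n"
        f"    std::string text = {quote(text)};\n"
        "    std::vector<Entity> entities;\n"
    )


def render_java(text: str, entity_types: list) -> str:
    types = ", ".join(entity_types)
    return (
        "List<Entity> extractEntities() {\n"
        f"    // entity types: {types}\n"
        f"    String text = {quote(text)};\n"
        "    List<Entity> entities = new ArrayList<>();\n"
    )


RENDERERS = {"python": render_python, "cpp": render_cpp, "java": render_java}


def build_prompts(text: str, entity_types: list) -> dict:
    """Render the same IE instance once per programming language."""
    return {lang: render(text, entity_types) for lang, render in RENDERERS.items()}


if __name__ == "__main__":
    prompts = build_prompts("Ada Lovelace was born in London.", ["person", "location"])
    for lang, prompt in prompts.items():
        print(f"--- {lang} ---")
        print(prompt)
```

Each rendered prefix ends where a model would be expected to continue the code with the extracted entities; pairing the same instance with several renderings is one plausible way to build multi-PL SFT data, though the paper may construct it differently.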
