CL · Apr 3

LLM-based Atomic Propositions help weak extractors: Evaluation of a Propositioner for triplet extraction

Luc Pommeret, Thomas Gerald, Patrick Paroubek, Sahar Ghannay, Christophe Servan, Sophie Rosset

arXiv:2604.02866 · 90.7 · h-index: 20

Predicted impact top 29% in CL · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving triplet extraction from natural language for knowledge graph construction, offering an incremental enhancement through interpretable intermediate structures.

The paper addresses the extraction of structured triplets from complex sentences for Knowledge Graph construction by first decomposing text into atomic propositions. This decomposition improves relation recall for weaker extractors such as GLiREL and CoreNLP and, in the multilingual setting, their overall accuracy. For stronger LLMs, a fallback combination strategy recovers the entity recall lost through decomposition.

Knowledge Graph construction from natural language requires extracting structured triplets from complex, information-dense sentences. In this paper, we investigate whether decomposing text into atomic propositions (minimal, semantically autonomous units of information) can improve triplet extraction. We introduce MPropositionneur-V2, a small multilingual model covering six European languages, trained by knowledge distillation from Qwen3-32B into a Qwen3-0.6B architecture, and we evaluate its integration into two extraction paradigms: entity-centric (GLiREL) and generative (Qwen3). Experiments on SMiLER, FewRel, DocRED and CaRB show that atomic propositions benefit weaker extractors (GLiREL, CoreNLP, 0.6B models), improving relation recall and, in the multilingual setting, overall accuracy. For stronger LLMs, a fallback combination strategy recovers entity recall losses while preserving the gains in relation extraction. These results show that atomic propositions are an interpretable intermediate data structure that complements extractors without replacing them.
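The abstract does not spell out how the fallback combination works, so the following is only a minimal sketch of one plausible reading: extract triplets from the atomic propositions first, then add triplets from the original sentence for entity pairs the propositions did not cover. The function `extract_with_fallback`, the pair-coverage rule, and the toy propositioner and extractor are all hypothetical stand-ins; they are not the MPropositionneur-V2, GLiREL, or Qwen3 interfaces.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def extract_with_fallback(
    sentence: str,
    propositioner: Callable[[str], List[str]],
    extractor: Callable[[str], Iterable[Triplet]],
) -> Set[Triplet]:
    """Hypothetical combination rule (an assumption, not the paper's method):
    keep every triplet found in the propositions, then add triplets from the
    original sentence whose (subject, object) pair was not already covered."""
    from_props: Set[Triplet] = set()
    for prop in propositioner(sentence):
        from_props.update(extractor(prop))

    covered_pairs = {(s, o) for s, _, o in from_props}
    combined = set(from_props)
    for s, r, o in extractor(sentence):
        if (s, o) not in covered_pairs:
            combined.add((s, r, o))
    return combined


# Toy stand-ins so the sketch runs without any model.
SENTENCE = "Marie Curie, born in Warsaw, won the Nobel Prize."


def toy_propositioner(sentence: str) -> List[str]:
    # Simulates a decomposition that loses the "Warsaw" entity.
    table = {SENTENCE: ["Marie Curie won the Nobel Prize."]}
    return table.get(sentence, [sentence])


def toy_extractor(text: str) -> List[Triplet]:
    # Lookup table standing in for a real triplet extractor.
    table = {
        SENTENCE: [("Marie Curie", "born_in", "Warsaw")],
        "Marie Curie won the Nobel Prize.": [
            ("Marie Curie", "award_received", "Nobel Prize")
        ],
    }
    return table.get(text, [])


result = extract_with_fallback(SENTENCE, toy_propositioner, toy_extractor)
```

In this toy run the proposition contributes the award relation, while the fallback to the original sentence restores the triplet involving the entity that the decomposition dropped. This mirrors the trade-off the abstract describes between relation gains and entity recall.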
