IRAICLLGFeb 24

OpenSanctions Pairs: Large-Scale Entity Matching with LLMs

arXiv:2603.11051h-index: 4Has Code
Originality Incremental advance
AI Analysis

This work addresses entity matching for compliance workflows, showing incremental improvements with LLMs but noting performance is nearing a practical ceiling.

The paper tackles the problem of entity matching in international sanctions data by benchmarking LLMs against a rule-based system, finding that off-the-shelf LLMs achieve up to 98.95% F1, significantly outperforming the baseline at 91.33% F1.

We release OpenSanctions Pairs, a large-scale entity matching benchmark derived from real-world international sanctions aggregation and analyst deduplication. The dataset contains 755,540 labeled pairs spanning 293 heterogeneous sources across 31 countries, with multilingual and cross-script names, noisy and missing attributes, and set-valued fields typical of compliance workflows. We benchmark a production rule-based matcher (nomenklatura RegressionV1 algorithm) against open- and closed-source LLMs in zero- and few-shot settings. Off-the-shelf LLMs substantially outperform the production rule-based baseline (91.33\% F1), reaching up to 98.95\% F1 (GPT-4o) and 98.23\% F1 with a locally deployable open model (DeepSeek-R1-Distill-Qwen-14B). DSPy MIPROv2 prompt optimization yields consistent but modest gains, while adding in-context examples provides little additional benefit and can degrade performance. Error analysis shows complementary failure modes: the rule-based system over-matches (high false positives), whereas LLMs primarily fail on cross-script transliteration and minor identifier/date inconsistencies. These results indicate that pairwise matching performance is approaching a practical ceiling in this setting, and motivate shifting effort toward pipeline components such as blocking, clustering, and uncertainty-aware review. Code available at https://github.com/chansmi/OSINT_entity_resolution

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes