CL · Nov 11, 2025

From Semantic Roles to Opinion Roles: SRL Data Extraction for Multi-Task and Transfer Learning in Low-Resource ORL

arXiv:2511.08537v1 · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work provides a reusable resource for researchers seeking to improve ORL in low-resource opinion-mining scenarios, but its contribution is incremental, since it repurposes existing SRL annotations.

The paper tackles the problem of constructing a high-quality Semantic Role Labeling (SRL) dataset from the WSJ portion of OntoNotes 5.0 and adapting it for Opinion Role Labeling (ORL), producing a dataset of 97,169 predicate-argument instances whose roles are mapped to the ORL schema.
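The role correspondence stated in the abstract (ARG0 to Holder, REL to Expression, ARG1 to Target) amounts to a relabeling step over each extracted instance. The sketch below is illustrative only: the function name `to_orl_instance` is hypothetical, and the choice to discard instances missing any of the three roles is an assumption, not a documented rule of the paper.

```python
from typing import Optional

# PropBank-to-ORL correspondence as described in the abstract:
# ARG0 (Agent) -> Holder, REL (Predicate) -> Expression, ARG1 (Patient) -> Target.
SRL_TO_ORL = {"ARG0": "Holder", "REL": "Expression", "ARG1": "Target"}


def to_orl_instance(srl_roles: dict[str, str]) -> Optional[dict[str, str]]:
    """Relabel one predicate-argument instance with ORL role names.

    Hypothetical helper. Returns None when ARG0, REL or ARG1 is missing or
    blank (an assumed filtering policy); other roles such as ARGM-TMP are
    ignored.
    """
    if not all(srl_roles.get(role, "").strip() for role in SRL_TO_ORL):
        return None
    return {orl: srl_roles[srl].strip() for srl, orl in SRL_TO_ORL.items()}
```

For an invented instance such as `{"ARG0": "The analysts", "REL": "criticized", "ARG1": "the merger", "ARGM-TMP": "yesterday"}`, the function returns `{"Holder": "The analysts", "Expression": "criticized", "Target": "the merger"}`; an instance lacking ARG1 yields `None`.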

This report presents a detailed methodology for constructing a high-quality Semantic Role Labeling (SRL) dataset from the Wall Street Journal (WSJ) portion of the OntoNotes 5.0 corpus and adapting it for Opinion Role Labeling (ORL) tasks. Leveraging the PropBank annotation framework, we implement a reproducible extraction pipeline that aligns predicate-argument structures with surface text, converts syntactic tree pointers into coherent text spans, and applies rigorous cleaning to ensure semantic fidelity. The resulting dataset comprises 97,169 predicate-argument instances with clearly defined Agent (ARG0), Predicate (REL), and Patient (ARG1) roles, which map onto the ORL roles of Holder, Expression, and Target, respectively. We describe our extraction algorithms, our handling of discontinuous arguments, and our annotation corrections in detail, and we present a statistical analysis of the resulting dataset. This work offers a reusable resource for researchers who aim to use SRL to improve ORL, especially in low-resource opinion-mining scenarios.
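The pointer-to-span conversion and discontinuous-argument handling mentioned above can be illustrated with a minimal, stdlib-only sketch. It assumes PropBank-style argument fields of the form `terminal:height-LABEL`, where terminals are counted including traces and height 0 is the preterminal directly above a word (the convention NLTK's PropBank reader appears to follow), `,` joins the pieces of a split argument, and `*` links a trace to its antecedent. All helper names, the example tree, and the policy of dropping `-NONE-` traces while unioning overt pieces are assumptions for illustration, not the paper's exact algorithm.

```python
import re


def parse_tree(s: str) -> list:
    """Parse a Penn Treebank bracketed string into nested lists.

    Each node is [label, child, ...]; each word is a plain str. An unlabeled
    outer bracket, as in OntoNotes trees, gets the label "".
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse_node() -> list:
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = [label]
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse_node())
            else:
                node.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return node

    return parse_node()


def leaves_with_paths(tree: list) -> list:
    """List (word, pos_tag, ancestors) per terminal, left to right, traces included.

    `ancestors` runs from the root down to the word's preterminal node.
    """
    out = []

    def walk(node: list, ancestors: list) -> None:
        for child in node[1:]:
            if isinstance(child, str):
                out.append((child, node[0], ancestors + [node]))
            else:
                walk(child, ancestors + [node])

    walk(tree, [])
    return out


def resolve_pointer(pointer: str, leaves: list) -> set:
    """Map one 'terminal:height' pointer to the terminal indices its node covers."""
    word_num, height = (int(x) for x in pointer.split(":"))
    node = leaves[word_num][2][-(height + 1)]  # height 0 = preterminal
    return {i for i, (_, _, path) in enumerate(leaves) if any(a is node for a in path)}


def argument_text(arg: str, leaves: list) -> str:
    """Surface text for pointer expressions such as '0:1', '0:1,3:1' or '2:1*0:1'.

    Assumed policy: ','-joined pieces (split arguments) are unioned, each '*'
    link (trace chain) contributes only overt words, and -NONE- traces are
    dropped. Words are emitted in sentence order.
    """
    indices = set()
    for link in arg.split("*"):
        for ptr in link.split(","):
            indices |= {i for i in resolve_pointer(ptr, leaves) if leaves[i][1] != "-NONE-"}
    return " ".join(leaves[i][0] for i in sorted(indices))


def extract_roles(tree_str: str, prop_fields: list) -> dict:
    """Resolve the 'pointer-LABEL' fields of one instance to text.

    A repeated label (e.g. two ARGM-TMP fields) keeps only the last one.
    """
    leaves = leaves_with_paths(parse_tree(tree_str))
    roles = {}
    for field in prop_fields:
        pointer, label = field.split("-", 1)
        roles[label.upper()] = argument_text(pointer, leaves)
    return roles


if __name__ == "__main__":
    # Invented sentence with a controlled-subject trace (*-1) before "to block".
    tree = ("( (S (NP-SBJ-1 (NNS Analysts)) (VP (VBD wanted) (S (NP-SBJ (-NONE- *-1)) "
            "(VP (TO to) (VP (VB block) (NP (DT the) (NN merger)))))) (. .)) )")
    print(extract_roles(tree, ["2:1*0:1-ARG0", "4:0-rel", "5:1-ARG1"]))
    # {'ARG0': 'Analysts', 'REL': 'block', 'ARG1': 'the merger'}
```

Following the trace chain is what recovers "Analysts" as the ARG0 of "block", whose syntactic subject position is empty; a pipeline that kept only the first pointer would produce an empty Holder for such instances.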

