Shaul R. Shenhav

CL
h-index30
8papers
878citations
Novelty40%
AI Score48

8 Papers

CLOct 6, 2022
Detecting Narrative Elements in Informational Text

Effi Levi, Guy Mor, Tamir Sheafer et al.

Automatic extraction of narrative elements from text, combining narrative theories with computational models, has been receiving increasing attention over the last few years. Previous works have utilized the oral narrative theory by Labov and Waletzky to identify various narrative elements in personal stories texts. Instead, we direct our focus to informational texts, specifically news stories. We introduce NEAT (Narrative Elements AnnoTation) - a novel NLP task for detecting narrative elements in raw text. For this purpose, we designed a new multi-label narrative annotation scheme, better suited for informational text (e.g. news media), by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success). We then used this scheme to annotate a new dataset of 2,209 sentences, compiled from 46 news articles from various category domains. We trained a number of supervised models in several different setups over the annotated dataset to identify the different narrative elements, achieving an average F1 score of up to 0.77. The results demonstrate the holistic nature of our annotation scheme as well as its robustness to domain category.

CLNov 7, 2023
Factoring Hate Speech: A New Annotation Framework to Study Hate Speech in Social Media

Gal Ron, Effi Levi, Odelia Oshri et al.

In this work we propose a novel annotation scheme which factors hate speech into five separate discursive categories. To evaluate our scheme, we construct a corpus of over 2.9M Twitter posts containing hateful expressions directed at Jews, and annotate a sample dataset of 1,050 tweets. We present a statistical analysis of the annotated dataset as well as discuss annotation examples, and conclude by discussing promising directions for future work.

CLJun 11, 2022
A Decomposition-Based Approach for Evaluating and Analyzing Inter-Annotator Disagreement

Effi Levi, Shaul R. Shenhav

We propose a novel method to conceptually decompose an existing annotation into separate levels, allowing the analysis of inter-annotators disagreement in each level separately. We suggest two distinct strategies in order to actualize this approach: a theoretically-driven one, in which the researcher defines a decomposition based on prior knowledge of the annotation task, and an exploration-based one, in which many possible decompositions are inductively computed and presented to the researcher for interpretation and evaluation. Utilizing a recently constructed dataset for narrative analysis as our use-case, we apply each of the two strategies to demonstrate the potential of our approach in testing hypotheses regarding the sources of annotation disagreements, as well as revealing latent structures and relations within the annotation task. We conclude by suggesting how to extend and generalize our approach, as well as use it for other purposes.

CLApr 14, 2024
Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora

Dror K. Markus, Effi Levi, Tamir Sheafer et al.

Media Storms, dramatic outbursts of attention to a story, are central components of media dynamics and the attention landscape. Despite their significance, there has been little systematic and empirical research on this concept due to issues of measurement and operationalization. We introduce an iterative human-in-the-loop method to identify media storms in a large-scale corpus of news articles. The text is first transformed into signals of dispersion based on several textual characteristics. In each iteration, we apply unsupervised anomaly detection to these signals; each anomaly is then validated by an expert to confirm the presence of a storm, and those results are then used to tune the anomaly detection in the next iteration. We demonstrate the applicability of this method in two scenarios: first, supplementing an initial list of media storms within a specific time frame; and second, detecting media storms in new time periods. We make available a media storm dataset compiled using both scenarios. Both the method and dataset offer the basis for comprehensive empirical research into the concept of media storms, including characterizing them and predicting their outbursts and durations, in mainstream media or social media platforms.

CLOct 12, 2025
You're Not Gonna Believe This: A Computational Analysis of Factual Appeals and Sourcing in Partisan News

Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav

While media bias is widely studied, the epistemic strategies behind factual reporting remain computationally underexplored. This paper analyzes these strategies through a large-scale comparison of CNN and Fox News. To isolate reporting style from topic selection, we employ an article matching strategy to compare reports on the same events and apply the FactAppeal framework to a corpus of over 470K articles covering two highly politicized periods: the COVID-19 pandemic and the Israel-Hamas war. We find that CNN's reporting contains more factual statements and is more likely to ground them in external sources. The outlets also exhibit sharply divergent sourcing patterns: CNN builds credibility by citing Experts} and Expert Documents, constructing an appeal to formal authority, whereas Fox News favors News Reports and direct quotations. This work quantifies how partisan outlets use systematically different epistemic strategies to construct reality, adding a new dimension to the study of media bias.

CLOct 12, 2025
FactAppeal: Identifying Epistemic Factual Appeals in News Media

Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav

How is a factual claim made credible? We propose the novel task of Epistemic Appeal Identification, which identifies whether and how factual statements have been anchored by external sources or evidence. To advance research on this task, we present FactAppeal, a manually annotated dataset of 3,226 English-language news sentences. Unlike prior resources that focus solely on claim detection and verification, FactAppeal identifies the nuanced epistemic structures and evidentiary basis underlying these claims and used to support them. FactAppeal contains span-level annotations which identify factual statements and mentions of sources on which they rely. Moreover, the annotations include fine-grained characteristics of factual appeals such as the type of source (e.g. Active Participant, Witness, Expert, Direct Evidence), whether it is mentioned by name, mentions of the source's role and epistemic credentials, attribution to the source via direct or indirect quotation, and other features. We model the task with a range of encoder models and generative decoder models in the 2B-9B parameter range. Our best performing model, based on Gemma 2 9B, achieves a macro-F1 score of 0.73.

CLAug 21, 2025
HebID: Detecting Social Identities in Hebrew-language Political Text

Guy Mor-Lan, Naama Rivlin-Angert, Yael R. Kaplan et al.

Political language is deeply intertwined with social identities. While social identities are often shaped by specific cultural contexts and expressed through particular uses of language, existing datasets for group and identity detection are predominantly English-centric, single-label and focus on coarse identity categories. We introduce HebID, the first multilabel Hebrew corpus for social identity detection: 5,536 sentences from Israeli politicians' Facebook posts (Dec 2018-Apr 2021), manually annotated for twelve nuanced social identities (e.g. Rightist, Ultra-Orthodox, Socially-oriented) grounded by survey data. We benchmark multilabel and single-label encoders alongside 2B-9B-parameter generative LLMs, finding that Hebrew-tuned LLMs provide the best results (macro-$F_1$ = 0.74). We apply our classifier to politicians' Facebook posts and parliamentary speeches, evaluating differences in popularity, temporal trends, clustering patterns, and gender-related variations in identity expression. We utilize identity choices from a national public survey, enabling a comparison between identities portrayed in elite discourse and the public's identity priorities. HebID provides a comprehensive foundation for studying social identities in Hebrew and can serve as a model for similar research in other non-English political contexts.

CLJul 28, 2025
Dialogues of Dissent: Thematic and Rhetorical Dimensions of Hate and Counter-Hate Speech in Social Media Conversations

Effi Levi, Gal Ron, Odelia Oshri et al.

We introduce a novel multi-labeled scheme for joint annotation of hate and counter-hate speech in social media conversations, categorizing hate and counter-hate messages into thematic and rhetorical dimensions. The thematic categories outline different discursive aspects of each type of speech, while the rhetorical dimension captures how hate and counter messages are communicated, drawing on Aristotle's Logos, Ethos and Pathos. We annotate a sample of 92 conversations, consisting of 720 tweets, and conduct statistical analyses, incorporating public metrics, to explore patterns of interaction between the thematic and rhetorical dimensions within and between hate and counter-hate speech. Our findings provide insights into the spread of hate messages on social media, the strategies used to counter them, and their potential impact on online behavior.