Tamir Sheafer

CL
h-index: 30
6 papers
661 citations
Novelty: 41%
AI Score: 46

6 Papers

CL · Oct 6, 2022
Detecting Narrative Elements in Informational Text

Effi Levi, Guy Mor, Tamir Sheafer et al.

Automatic extraction of narrative elements from text, combining narrative theories with computational models, has been receiving increasing attention over the last few years. Previous works have utilized the oral narrative theory by Labov and Waletzky to identify various narrative elements in personal story texts. Instead, we direct our focus to informational texts, specifically news stories. We introduce NEAT (Narrative Elements AnnoTation), a novel NLP task for detecting narrative elements in raw text. For this purpose, we designed a new multi-label narrative annotation scheme, better suited for informational text (e.g. news media), by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success). We then used this scheme to annotate a new dataset of 2,209 sentences, compiled from 46 news articles from various category domains. We trained a number of supervised models in several different setups over the annotated dataset to identify the different narrative elements, achieving an average F1 score of up to 0.77. The results demonstrate the holistic nature of our annotation scheme as well as its robustness to domain category.

CL · Apr 14, 2024
Reap the Wild Wind: Detecting Media Storms in Large-Scale News Corpora

Dror K. Markus, Effi Levi, Tamir Sheafer et al.

Media Storms, dramatic outbursts of attention to a story, are central components of media dynamics and the attention landscape. Despite their significance, there has been little systematic and empirical research on this concept due to issues of measurement and operationalization. We introduce an iterative human-in-the-loop method to identify media storms in a large-scale corpus of news articles. The text is first transformed into signals of dispersion based on several textual characteristics. In each iteration, we apply unsupervised anomaly detection to these signals; each anomaly is then validated by an expert to confirm the presence of a storm, and those results are then used to tune the anomaly detection in the next iteration. We demonstrate the applicability of this method in two scenarios: first, supplementing an initial list of media storms within a specific time frame; and second, detecting media storms in new time periods. We make available a media storm dataset compiled using both scenarios. Both the method and dataset offer the basis for comprehensive empirical research into the concept of media storms, including characterizing them and predicting their outbursts and durations, in mainstream media or social media platforms.
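The iterative loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which anomaly detector or tuning rule is used, so a simple z-score threshold stands in for the unsupervised detector, and the names `detect_anomalies`, `human_in_the_loop`, `expert_confirms`, `step`, and `target_precision` are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, List, Set


def detect_anomalies(signal: List[float], threshold: float) -> List[int]:
    """Return indices whose absolute z-score exceeds the threshold.

    Assumption: a z-score rule stands in for the unspecified
    unsupervised anomaly detector.
    """
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        return []  # a flat signal has no anomalies
    return [i for i, x in enumerate(signal) if abs(x - mu) / sigma > threshold]


def human_in_the_loop(
    signal: List[float],
    expert_confirms: Callable[[int], bool],  # hypothetical expert-review callback
    threshold: float = 3.0,
    step: float = 0.25,
    min_threshold: float = 1.0,
    target_precision: float = 0.5,
    max_iters: int = 10,
) -> Set[int]:
    """Alternate detection and expert validation, tuning the threshold each round."""
    confirmed: Set[int] = set()
    reviewed: Set[int] = set()
    for _ in range(max_iters):
        # Only show the expert candidates that have not been reviewed yet.
        candidates = [
            i for i in detect_anomalies(signal, threshold) if i not in reviewed
        ]
        if candidates:
            hits = [i for i in candidates if expert_confirms(i)]
            reviewed.update(candidates)
            confirmed.update(hits)
            precision = len(hits) / len(candidates)
        else:
            precision = 1.0  # nothing new to review, so loosen the detector
        # Hypothetical tuning rule: loosen when the expert mostly agrees,
        # tighten when most candidates are rejected.
        if precision >= target_precision:
            threshold = max(min_threshold, threshold - step)
        else:
            threshold += step
    return confirmed
```

In this sketch, the expert's decisions feed back only through the threshold; a fuller version could also adjust which textual signals are used.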

CL · Oct 12, 2025
You're Not Gonna Believe This: A Computational Analysis of Factual Appeals and Sourcing in Partisan News

Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav

While media bias is widely studied, the epistemic strategies behind factual reporting remain computationally underexplored. This paper analyzes these strategies through a large-scale comparison of CNN and Fox News. To isolate reporting style from topic selection, we employ an article matching strategy to compare reports on the same events and apply the FactAppeal framework to a corpus of over 470K articles covering two highly politicized periods: the COVID-19 pandemic and the Israel-Hamas war. We find that CNN's reporting contains more factual statements and is more likely to ground them in external sources. The outlets also exhibit sharply divergent sourcing patterns: CNN builds credibility by citing Experts and Expert Documents, constructing an appeal to formal authority, whereas Fox News favors News Reports and direct quotations. This work quantifies how partisan outlets use systematically different epistemic strategies to construct reality, adding a new dimension to the study of media bias.

CL · Oct 12, 2025
FactAppeal: Identifying Epistemic Factual Appeals in News Media

Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav

How is a factual claim made credible? We propose the novel task of Epistemic Appeal Identification, which identifies whether and how factual statements have been anchored by external sources or evidence. To advance research on this task, we present FactAppeal, a manually annotated dataset of 3,226 English-language news sentences. Unlike prior resources that focus solely on claim detection and verification, FactAppeal identifies the nuanced epistemic structures and evidentiary basis that underlie these claims and are used to support them. FactAppeal contains span-level annotations which identify factual statements and mentions of the sources on which they rely. Moreover, the annotations include fine-grained characteristics of factual appeals such as the type of source (e.g. Active Participant, Witness, Expert, Direct Evidence), whether it is mentioned by name, mentions of the source's role and epistemic credentials, attribution to the source via direct or indirect quotation, and other features. We model the task with a range of encoder models and generative decoder models in the 2B-9B parameter range. Our best performing model, based on Gemma 2 9B, achieves a macro-F1 score of 0.73.

CL · Aug 21, 2025
HebID: Detecting Social Identities in Hebrew-language Political Text

Guy Mor-Lan, Naama Rivlin-Angert, Yael R. Kaplan et al.

Political language is deeply intertwined with social identities. While social identities are often shaped by specific cultural contexts and expressed through particular uses of language, existing datasets for group and identity detection are predominantly English-centric, single-label and focus on coarse identity categories. We introduce HebID, the first multilabel Hebrew corpus for social identity detection: 5,536 sentences from Israeli politicians' Facebook posts (Dec 2018-Apr 2021), manually annotated for twelve nuanced social identities (e.g. Rightist, Ultra-Orthodox, Socially-oriented) grounded by survey data. We benchmark multilabel and single-label encoders alongside 2B-9B-parameter generative LLMs, finding that Hebrew-tuned LLMs provide the best results (macro-$F_1$ = 0.74). We apply our classifier to politicians' Facebook posts and parliamentary speeches, evaluating differences in popularity, temporal trends, clustering patterns, and gender-related variations in identity expression. We utilize identity choices from a national public survey, enabling a comparison between identities portrayed in elite discourse and the public's identity priorities. HebID provides a comprehensive foundation for studying social identities in Hebrew and can serve as a model for similar research in other non-English political contexts.
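For readers unfamiliar with the metric reported here, macro-F1 in a multilabel setting can be computed as in the minimal sketch below: F1 is calculated separately for each label and the per-label scores are averaged with equal weight. The function name `macro_f1` and the set-per-sentence representation are assumptions for illustration; the abstract does not describe the authors' evaluation code, and scoring a label with no gold or predicted instances as 0.0 is one convention among several.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 over multilabel predictions."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Convention (an assumption): a label that never occurs scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every label contributes equally, rare identity categories weigh as much as frequent ones, which is why macro-F1 is a common choice for imbalanced multilabel data.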

CL · Jul 9, 2020
CompRes: A Dataset for Narrative Structure in News

Effi Levi, Guy Mor, Shaul Shenhav et al.

This paper addresses the task of automatically detecting narrative structures in raw texts. Previous works have utilized the oral narrative theory by Labov and Waletzky to identify various narrative elements in personal story texts. Instead, we direct our focus to news articles, motivated by their growing social impact as well as their role in creating and shaping public opinion. We introduce CompRes, the first dataset for narrative structure in news media. We describe the process in which the dataset was constructed: first, we designed a new narrative annotation scheme, better suited for news media, by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success); then, we used that scheme to annotate a set of 29 English news articles (containing 1,099 sentences) collected from news and partisan websites. We use the annotated dataset to train several supervised models to identify the different narrative elements, achieving an $F_1$ score of up to 0.7. We conclude by suggesting several promising directions for future work.