CL Oct 14, 2021
Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts
Elise Jing, Kristiana Schneck, Dennis Egan et al.
As the volume of long-form spoken-word content such as podcasts explodes, many platforms want to present short, meaningful, and logically coherent segments extracted from the full content. Users can consume such segments to sample content before diving in, and platforms can use them to promote and recommend content. However, little published work focuses on the segmentation of spoken-word content, where the errors (noise) in transcripts generated by automatic speech recognition (ASR) services pose many challenges. Here we build a novel dataset of complete transcriptions of over 400 podcast episodes, in which we label the position of introductions in each episode. These introductions contain information about the episodes' topics, hosts, and guests, providing a valuable summary of the episode content that comes from the authors themselves. We further augment our dataset with word substitutions to increase the amount of available training data. We train three Transformer models based on the pre-trained BERT and different augmentation strategies, which achieve significantly better performance compared with a static embedding model, showing that it is possible to capture generalized, larger-scale structural information from noisy, loosely organized speech data. This is further demonstrated through an analysis of the models' inner architecture. Our methods and dataset can be used to facilitate future work on the structure-based segmentation of spoken-word content.
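The word-substitution augmentation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how substitutes are chosen or how often words are replaced, so the `substitutes` table, the `rate` parameter, and the function name below are all hypothetical and are not the authors' implementation.

```python
import random


def augment_with_substitutions(tokens, substitutes, rate=0.1, seed=None):
    """Return a copy of `tokens` with some words swapped for substitutes.

    `substitutes` is a hypothetical lookup table (word -> list of
    replacement words); `rate` is an assumed per-word replacement
    probability. Neither comes from the paper.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    augmented = []
    for token in tokens:
        options = substitutes.get(token.lower())
        if options and rng.random() < rate:
            augmented.append(rng.choice(options))
        else:
            augmented.append(token)
    return augmented


# Toy transcript fragment and a made-up substitution table.
transcript = "welcome to the show today we talk about music".split()
table = {"show": ["podcast", "program"], "talk": ["chat", "speak"]}
variant = augment_with_substitutions(transcript, table, rate=1.0, seed=0)
```

Because each token is replaced one-for-one, the augmented transcript has the same length as the original, so a position-based label for the introduction would stay valid in this sketch.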
CL Mar 11, 2021
Characterizing Partisan Political Narrative Frameworks about COVID-19 on Twitter
Elise Jing, Yong-Yeol Ahn
The COVID-19 pandemic is a global crisis that has been testing every society and exposing the critical role of local politics in crisis response. In the United States, there has been a strong partisan divide between the Democratic and Republican parties' narratives about the pandemic, which resulted in polarization of individual behaviors and divergent policy adoption across regions. As shown in this case, as well as in most major social issues, strongly polarized narrative frameworks facilitate such narratives. To understand polarization and other social chasms, it is critical to dissect these diverging narratives. Here, taking the Democratic and Republican political social media posts about the pandemic as a case study, we demonstrate that a combination of computational methods can provide useful insights into the different contexts, framing, and characters and relationships that construct the narrative frameworks from which individual posts draw. Leveraging a dataset of tweets from elite politicians in the U.S., we find that the Democrats' narrative tends to be more concerned with the pandemic as well as financial and social support, while the Republicans talk more about other political entities such as China. We then perform an automatic framing analysis to characterize the ways in which they frame their narratives, and find that the Democrats emphasize the government's role in responding to the pandemic, while the Republicans emphasize the roles of individuals and support for small businesses. Finally, we present a semantic role analysis that uncovers the important characters and relationships in their narratives as well as how they facilitate a membership categorization process. Our findings concretely expose the gaps in the "elusive consensus" between the two parties. Our methodologies may be applied to computationally study narratives in various domains.
CL Feb 20, 2020
FrameAxis: Characterizing Microframe Bias and Intensity with Word Embedding
Haewoon Kwak, Jisun An, Elise Jing et al.
Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes ("microframes") that are overrepresented in the text using word embedding. Our unsupervised approach can be readily applied to large datasets because it does not require manual annotations. It can also provide nuanced insights by considering a rich set of semantic axes. FrameAxis is designed to quantitatively tease out two important dimensions of how microframes are used in the text. Microframe bias captures how biased the text is on a certain microframe, and microframe intensity shows how actively a certain microframe is used. Together, they offer a detailed characterization of the text. We demonstrate that microframes with the highest bias and intensity align well with sentiment, topic, and partisan spectrum by applying FrameAxis to multiple datasets from restaurant reviews to political news. The existing domain knowledge can be incorporated into FrameAxis by using custom microframes and by using FrameAxis as an iterative exploratory analysis instrument. Additionally, we propose methods for explaining the results of FrameAxis at the level of individual words and documents. Our method may accelerate scalable and sophisticated computational analyses of framing across disciplines.
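The distinction between microframe bias and microframe intensity can be made concrete with a small sketch. The abstract does not give formulas, so the code assumes one plausible reading: a microframe is the difference between the embeddings of two antonyms, bias is the frequency-weighted mean cosine similarity between each word and that axis, and intensity is the frequency-weighted second moment of that similarity around a baseline. The toy 2-D embeddings, function names, and `baseline_bias` parameter are illustrative only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def microframe_axis(embeddings, positive, negative):
    """Assumed axis definition: embedding of one pole minus the other."""
    return [p - n for p, n in zip(embeddings[positive], embeddings[negative])]


def bias_and_intensity(tokens, embeddings, axis, baseline_bias=0.0):
    """Frequency-weighted mean (bias) and second moment (intensity)."""
    counts = Counter(t for t in tokens if t in embeddings)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    sims = {w: cosine(embeddings[w], axis) for w in counts}
    bias = sum(n * sims[w] for w, n in counts.items()) / total
    intensity = sum(
        n * (sims[w] - baseline_bias) ** 2 for w, n in counts.items()
    ) / total
    return bias, intensity


# Toy 2-D embeddings, invented for illustration.
toy = {
    "good": [1.0, 0.0], "bad": [-1.0, 0.0],
    "tasty": [0.8, 0.6], "awful": [-0.8, 0.6], "table": [0.0, 1.0],
}
axis = microframe_axis(toy, "good", "bad")
one_sided = bias_and_intensity(["tasty", "tasty", "table"], toy, axis)
mixed = bias_and_intensity(["tasty", "awful"], toy, axis)
```

Under these assumptions, the mixed text has a bias near zero because its two words pull in opposite directions, yet its intensity is high, which is why the two measures are reported together.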
CL Apr 16, 2019
Sameness Entices, but Novelty Enchants in Fanfiction Online
Elise Jing, Simon DeDeo, Devin Robert Wright et al.
Cultural evolution is driven by how we choose what to consume and share with others. A common belief is that the cultural artifacts that succeed are ones that balance novelty and conventionality. This balance theory suggests that people prefer works that are familiar, but not so familiar as to be boring; novel, but not so novel as to violate the expectations of their genre. We test this idea using a large dataset of fanfiction. We apply a multiple regression model and a generalized additive model to examine how the recognition a work receives varies with its novelty, estimated through a Latent Dirichlet Allocation topic model, in the context of existing works. We find the opposite of what the balance theory predicts: overall success declines almost monotonically with novelty and exhibits a U-shaped, instead of an inverse U-shaped, curve. This puzzle is resolved by teasing out two competing forces: sameness attracts the masses whereas novelty provides enjoyment. Taken together, even though the balance theory holds in terms of expressed enjoyment, the overall success can show the opposite pattern due to the dominant role of sameness in attracting the audience. Under these two forces, cultural evolution may have to work against inertia (the appetite for consuming the familiar) and may resemble a punctuated equilibrium, marked by occasional leaps.