Lynnette Hui Xian Ng

h-index42

27papers

445citations

Novelty36%

AI Score53

Ranked #30,994 of 201,326 authors (top 15%)#6,477 in CL (top 20%)

27 Papers

CLOct 19, 2022

How Hate Speech Varies by Target Identity: A Computational Analysis

Michael Miller Yoder, Lynnette Hui Xian Ng, David West Brown et al. · cmu

This paper investigates how hate speech varies in systematic ways according to the identities it targets. Across multiple hate speech datasets annotated for targeted identities, we find that classifiers trained on hate speech targeting specific identity groups struggle to generalize to other targeted identities. This provides empirical evidence for differences in hate speech by target identity; we then investigate which patterns structure this variation. We find that the targeted demographic category (e.g. gender/sexuality or race/ethnicity) appears to have a greater effect on the language of hate speech than does the relative social power of the targeted identity group. We also find that words associated with hate speech targeting specific identities often relate to stereotypes, histories of oppression, current social movements, and other social contexts specific to identities. These experiments suggest the importance of considering targeted identity, as well as the social contexts associated with these identities, in automated hate speech classification.

CLSep 24, 2023Code

Prompting and Fine-Tuning Open-Sourced Large Language Models for Stance Classification

Iain J. Cruickshank, Lynnette Hui Xian Ng

Stance classification, the task of predicting the viewpoint of an author on a subject of interest, has long been a focal point of research in domains ranging from social science to machine learning. Current stance detection methods rely predominantly on manual annotation of sentences, followed by training a supervised machine learning model. However, this manual annotation process requires laborious annotation effort, and thus hampers its potential to generalize across different contexts. In this work, we investigate the use of Large Language Models (LLMs) as a stance detection methodology that can reduce or even eliminate the need for manual annotations. We investigate 10 open-source models and 7 prompting schemes, finding that LLMs are competitive with in-domain supervised models but are not necessarily consistent in their performance. We also fine-tuned the LLMs, but discovered that fine-tuning process does not necessarily lead to better performance. In general, we discover that LLMs do not routinely outperform their smaller supervised machine learning models, and thus call for stance detection to be a benchmark for which LLMs also optimize for. The code used in this study is available at \url{https://github.com/ijcruic/LLM-Stance-Labeling}

61.6SIJun 1

The Structural Influence of Low-Credibility Narratives During the COVID-19 Vaccine Rollout

Lynnette Hui Xian Ng, Wenqi Zhou, Kathleen M. Carley

This work examines the structural influence of low-credibility narratives and the comparative role of automated accounts (bots) versus human users on social media platforms. To more accurately quantify the structural influence of a narrative on social media, this study proposes two novel metrics: (1) Appeal, which measures the network-weighted popularity of a message; and (2) Scope, which measures an author's message popularity-weighted network penetration. Applying these metrics, this study analyzes 5.8 million messages from X that contain low-credibility narratives regarding COVID-19 vaccine across three distinct temporal stages: Pre-Vaccine, Vaccine Launch, and Post-Launch. The results demonstrate that across all timeframes, human-distributed low-credibility narratives achieved higher structural influence compared to those generated by automated accounts. Furthermore, statistical analysis reveals a significant conditional temporal effect: human-driven low-credibility narratives attained their highest Appeal and Scope during the focal Vaccine Launch week, whereas automated accounts maximized their Appeal and Scope during the highly uncertain Pre-Vaccine period. These findings highlight the distinct operational capacities of automated and organic accounts, illustrating how the Appeal and Scope of low-credibility narratives is moderated by the lifecycle stages of critical public events.

54.5SIMay 5

Automated versus Human Engagement: Mapping Cognitive Bias Triggers in Online Discourse

Lynnette Hui Xian Ng, Wenqi Zhou, Kathleen M. Carley

In the digital environment, human attention is frequently guided by cognitive heuristics rather than deliberate evaluation. Since low-credibility narratives often lack substantive factual evidence, their diffusion disproportionally relies on activating these mental shortcut to simulate credibility and capture attention. This study presents a computational framework designed to detect computational triggers through observable data proxies for eight distinct cognitive biases across 3.5 million posts of contested COVID-19 narratives. We demonstrate that automated accounts (bots) embed these triggers more frequently than human users, yielding distinctly source-dependent associations with audience interaction. In bot-authored posts, affective and cognitive dissonance (stance-shifting) triggers are strongly associated with higher engagement, while the deployment of authority and availability (repetition) cues correlates with reduced audience interaction. Furthermore, we identify limits to heuristic compounding: positive engagement correlations with bot-authored content declines when multiple biases are stacked within a single post, whereas human-authored communication remains structurally resilient to high trigger density. By operationalizing psychological heuristics into scalable, measurable data, this work bridges computational social science and cognitive psychology to reveal how source identity (bot/human) shapes the mechanics of information diffusion in digital networks.

58.6CLMay 26

Stylistic Evolution and LLM Neutrality in Singlish Language

Linus Tze En Foo, Weihan Angela Ng, Wenkai Li et al.

Singlish is a creole rooted in Singapore's multilingual environment that continues to evolve alongside social and technological change. We examine diachronic stylistic change across a decade of informal digital messages and ask whether Large Language Models (LLMs) can generate temporally neutral outputs approximating the stable essence of the variety. Using lexical, pragmatic, psycholinguistic, and encoder-based features, we find that stylistic separability increases with temporal distance, driven primarily by structural features such as length and complexity. Evaluated against a null distribution baseline, most LLMs fail to achieve both authenticity and temporal neutrality simultaneously, revealing a structural trade-off: models generating realistic Singlish inherit its temporal biases, while temporally neutral models produce inauthentic outputs. These findings position temporal neutrality as a diagnostic metric for assessing sociolectal grounding in LLMs.

95.8AIApr 16

Anthropogenic Regional Adaptation in Multimodal Vision-Language Model

Samuel Cahyawijaya, Peerat Limkonchotiwat, Tack Hwa Wong et al.

While the field of vision-language (VL) has achieved remarkable success in integrating visual and textual information across multiple languages and domains, there is still no dedicated framework for assessing human-centric alignment in vision-language systems. We offer two contributions to address this gap. First, we introduce Anthropogenic Regional Adaptation: a novel paradigm that aims to optimize model relevance to specific regional contexts while ensuring the retention of global generalization capabilities. Second, we present a simple, but effective adaptation method named Geographical-generalization-made-easy (GG-EZ), which utilizes regional data filtering and model merging. Through comprehensive experiments on 3 VL architectures: large vision-language models, text-to-image diffusion models, and vision-language embedding models, and a case study in Southeast Asia (SEA) regional adaptation, we demonstrate the importance of Anthropogenic Regional Adaptation and the effectiveness of GG-EZ, showing 5-15% gains in cultural relevance metrics across SEA while maintaining over 98% of global performance and even occasionally surpassing it. Our findings establish Anthropogenic Regional Alignment as a foundational paradigm towards applicability of multimodal vision-language models in diverse regions and demonstrate a simple-yet-effective baseline method that optimizes regional value alignment while preserving global generalization.

CLJul 25, 2024

Examining the Influence of Political Bias on Large Language Model Performance in Stance Classification

Lynnette Hui Xian Ng, Iain Cruickshank, Roy Ka-Wei Lee

Large Language Models (LLMs) have demonstrated remarkable capabilities in executing tasks based on natural language queries. However, these models, trained on curated datasets, inherently embody biases ranging from racial to national and gender biases. It remains uncertain whether these biases impact the performance of LLMs for certain tasks. In this study, we investigate the political biases of LLMs within the stance classification task, specifically examining whether these models exhibit a tendency to more accurately classify politically-charged stances. Utilizing three datasets, seven LLMs, and four distinct prompting schemes, we analyze the performance of LLMs on politically oriented statements and targets. Our findings reveal a statistically significant difference in the performance of LLMs across various politically oriented stance classification tasks. Furthermore, we observe that this difference primarily manifests at the dataset level, with models and prompting schemes showing statistically similar performances across different stance classification datasets. Lastly, we observe that when there is greater ambiguity in the target the statement is directed towards, LLMs have poorer stance classification accuracy. Code & Dataset: http://doi.org/10.5281/zenodo.12938478

SIMar 6

The Architects of Narrative Evolution: Actor Interventions Across the SAGES Framework in Information Campaigns

Lynnette Hui Xian Ng, Yukai Zeng, Muthiah Ponmani

Narratives in digital spaces are not merely organic phenomena. They are strategically shaped by a range of actors to influence public perception, behavior, and sociopolitical outcomes. This paper offers an actor-oriented expansion of the SAGES Framework, a five-stage model that traces the evolution of narratives from digital inception to real-world impact: Seeding, Amplification, Galvanization, Expansion, and Stickiness. This framework maps how adversarial and constructive actors intervene at each stage to accelerate, redirect, or counter narrative trajectories. Through comparative case studies of the 2021 Myanmar military coup and the 2022 Russia-Ukraine war, we show how narrative manipulation campaigns unfold and how targeted interventions can mitigate their effects. The SAGES framework contributes a practical lens for analyzing influence operations and developing countermeasures in an era of contested information ecosystems.

SIOct 31, 2025

Simulating Misinformation Vulnerabilities With Agent Personas

David Farr, Lynnette Hui Xian Ng, Stephen Prochaska et al.

Disinformation campaigns can distort public perception and destabilize institutions. Understanding how different populations respond to information is crucial for designing effective interventions, yet real-world experimentation is impractical and ethically challenging. To address this, we develop an agent-based simulation using Large Language Models (LLMs) to model responses to misinformation. We construct agent personas spanning five professions and three mental schemas, and evaluate their reactions to news headlines. Our findings show that LLM-generated agents align closely with ground-truth labels and human predictions, supporting their use as proxies for studying information responses. We also find that mental schemas, more than professional background, influence how agents interpret misinformation. This work provides a validation of LLMs to be used as agents in an agent-based model of an information network for analyzing trust, polarization, and susceptibility to deceptive content in complex social systems.

CVMar 10, 2025Code

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Samuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz et al. · cambridge

Southeast Asia (SEA) is a region of extraordinary linguistic and cultural diversity, yet it remains significantly underrepresented in vision-language (VL) research. This often results in artificial intelligence (AI) models that fail to capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing high-quality, culturally relevant data for SEA languages. By involving contributors from SEA countries, SEA-VL aims to ensure better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages in VL research. Beyond crowdsourcing, our initiative goes one step further in the exploration of the automatic collection of culturally relevant images through crawling and image generation. First, we find that image crawling achieves approximately ~85% cultural relevance while being more cost- and time-efficient than crowdsourcing. Second, despite the substantial progress in generative vision models, synthetic images remain unreliable in accurately reflecting SEA cultures. The generated images often fail to reflect the nuanced traditions and cultural contexts of the region. Collectively, we gather 1.28M SEA culturally-relevant images, more than 50 times larger than other existing datasets. Through SEA-VL, we aim to bridge the representation gap in SEA, fostering the development of more inclusive AI systems that authentically represent diverse cultures across SEA.

CLNov 8, 2024Code

What talking you?: Translating Code-Mixed Messaging Texts to English

Lynnette Hui Xian Ng, Luo Qi Chan

Translation of code-mixed texts to formal English allow a wider audience to understand these code-mixed languages, and facilitate downstream analysis applications such as sentiment analysis. In this work, we look at translating Singlish, which is colloquial Singaporean English, to formal standard English. Singlish is formed through the code-mixing of multiple Asian languages and dialects. We analysed the presence of other Asian languages and variants which can facilitate translation. Our dataset is short message texts, written as informal communication between Singlish speakers. We use a multi-step prompting scheme on five Large Language Models (LLMs) for language detection and translation. Our analysis show that LLMs do not perform well in this task, and we describe the challenges involved in translation of code-mixed languages. We also release our dataset in this link https://github.com/luoqichan/singlish.

CLSep 30, 2024

Disentangling Singlish Discourse Particles with Task-Driven Representation

Linus Tze En Foo, Lynnette Hui Xian Ng

Singlish, or formally Colloquial Singapore English, is an English-based creole language originating from the SouthEast Asian country Singapore. The language contains influences from Sinitic languages such as Chinese dialects, Malay, Tamil and so forth. A fundamental task to understanding Singlish is to first understand the pragmatic functions of its discourse particles, upon which Singlish relies heavily to convey meaning. This work offers a preliminary effort to disentangle the Singlish discourse particles (lah, meh and hor) with task-driven representation learning. After disentanglement, we cluster these discourse particles to differentiate their pragmatic functions, and perform Singlish-to-English machine translation. Our work provides a computational method to understanding Singlish discourse particles, and opens avenues towards a deeper comprehension of the language and its usage.

74.3MAMay 8

Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems

Lynnette Hui Xian Ng, Iain J. Cruickshank, Adrian Xuan Wei Lim et al.

Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans, such as in social media platforms, multi-agent LLM pipelines or autonomous robotics fleets. In these settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time. Emergent dynamics of individuals in a social group have been long studied by social scientists in human contexts. \textbf{This position paper argues that agentic AI systems must be modeled with social theory as a structural prior, and formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence to generate system-level outcomes.} We represent MASS as a class of dynamical system of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability. We demonstrate the importance of each structural prior through formal propositions, and articulate a research agenda for how MASS should be modeled, evaluated and governed.

86.2SIMar 18

Temporal Narrative Monitoring in Dynamic Information Environments

David Farr, Stephen Prochaska, Jack Moody et al.

Comprehending the information environment (IE) during crisis events is challenging due to the rapid change and abstract nature of the domain. Many approaches focus on snapshots via classification methods or network approaches to describe the IE in crisis, ignoring the temporal nature of how information changed over time. This work presents a system-oriented framework for modeling emerging narratives as temporally evolving semantic structures without requiring prior label specification. By integrating semantic embeddings, density-based clustering, and rolling temporal linkage, the framework represents narratives as persistent yet adaptive entities within a shared semantic space. We apply the methodology to a real-world crisis event and evaluate system behavior through stratified cluster validation and temporal lifecycle analysis. Results demonstrate high cluster coherence and reveal heterogeneous narrative lifecycles characterized by both transient fragments and stable narrative anchors. We ground our approach in situational awareness theory, supporting perception and comprehension of the IE by transforming unstructured social media streams into interpretable, temporally structured representations. The resulting system provides a methodology for monitoring and decision support in dynamic information environments.

CVJan 10, 2024

Reverse Projection: Real-Time Local Space Texture Mapping

Adrian Xuan Wei Lim, Lynnette Hui Xian Ng, Conor Griffin et al.

We present Reverse Projection, a novel projective texture mapping technique for painting a decal directly to the texture of a 3D object. Designed to be used in games, this technique works in real-time. By using projection techniques that are computed in local space textures and outward-looking, users using low-end android devices to high-end gaming desktops are able to enjoy the personalization of their assets. We believe our proposed pipeline is a step in improving the speed and versatility of model painting.

CLAug 25, 2025

Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design

Yunze Xiao, Lynnette Hui Xian Ng, Jiarui Liu et al.

Large Language Models (LLMs) increasingly exhibit \textbf{anthropomorphism} characteristics -- human-like qualities portrayed across their outlook, language, behavior, and reasoning functions. Such characteristics enable more intuitive and engaging human-AI interactions. However, current research on anthropomorphism remains predominantly risk-focused, emphasizing over-trust and user deception while offering limited design guidance. We argue that anthropomorphism should instead be treated as a \emph{concept of design} that can be intentionally tuned to support user goals. Drawing from multiple disciplines, we propose that the anthropomorphism of an LLM-based artifact should reflect the interaction between artifact designers and interpreters. This interaction is facilitated by cues embedded in the artifact by the designers and the (cognitive) responses of the interpreters to the cues. Cues are categorized into four dimensions: \textit{perceptive, linguistic, behavioral}, and \textit{cognitive}. By analyzing the manifestation and effectiveness of each cue, we provide a unified taxonomy with actionable levers for practitioners. Consequently, we advocate for function-oriented evaluations of anthropomorphic design.

SIAug 1, 2025

Are LLM-Powered Social Media Bots Realistic?

Lynnette Hui Xian Ng, Kathleen M. Carley

As Large Language Models (LLMs) become more sophisticated, there is a possibility to harness LLMs to power social media bots. This work investigates the realism of generating LLM-Powered social media bot networks. Through a combination of manual effort, network science and LLMs, we create synthetic bot agent personas, their tweets and their interactions, thereby simulating social media networks. We compare the generated networks against empirical bot/human data, observing that both network and linguistic properties of LLM-Powered Bots differ from Wild Bots/Humans. This has implications towards the detection and effectiveness of LLM-Powered Bots.

LGMar 26, 2025

Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models

Lynnette Hui Xian Ng, Kokil Jaidka, Kaiyuan Tay et al.

Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master's, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.

CYJan 1, 2025

What is a Social Media Bot? A Global Comparison of Bot and Human Characteristics

Lynnette Hui Xian Ng, Kathleen M. Carley

Chatter on social media is 20% bots and 80% humans. Chatter by bots and humans is consistently different: bots tend to use linguistic cues that can be easily automated while humans use cues that require dialogue understanding. Bots use words that match the identities they choose to present, while humans may send messages that are not related to the identities they present. Bots and humans differ in their communication structure: sampled bots have a star interaction structure, while sampled humans have a hierarchical structure. These conclusions are based on a large-scale analysis of social media tweets across ~200mil users across 7 events. Social media bots took the world by storm when social-cybersecurity researchers realized that social media users not only consisted of humans but also of artificial agents called bots. These bots wreck havoc online by spreading disinformation and manipulating narratives. Most research on bots are based on special-purposed definitions, mostly predicated on the event studied. This article first begins by asking, "What is a bot?", and we study the underlying principles of how bots are different from humans. We develop a first-principle definition of a social media bot. With this definition as a premise, we systematically compare characteristics between bots and humans across global events, and reflect on how the software-programmed bot is an Artificial Intelligent algorithm, and its potential for evolution as technology advances. Based on our results, we provide recommendations for the use and regulation of bots. Finally, we discuss open challenges and future directions: Detect, to systematically identify these automated and potentially evolving bots; Differentiate, to evaluate the goodness of the bot in terms of their content postings and relationship interactions; Disrupt, to moderate the impact of malicious bots.

CLOct 27, 2024

Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification

Ananya Malik, Kartik Sharma, Shaily Bhatt et al. · cmu

Large Language Models (LLMs) offer a lucrative promise for scalable content moderation, including hate speech detection. However, they are also known to be brittle and biased against marginalised communities and dialects. This requires their applications to high-stakes tasks like hate speech detection to be critically scrutinized. In this work, we investigate the robustness of hate speech classification using LLMs particularly when explicit and implicit markers of the speaker's ethnicity are injected into the input. For explicit markers, we inject a phrase that mentions the speaker's linguistic identity. For the implicit markers, we inject dialectal features. By analysing how frequently model outputs flip in the presence of these markers, we reveal varying degrees of brittleness across 3 LLMs and 1 LM and 5 linguistic identities. We find that the presence of implicit dialect markers in inputs causes model outputs to flip more than the presence of explicit markers. Further, the percentage of flips varies across ethnicities. Finally, we find that larger models are more robust. Our findings indicate the need for exercising caution in deploying LLMs for high-stakes tasks like hate speech detection.

CLMar 5, 2024

DIVERSE: A Dataset of YouTube Video Comment Stances with a Data Programming Model

Iain J. Cruickshank, Amir Soofi, Lynnette Hui Xian Ng

Public opinion of military organizations significantly influences their ability to recruit talented individuals. As recruitment efforts increasingly extend into digital spaces like social media, it becomes essential to assess the stance of social media users toward online military content. However, there is a notable lack of data for analyzing opinions on military recruiting efforts online, compounded by challenges in stance labeling, which is crucial for understanding public perceptions. Despite the importance of stance analysis for successful online military recruitment, creating human-annotated, in-domain stance labels is resource-intensive. In this paper, we address both the challenges of stance labeling and the scarcity of data on public opinions of online military recruitment by introducing and releasing the DIVERSE dataset: https://doi.org/10.5281/zenodo.10493803. This dataset comprises all comments from the U.S. Army's official YouTube Channel videos. We employed a state-of-the-art weak supervision approach, leveraging large language models to label the stance of each comment toward its respective video and the U.S. Army. Our findings indicate that the U.S. Army's videos began attracting a significant number of comments post-2021, with the stance distribution generally balanced among supportive, oppositional, and neutral comments, with a slight skew towards oppositional versus supportive comments.

CLOct 21, 2024

Limpeh ga li gong: Challenges in Singlish Annotations

Luo Qi Chan, Lynnette Hui Xian Ng

Singlish, or Colloquial Singapore English, is a language formed from oral and social communication within multicultural Singapore. In this work, we work on a fundamental Natural Language Processing (NLP) task: Parts-Of-Speech (POS) tagging of Singlish sentences. For our analysis, we build a parallel Singlish dataset containing direct English translations and POS tags, with translation and POS annotation done by native Singlish speakers. Our experiments show that automatic transition- and transformer- based taggers perform with only $\sim 80\%$ accuracy when evaluated against human-annotated POS labels, suggesting that there is indeed room for improvement on computation analysis of the language. We provide an exposition of challenges in Singlish annotation: its inconsistencies in form and semantics, the highly context-dependent particles of the language, its structural unique expressions, and the variation of the language on different mediums. Our task definition, resultant labels and results reflects the challenges in analysing colloquial languages formulated from a variety of dialects, and paves the way for future studies beyond POS tagging.

CLDec 31, 2021

Using Graph-Aware Reinforcement Learning to Identify Winning Strategies in Diplomacy Games (Student Abstract)

Hansin Ahuja, Lynnette Hui Xian Ng, Kokil Jaidka

This abstract proposes an approach towards goal-oriented modeling of the detection and modeling complex social phenomena in multiparty discourse in an online political strategy game. We developed a two-tier approach that first encodes sociolinguistic behavior as linguistic features then use reinforcement learning to estimate the advantage afforded to any player. In the first tier, sociolinguistic behavior, such as Friendship and Reasoning, that speakers use to influence others are encoded as linguistic features to identify the persuasive strategies applied by each player in simultaneous two-party dialogues. In the second tier, a reinforcement learning approach is used to estimate a graph-aware reward function to quantify the advantage afforded to each player based on their standing in this multiparty setup. We apply this technique to the game Diplomacy, using a dataset comprising of over 15,000 messages exchanged between 78 users. Our graph-aware approach shows robust performance compared to a context-agnostic setup.

LGDec 26, 2021

Will You Dance To The Challenge? Predicting User Participation of TikTok Challenges

Lynnette Hui Xian Ng, John Yeh Han Tan, Darryl Jing Heng Tan et al.

TikTok is a popular new social media, where users express themselves through short video clips. A common form of interaction on the platform is participating in "challenges", which are songs and dances for users to iterate upon. Challenge contagion can be measured through replication reach, i.e., users uploading videos of their participation in the challenges. The uniqueness of the TikTok platform where both challenge content and user preferences are evolving requires the combination of challenge and user representation. This paper investigates social contagion of TikTok challenges through predicting a user's participation. We propose a novel deep learning model, deepChallenger, to learn and combine latent user and challenge representations from past videos to perform this user-challenge prediction task. We collect a dataset of over 7,000 videos from 12 trending challenges on the ForYouPage, the app's landing page, and over 10,000 videos from 1303 users. Extensive experiments are conducted and the results show that our proposed deepChallenger (F1=0.494) outperforms baselines (F1=0.188) in the prediction task.

SISep 2, 2021

Coordinating Narratives and the Capitol Riots on Parler

Lynnette Hui Xian Ng, Iain Cruickshank, Kathleen M. Carley

Coordinated disinformation campaigns are used to influence social media users, potentially leading to offline violence. In this study, we introduce a general methodology to uncover coordinated messaging through analysis of user parleys on Parler. The proposed method constructs a user-to-user coordination network graph induced by a user-to-text graph and a text-to-text similarity graph. The text-to-text graph is constructed based on the textual similarity of Parler posts. We study three influential groups of users in the 6 January 2020 Capitol riots and detect networks of coordinated user clusters that are all posting similar textual content in support of different disinformation narratives related to the U.S. 2020 elections.

SIApr 2, 2021

The Coronavirus is a Bioweapon: Analysing Coronavirus Fact-Checked Stories

Lynnette Hui Xian Ng, Kathleen M. Carley

The 2020 coronavirus pandemic has heightened the need to flag coronavirus-related misinformation, and fact-checking groups have taken to verifying misinformation on the Internet. We explore stories reported by fact-checking groups PolitiFact, Poynter and Snopes from January to June 2020, characterising them into six story clusters before then analyse time-series and story validity trends and the level of agreement across sites. We further break down the story clusters into more granular story types by proposing a unique automated method with a BERT classifier, which can be used to classify diverse story sources, in both fact-checked stories and tweets.

IRDec 11, 2020

KOSMOS: Knowledge-graph Oriented Social media and Mainstream media Overview System

Chua Hao Yang, Yong Shan Jie, Boon Kok Chin et al.

We introduce KOSMOS, a knowledge retrieval system based on the constructed knowledge graph of social media and mainstream media documents. The system first identifies key events from the documents at each time frame through clustering, extracting a document to represent each cluster, then describing the document in terms of 5W1H (Who, What, When, Where, Why, How). The event centric knowledge graph is enhanced by relation triplets and entity disambiguation from the representative document. This knowledge retrieval is supported by a web interface that presents a graph visualisation of related nodes and relevant articles based on a user query. The interface facilitates understanding relationships between events reported in mainstream and social media journalism through the KOSMOS information extraction pipeline, which is valuable to understand media slant and public opinions. Finally, we explore a use case in extracting events and relations from documents to understand the media and community's view to the 2020 COVID19 pandemic.