LG May 27, 2022
FedFormer: Contextual Federation with Attention in Reinforcement Learning
Liam Hebert, Lukasz Golab, Pascal Poupart et al.
A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by taking the average of each participating agent's model weights into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that utilizes Transformer Attention to contextually aggregate embeddings from models originating from different learner agents. In so doing, we attentively weigh the contributions of other agents with respect to the current agent's environment and learned relationships, thus providing a more effective and efficient federation. We evaluate our methods on the Meta-World environment and find that our approach yields significant improvements over FedAvg and non-federated Soft Actor-Critic single-agent methods. Our results compared to Soft Actor-Critic show that FedFormer achieves higher episodic return while still abiding by the privacy constraints of federated learning. Finally, we also demonstrate improvements in effectiveness with increased agent pools across all methods in certain tasks. This is contrasted by FedAvg, which fails to make noticeable improvements when scaled.
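The contrast between weight averaging and attention-based aggregation can be illustrated with a minimal sketch. It is not FedFormer's architecture: the function names are hypothetical, and the attention step is a single head of scaled dot-product attention with no learned projections, applied to per-agent embedding vectors that are assumed to be given.

```python
import numpy as np


def fed_avg(agent_weights):
    """FedAvg-style aggregation: element-wise mean of each named parameter."""
    keys = agent_weights[0].keys()
    return {k: np.mean([w[k] for w in agent_weights], axis=0) for k in keys}


def attention_aggregate(query, agent_embeddings):
    """Weigh each agent's embedding by its similarity to the current agent's query.

    Simplified assumption: single head, no learned query/key/value projections.
    """
    keys = np.stack(agent_embeddings)                 # shape (n_agents, d)
    scores = keys @ query / np.sqrt(query.shape[0])   # scaled dot products
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over agents
    return weights @ keys, weights


# Averaging treats every agent identically.
avg = fed_avg([{"w": np.array([1.0, 3.0])}, {"w": np.array([3.0, 5.0])}])

# Attention gives more weight to the agent whose embedding matches the query.
context, weights = attention_aggregate(
    np.array([1.0, 0.0]),
    [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
)
```

In the averaging case every agent contributes equally regardless of its environment, whereas the attention weights depend on the querying agent, which is the intuition behind contextual aggregation.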
CL Jan 10, 2023
Predicting Hateful Discussions on Reddit using Graph Transformer Networks and Communal Context
Liam Hebert, Lukasz Golab, Robin Cohen
We propose a system to predict harmful discussions on social media platforms. Our solution uses contextual deep language models and proposes the novel idea of integrating state-of-the-art Graph Transformer Networks to analyze all conversations that follow an initial post. This framework also supports adapting to future comments as the conversation unfolds. In addition, we study whether a community-specific analysis of hate speech leads to more effective detection of hateful discussions. We evaluate our approach on 333,487 Reddit discussions from various communities. We find that community-specific modeling improves performance two-fold and that models which capture wider-discussion context improve accuracy by 28% (35% for the most hateful content) compared to limited context models.
LG Jan 25, 2023
Qualitative Analysis of a Graph Transformer Approach to Addressing Hate Speech: Adapting to Dynamically Changing Content
Liam Hebert, Hong Yi Chen, Robin Cohen et al.
Our work advances an approach for predicting hate speech in social media, drawing out the critical need to consider the discussions that follow a post to successfully detect when hateful discourse may arise. Using graph transformer networks, coupled with modelling attention and BERT-level natural language processing, our approach can capture context and anticipate upcoming anti-social behaviour. In this paper, we offer a detailed qualitative analysis of this solution for hate speech detection in social networks, leading to insights into where the method has the most impressive outcomes in comparison with competitors and identifying scenarios where there are challenges to achieving ideal performance. Included is an exploration of the kinds of posts that permeate social media today, including the use of hateful images. This suggests avenues for extending our model to be more comprehensive. A key insight is that the focus on reasoning about the concept of context positions us well to be able to support multi-modal analysis of online posts. We conclude with a reflection on how the problem we are addressing relates especially well to the theme of dynamic change, a critical concern for all AI solutions for social impact. We also comment briefly on how mental health well-being can be advanced with our work, through curated content attuned to the extent of hate in posts.
CL Jul 18, 2023
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media
Liam Hebert, Gaurav Sahu, Yuxuan Guo et al.
We present the Multi-Modal Discussion Transformer (mDT), a novel method for detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies.
CL Mar 18, 2022
GRS: Combining Generation and Revision in Unsupervised Sentence Simplification
Mohammad Dehghan, Dhruv Kumar, Lukasz Golab
We propose GRS: an unsupervised approach to sentence simplification that combines text generation and text revision. We start with an iterative framework in which an input sentence is revised using explicit edit operations, and add paraphrasing as a new edit operation. This allows us to combine the advantages of generative and revision-based approaches: paraphrasing captures complex edit operations, and the use of explicit edit operations in an iterative manner provides controllability and interpretability. We demonstrate these advantages of GRS compared to existing methods on the Newsela and ASSET datasets.
CL May 11
RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems
Joel Rorseth, Parke Godfrey, Lukasz Golab et al.
This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections.
LG May 8
Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson et al.
Recursive retraining of generative models poses a critical representation challenge: when synthetic outputs are curated based on a fixed reward signal, the model tends to collapse onto a narrow set of outputs that over-optimize that objective. Prior work suggests that such collapse is unavoidable without adding real data into the mix. We revisit this conclusion from an alignment perspective and show that collapse can be mitigated through curation based on multiple reward functions. We formalize the dynamics of recursive training under heterogeneous preferences and prove that, under certain conditions, the model converges to a stable distribution that allocates probability mass across competing high-reward regions. The limiting distribution preserves diversity and provably satisfies a weighted Nash bargaining solution, offering a formal interpretation of value aggregation in synthetic retraining loops.
CL May 11, 2024
RAGE Against the Machine: Retrieval-Augmented LLM Explanations
Joel Rorseth, Parke Godfrey, Lukasz Golab et al.
This paper demonstrates RAGE, an interactive tool for explaining Large Language Models (LLMs) augmented with retrieval capabilities; i.e., able to query external sources and pull relevant information into their input context. Our explanations are counterfactual in the sense that they identify parts of the input context that, when removed, change the answer to the question posed to the LLM. RAGE includes pruning methods to navigate the vast space of possible explanations, allowing users to view the provenance of the produced answers.
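The counterfactual idea described above, finding parts of the input context whose removal changes the answer, can be sketched as a small search. This is an illustration under stated assumptions rather than RAGE's implementation: `answer_fn` is a hypothetical stand-in for the retrieval-augmented LLM call, and the only pruning shown is skipping supersets of explanations already found, which keeps the reported explanations minimal.

```python
from itertools import combinations


def counterfactual_removals(sources, answer_fn, max_size=None):
    """Return the baseline answer and the minimal sets of source indices
    whose removal changes it.

    answer_fn is a hypothetical stand-in for querying the LLM with a context.
    """
    baseline = answer_fn(sources)
    found = []
    limit = len(sources) if max_size is None else max_size
    for size in range(1, limit + 1):          # smallest removals first
        for removed in combinations(range(len(sources)), size):
            # Prune: a superset of a known explanation is not minimal.
            if any(set(f) <= set(removed) for f in found):
                continue
            kept = [s for i, s in enumerate(sources) if i not in removed]
            if answer_fn(kept) != baseline:
                found.append(removed)
    return baseline, found


# Toy "LLM": answers "yes" as long as any retrieved source contains evidence.
def toy_answer(context):
    return "yes" if any("evidence" in s for s in context) else "no"


baseline, explanations = counterfactual_removals(
    ["evidence A", "filler", "evidence B"], toy_answer
)
```

Here the answer only flips when both evidence-bearing sources are removed together, so the single explanation is the index pair `(0, 2)`.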
DB May 21, 2024
Explaining Expert Search and Team Formation Systems with ExES
Kiarash Golzadeh, Lukasz Golab, Jaroslaw Szlichta
Expert search and team formation systems operate on collaboration networks, with nodes representing individuals, labeled with their skills, and edges denoting collaboration relationships. Given a keyword query corresponding to the desired skills, these systems identify experts that best match the query. However, state-of-the-art solutions to this problem lack transparency. To address this issue, we propose ExES, a tool designed to explain expert search and team formation systems using factual and counterfactual methods from the field of explainable artificial intelligence (XAI). ExES uses factual explanations to highlight important skills and collaborations, and counterfactual explanations to suggest new skills and collaborations to increase the likelihood of being identified as an expert. Towards a practical deployment as an interactive explanation tool, we present and experimentally evaluate a suite of pruning strategies to speed up the explanation search. In many cases, our pruning strategies make ExES an order of magnitude faster than exhaustive search, while still producing concise and actionable explanations.
LG Nov 16, 2025
The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation
Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson et al.
In self-consuming generative models that train on their own outputs, alignment with user preferences becomes a recursive rather than one-time process. We provide the first formal foundation for analyzing the long-term effects of such recursive retraining on alignment. Under a two-stage curation mechanism based on the Bradley-Terry (BT) model, we model alignment as an interaction between two factions: the Model Owner, who filters which outputs should be learned by the model, and the Public User, who determines which outputs are ultimately shared and retained through interactions with the model. Our analysis reveals three structural convergence regimes depending on the degree of preference alignment: consensus collapse, compromise on shared optima, and asymmetric refinement. We prove a fundamental impossibility theorem: no recursive BT-based curation mechanism can simultaneously preserve diversity, ensure symmetric influence, and eliminate dependence on initialization. Framing the process as dynamic social choice, we show that alignment is not a static goal but an evolving equilibrium, shaped both by power asymmetries and path dependence.
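For readers unfamiliar with the Bradley-Terry model, the sketch below shows the standard pairwise preference probability and one round of a simple curation step built on it. The curation step is an assumption made for illustration (draw two outputs independently from the current distribution and keep the BT winner); it is a single-stage simplification, not the two-stage mechanism analyzed in the paper, and the function names are hypothetical.

```python
import numpy as np


def bt_prob(r_i, r_j):
    """Bradley-Terry probability that item i is preferred over item j,
    given scalar rewards r_i and r_j."""
    return 1.0 / (1.0 + np.exp(-(r_i - r_j)))


def curate_once(p, rewards):
    """One round of pairwise BT curation over a discrete output distribution.

    Assumed mechanism: sample x and y i.i.d. from p and retain the BT winner,
    so the retained distribution is p'(x) = 2 p(x) sum_y p(y) P(x beats y).
    """
    p = np.asarray(p, dtype=float)
    r = np.asarray(rewards, dtype=float)
    win = bt_prob(r[:, None], r[None, :])   # win[x, y] = P(x beats y)
    return 2.0 * p * (win @ p)


# Two outputs, equally likely at first; the first has the higher reward.
p0 = np.array([0.5, 0.5])
p1 = curate_once(p0, rewards=[1.0, 0.0])
```

Repeating `curate_once` shifts probability mass toward the higher-reward output at every round, which gives some intuition for why curation against a single fixed preference tends toward collapse.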
CL Oct 26, 2025
Rule-Based Explanations for Retrieval-Augmented LLM Systems
Joel Rorseth, Parke Godfrey, Lukasz Golab et al.
If-then rules are widely used to explain machine learning models; e.g., "if employed = no, then loan application = rejected." We present the first proposal to apply rules to explain the emerging class of large language models (LLMs) with retrieval-augmented generation (RAG). Since RAG enables LLM systems to incorporate retrieved information sources at inference time, rules linking the presence or absence of sources can explain output provenance; e.g., "if a Times Higher Education ranking article is retrieved, then the LLM ranks Oxford first." To generate such rules, a brute force approach would probe the LLM with all source combinations and check if the presence or absence of any sources leads to the same output. We propose optimizations to speed up rule generation, inspired by Apriori-like pruning from frequent itemset mining but redefined within the scope of our novel problem. We conclude with qualitative and quantitative experiments demonstrating our solutions' value and efficiency.
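The brute-force baseline described above can be sketched as follows, restricted to presence-only rules of the form "if these sources are retrieved, then the output is the target". The names are hypothetical, `answer_fn` stands in for the LLM call, and the subset check is a generic Apriori-style shortcut (once a set of sources is a rule, its supersets are subsumed), not the optimizations proposed in the paper.

```python
from itertools import combinations


def mine_presence_rules(sources, answer_fn, target):
    """Return minimal sets of source indices whose presence always yields target.

    Brute force: probes answer_fn (a hypothetical LLM stand-in) on every
    combination of sources, so the cost is exponential in len(sources).
    """
    n = len(sources)
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    outputs = {s: answer_fn([sources[i] for i in sorted(s)]) for s in subsets}

    rules = []
    for cand in sorted(subsets, key=len):      # smallest antecedents first
        if not cand:
            continue                           # skip the trivial empty antecedent
        if any(rule <= cand for rule in rules):
            continue                           # subsumed by a smaller rule
        # The rule holds if every combination containing cand gives the target.
        if all(outputs[s] == target for s in subsets if cand <= s):
            rules.append(cand)
    return rules


# Toy "LLM" mirroring the example in the abstract.
def toy_llm(context):
    return "Oxford first" if "THE ranking article" in context else "Cambridge first"


docs = ["THE ranking article", "QS ranking article", "blog post"]
rules = mine_presence_rules(docs, toy_llm, target="Oxford first")
```

With these three documents the only minimal rule is the one whose antecedent is the Times Higher Education article alone; every larger set containing it is pruned as subsumed.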
SOC-PH Mar 26, 2025
Four Things People Should Know About Migraines
Mohammad S. Parsa, Lukasz Golab
Migraine literacy among the public is known to be low, and this lack of understanding has a negative impact on migraineurs' quality of life. To understand this impact, we use text mining methods to study migraine discussion on the Reddit social media platform. We summarize the findings in the form of "four things people should know about chronic migraines": it is a serious disease that affects people of all ages, it can be triggered by many different factors, it affects women more than men, and it can get worse in combination with the COVID-19 virus.
SI May 25, 2021
Climate Action During COVID-19 Recovery and Beyond: A Twitter Text Mining Study
Mohammad S. Parsa, Lukasz Golab, Srinivasan Keshav
The Coronavirus pandemic created a global crisis that prompted immediate large-scale action, including economic shutdowns and mobility restrictions. These actions have had devastating effects on the economy, but some positive effects on the environment. As the world recovers from the pandemic, we ask the following question: What is the public attitude towards climate action during COVID-19 recovery and beyond? We answer this question by analyzing discussions on the Twitter social media platform. We find that most discussions support climate action and point out lessons learned during pandemic response that can shape future climate policy, although skeptics continue to have a presence. Additionally, concerns arise in the context of climate action during the pandemic, such as mitigating the risk of COVID-19 transmission on public transit.
CL Jun 17, 2020
Iterative Edit-Based Unsupervised Sentence Simplification
Dhruv Kumar, Lili Mou, Lukasz Golab et al.
We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation, and iteratively performs word- and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, yet is more controllable and interpretable. Experiments on Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches.
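The iterative, score-guided editing loop can be illustrated with a toy hill-climbing sketch. Everything specific here is an assumption made for illustration: the `SYNONYMS` table, the `required` concept set (a crude stand-in for the meaning-preservation and fluency terms), and the character-count simplicity term. The actual model's scoring function and edit operations are not reproduced.

```python
# Hypothetical lexical substitutions (complex word -> simpler word).
SYNONYMS = {"physician": "doctor", "utilized": "used", "apparatus": "tool"}


def candidates(words):
    """Yield every sentence reachable by one deletion or one substitution."""
    for i, w in enumerate(words):
        yield words[:i] + words[i + 1:]                      # delete word i
        if w in SYNONYMS:
            yield words[:i] + [SYNONYMS[w]] + words[i + 1:]  # simplify word i


def score(words, required):
    """Toy score: shorter is simpler; losing a required concept is penalized."""
    concepts = {SYNONYMS.get(w, w) for w in words}
    missing = len(required - concepts)
    return -sum(len(w) for w in words) - 100 * missing


def simplify(sentence, required, min_gain=0.0):
    """Greedily apply the best-scoring edit until no edit improves the score."""
    current = sentence.split()
    current_score = score(current, required)
    while True:
        best = max(candidates(current), key=lambda c: score(c, required), default=None)
        if best is None or score(best, required) - current_score <= min_gain:
            return " ".join(current)
        current, current_score = best, score(best, required)


result = simplify(
    "the physician utilized a very sophisticated apparatus",
    required={"the", "doctor", "used", "a", "tool"},
)
```

Because each accepted step is an explicit edit, the sequence of edits can be logged and inspected, which is the sense in which edit-based approaches are controllable and interpretable.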