Rasika Muralidharan

CL
h-index: 15
4 papers
Novelty: 34%
AI Score: 41

4 Papers

SI Jun 3
Federating Governance: How Community Rules Scale with Mastodon Instances

Rasika Muralidharan, Yong-Yeol Ahn, Bao Tran Truong

The rise of decentralized social media platforms like Mastodon and Bluesky highlights the challenge of scaling self-governance and moderation. As communities grow, they face new issues that demand increasingly complex governance structures. However, because moderation is mainly volunteer-driven, there is limited formal guidance on how community rules and moderation practices should evolve with growth. This study investigates how moderation scales with Mastodon instances by analyzing community rules across servers of varying sizes. We categorize these rules to identify key governance priorities and find that these priorities are remarkably consistent across instance sizes: rules addressing problematic content, such as harassment, hate speech, and illegal content, dominate regardless of scale. While smaller communities focus on narrower sets of topics, larger servers maintain more balanced coverage of a broad range of topics. Our analysis of rule formalization reveals that community size strongly predicts rule development. As instances grow, their rules become more extensive and topically diverse, but also exhibit lower readability and linguistic diversity. In contrast, external federation interactions play a limited role, mainly associated with a broader scope of rules without substantially affecting their diversity or form. These findings highlight the relative influence of internal versus external factors, suggesting that local scaling pressures outweigh network-level dynamics in decentralized social media governance. The scaling patterns observed on Mastodon resemble those previously identified on centralized platforms such as Reddit, suggesting that community size imposes fundamental constraints on self-governance that transcend platform architectures.

AI Jan 16
XChoice: Explainable Evaluation of AI-Human Alignment in LLM-based Constrained Choice Decision Making

Weihong Qi, Fan Huang, Rasika Muralidharan et al.

We present XChoice, an explainable framework for evaluating AI-human alignment in constrained decision making. Moving beyond outcome-agreement metrics such as accuracy and F1 score, XChoice fits a mechanism-based decision model to human data and LLM-generated decisions, recovering interpretable parameters that capture the relative importance of decision factors, constraint sensitivity, and implied trade-offs. Alignment is assessed by comparing these parameter vectors across models, options, and subgroups. We demonstrate XChoice on Americans' daily time allocation using the American Time Use Survey (ATUS) as human ground truth, revealing heterogeneous alignment across models and activities and salient misalignment concentrated in Black and married groups. We further validate the robustness of XChoice via an invariance analysis and evaluate targeted mitigation with a retrieval-augmented generation (RAG) intervention. Overall, XChoice provides mechanism-based metrics that diagnose misalignment and support informed improvements beyond surface outcome matching.

CL May 16
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

Zoher Kachwala, Bao Tran Truong, Rasika Muralidharan et al.

Social media are shifting towards pluralism -- community-governed platforms where groups define their own norms. What violates rules in one community may be perfectly acceptable in another. Can AI models help moderate such pluralistic communities? We formalize the task as a multiple-choice problem, mirroring how human moderators operate in the real world: given a comment and its surrounding context, identify which specific rule, if any, is violated. We introduce PluRule, a multimodal, multilingual benchmark for detecting 13,371 rule violations across 1,989 Reddit communities spanning 2,885 rules in 9 languages. Using this benchmark, we show that state-of-the-art vision-language models struggle significantly: even GPT-5.2 with high reasoning performs only slightly better than a trivial baseline. We also find that bigger models and increased context provide marginal gains, and universal rules like civility and self-promotion are easier to detect. Our results show that moderation of pluralistic communities on social media is a fundamental challenge for language models. Our code and benchmark are publicly available.

CL Oct 8, 2025
Can Lessons From Human Teams Be Applied to Multi-Agent Systems? The Role of Structure, Diversity, and Interaction Dynamics

Rasika Muralidharan, Haewoon Kwak, Jisun An

Multi-Agent Systems (MAS) with Large Language Model (LLM)-powered agents are gaining attention, yet few studies explore their team dynamics. Inspired by human team science, we propose a multi-agent framework to examine core aspects of team science: structure, diversity, and interaction dynamics. We evaluate team performance across four tasks: CommonsenseQA, StrategyQA, Social IQa, and Latent Implicit Hate, spanning commonsense and social reasoning. Our results show that flat teams tend to perform better than hierarchical ones, while diversity has a nuanced impact. Interviews suggest agents are overconfident about their team performance, yet post-task reflections reveal both appreciation for collaboration and challenges in integration, including limited conversational coordination.