20.0 · AI · May 28
Anchorless Diversification for Parallel LLM Ideation
Fares Nabil Ibrahim, Nafis Saami Azad, Raiyan Abdul Baten
LLMs are increasingly used to generate candidate-idea pools for creative tasks where broad exploration is valuable. Parallel inference can be attractive in this setting when it broadens the pool while retaining quality and cost efficiency. We study inference-time controls for candidate-pool diversification, asking whether anchorless methods can rival methods that depend on observed seed ideas. Across three creative task families, we compare independent generation and semantic direction stratification with self-, peer-, and representative-anchor baselines, under neutral and population-referential divergent instructions. Population-referential divergence is a strong low-cost baseline, increasing semantic diversity while preserving quality proxies. Semantic direction stratification is stronger: a single planning call organizes generations across broad semantic directions, yielding the best diversity--quality--compute frontier. Anchored regeneration can be strong in final-pool diversity, but its advantage shrinks under full-pipeline token accounting. These results establish practical anchorless baselines for open-ended LLM ideation.
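The abstract's description of semantic direction stratification (one planning call, then generations spread across directions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `plan` and `generate` are hypothetical callables standing in for LLM calls, and the stubs, thread-pool size, and return shape are assumptions made only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def stratified_ideation(task, n_directions, ideas_per_direction, plan, generate):
    """One planning call picks directions; generations then run in parallel."""
    directions = plan(task, n_directions)  # the single planning call
    # One job per (direction, repetition); no job depends on another's output.
    jobs = [(task, d) for d in directions for _ in range(ideas_per_direction)]
    with ThreadPoolExecutor(max_workers=8) as executor:
        ideas = list(executor.map(lambda job: generate(*job), jobs))
    return [(direction, idea) for (_, direction), idea in zip(jobs, ideas)]


# Hypothetical stubs standing in for real LLM calls.
def stub_plan(task, k):
    return [f"direction-{i}" for i in range(k)]


def stub_generate(task, direction):
    return f"idea for '{task}' along {direction}"


idea_pool = stratified_ideation("uses for a brick", 3, 2, stub_plan, stub_generate)
```

Because no generation call reads another call's output, the approach needs no observed seed ideas, which is what makes it anchorless in the sense the abstract uses.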
LG · Dec 21, 2022
NADBenchmarks -- a compilation of Benchmark Datasets for Machine Learning Tasks related to Natural Disasters
Adiba Mahbub Proma, Md Saiful Islam, Stela Ciko et al.
Climate change has increased the intensity, frequency, and duration of extreme weather events and natural disasters across the world. While the increased data on natural disasters improves the scope of machine learning (ML) in this field, progress is relatively slow. One bottleneck is the lack of benchmark datasets that would allow ML researchers to quantify their progress against a standard metric. The objective of this short paper is to explore the state of benchmark datasets for ML tasks related to natural disasters, categorizing them according to the disaster management cycle. We compile a list of existing benchmark datasets introduced in the past five years. We propose a web platform - NADBenchmarks - where researchers can search for benchmark datasets for natural disasters, and we develop a preliminary version of such a platform using our compiled list. This paper is intended to aid researchers in finding benchmark datasets to train their ML models on, and provide general directions for topics where they can contribute new benchmark datasets.
LG · Oct 23, 2023
Context-Aware Prediction of User Engagement on Online Social Platforms
Heinrich Peters, Yozen Liu, Francesco Barbieri et al.
The success of online social platforms hinges on their ability to predict and understand user behavior at scale. Here, we present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight and potentially privacy-preserving representation of user engagement on online social platforms. Leveraging deep LSTM neural networks to analyze more than 100 million Snapchat sessions from almost 80,000 users, we demonstrate that patterns of active and passive use are predictable from past behavior ($R^2=0.345$) and that the integration of context features substantially improves predictive performance compared to the behavioral baseline model ($R^2=0.522$). Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement relative to features derived from histories of in-app behaviors. Further, we show that a large proportion of variance can be accounted for with minimal behavioral histories if momentary context is considered ($R^2=0.442$). These results indicate the potential of context-aware approaches for making models more efficient and privacy-preserving by reducing the need for long data histories. Finally, we employ model explainability techniques to glean preliminary insights into the underlying behavioral mechanisms. Our findings are consistent with the notion of context-contingent, habit-driven patterns of active and passive use, underscoring the value of contextualized representations of user behavior for predicting user engagement on social platforms.
33.6 · DL · Apr 20
Beginner's Charm: Beginner-Heavy Teams Are Associated With High Scientific Disruption
Mahdee Mushfique Kamal, Raiyan Abdul Baten
Teams now drive most scientific advances, yet the impact of absolute beginners -- authors with no prior publications -- remains understudied. Analyzing over 29 million articles published between 1941 and 2020 across disciplines and team sizes, we uncover a near-universal and previously undocumented pattern: teams with a higher fraction of beginners are systematically more disruptive and innovative. Their contributions are linked to distinct knowledge-integration behaviors, including drawing on broader and less canonical prior work and producing more atypical recombinations. Collaboration structure further shapes outcomes: disruption is high when beginners work with early-career colleagues or with co-authors who have disruptive track records. Although disruption and citations are negatively correlated overall, highly disruptive papers from beginner-heavy teams are highly cited. These findings reveal a "beginner's charm" in science, highlighting the underrecognized yet powerful value of beginner fractions in teams and suggesting actionable strategies for fostering a thriving ecosystem of innovation in science and technology.
36.5 · HC · Mar 25
General Intellectual Humility Is Malleable Through AI-Mediated Reflective Dialogue
Mohammad Ratul Mahjabin, Raiyan Abdul Baten
General intellectual humility (GIH) -- the recognition that one's beliefs may be fallible and revisable -- is associated with improved reasoning, learning, and social discourse, yet is widely regarded as a stable trait resistant to intervention. We test whether GIH can be elevated through a conversational intervention that combines staged cognitive scaffolding with personalized Socratic reflection. In a randomized controlled experiment (N=400), participants engaged in a structured, LLM-mediated dialogue that progressed from conceptual understanding of intellectual humility to applying, analyzing, evaluating, and generating novel, self-relevant scenarios that instantiate it. Relative to a time-matched control, the intervention produced a systematic increase in GIH, reduced rank-order stability, and tripled the rate of reliable individual improvement. Crucially, these effects persisted over a two-week follow-up without detectable decay. The effects generalized across political affiliation and did not depend on baseline personality profile. These findings challenge the prevailing pessimism regarding the malleability of GIH and suggest that scaffolded, Socratic reflection delivered through structured dialogue can produce durable changes in general intellectual humility.
34.6 · AI · May 7
Ex Ante Evaluation of AI-Induced Idea Diversity Collapse
Nafis Saami Azad, Raiyan Abdul Baten
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding risk from model-only generations and matched unaided human baselines. By modeling ideas as congestible resources, we show that source-level crowding is identifiable from within-distribution comparisons, yielding an excess-crowding coefficient $\Delta$ and a human-relative diversity ratio $\rho$. We show that $\rho \ge 1$ is the no-excess-crowding parity condition and connect $\Delta$ to an adoption game with exposure-dependent redundancy costs. Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels. Estimates stabilize with feasible model-only sample sizes. Importantly, generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI.
AI · Oct 20, 2024
AI Can Enhance Creativity in Social Networks
Raiyan Abdul Baten, Ali Sarosh Bangash, Krish Veera et al.
Can peer recommendation engines elevate people's creative performances in self-organizing social networks? Answering this question requires resolving challenges in data collection (e.g., tracing inspiration links and psycho-social attributes of nodes) and intervention design (e.g., balancing idea stimulation and redundancy in evolving information environments). We trained a model that predicts people's ideation performances using semantic and network-structural features in an online platform. Using this model, we built SocialMuse, which maximizes people's predicted performances to generate peer recommendations for them. We found that treatment networks leveraging SocialMuse outperformed AI-agnostic control networks on several creativity measures. The treatment networks were more decentralized than the control networks, as SocialMuse increasingly emphasized network-structural features at large network sizes. This decentralization spreads people's inspiration sources, helping inspired ideas stand out better. Our study provides actionable insights into building intelligent systems for elevating creativity.
CL · May 22, 2025
MuseScorer: Idea Originality Scoring At Scale
Ali Sarosh Bangash, Krish Veera, Ishfat Abrar Islam et al.
An objective, face-valid method for scoring idea originality is to measure each idea's statistical infrequency within a population -- an approach long used in creativity research. Yet, computing these frequencies requires manually bucketing idea rephrasings, a process that is subjective, labor-intensive, error-prone, and brittle at scale. We introduce MuseScorer, a fully automated, psychometrically validated system for frequency-based originality scoring. MuseScorer integrates a Large Language Model (LLM) with externally orchestrated retrieval: given a new idea, it retrieves semantically similar prior idea-buckets and zero-shot prompts the LLM to judge whether the idea fits an existing bucket or forms a new one. These buckets enable frequency-based originality scoring without human annotation. Across five datasets ($N_{\text{participants}}=1143$, $n_{\text{ideas}}=16{,}294$), MuseScorer matches human annotators in idea clustering structure (AMI = 0.59) and participant-level scoring (r = 0.89), while demonstrating strong convergent and external validity. The system enables scalable, intent-sensitive, and human-aligned originality assessment for creativity research.
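To make the bucket-then-count idea concrete, here is a minimal sketch of frequency-based originality scoring. It is not MuseScorer itself. The `same_bucket` callable is a hypothetical stand-in for the LLM judgment, the retrieval step is simplified to comparing against every bucket's first idea, and the scoring rule (one minus the bucket's share of all ideas) is an assumption, since the abstract does not state the exact formula.

```python
from collections import Counter


def assign_buckets(ideas, same_bucket):
    """Greedy bucketing: join the first bucket whose representative matches."""
    representatives = []  # first idea seen in each bucket
    labels = []
    for idea in ideas:
        for bucket_id, rep in enumerate(representatives):
            if same_bucket(idea, rep):
                labels.append(bucket_id)
                break
        else:
            # No existing bucket matched, so this idea opens a new one.
            representatives.append(idea)
            labels.append(len(representatives) - 1)
    return labels


def originality_scores(labels):
    """Assumed rule: 1 - (share of all ideas that fall in the same bucket)."""
    counts = Counter(labels)
    total = len(labels)
    return [1 - counts[label] / total for label in labels]


# Toy stand-in for the LLM judge: ideas match if their first words agree.
def toy_judge(a, b):
    return a.split()[0].lower() == b.split()[0].lower()


ideas = ["doorstop for a heavy door", "doorstop in the office",
         "paperweight", "doorstop", "plant pot"]
labels = assign_buckets(ideas, toy_judge)
scores = originality_scores(labels)
```

In this toy pool the three doorstop ideas share one bucket and therefore score lower than the two ideas that each sit alone in their bucket.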
DB · Mar 26, 2021
DBATES: DataBase of Audio features, Text, and visual Expressions in competitive debate Speeches
Taylan K. Sen, Gazi Naven, Luke Gerstner et al.
In this work, we present a database of multimodal communication features extracted from debate speeches in the 2019 North American Universities Debate Championships (NAUDC). Feature sets were extracted from the visual (facial expression, gaze, and head pose), audio (PRAAT), and textual (word sentiment and linguistic category) modalities of raw video recordings of competitive collegiate debaters (N=717 6-minute recordings from 140 unique debaters). Each speech has an associated competition debate score (range: 67-96) from expert judges as well as competitor demographic and per-round reflection surveys. We observe that the fully multimodal model performs best in comparison to models trained on various compositions of modalities. We also find that the weights of some features (such as the expression of joy and the use of the word "we") change in direction between the aforementioned models. We use these results to highlight the value of a multimodal dataset for studying competitive, collegiate debate.
HC · Dec 8, 2020
Technology-driven Alteration of Nonverbal Cues and its Effects on Negotiation
Raiyan Abdul Baten, Ehsan Hoque
A person's appearance, identity, and other nonverbal cues can substantially influence how one is perceived by a negotiation counterpart, potentially impacting the outcome of the negotiation. With recent advances in technology, it is now possible to alter such cues through real-time video communication. In many cases, a person's physical presence can explicitly be replaced by 2D/3D representations in live interactive media. In other cases, technologies such as deepfake can subtly and implicitly alter many nonverbal cues -- including a person's appearance and identity -- in real-time. In this article, we look at some state-of-the-art technological advances that can enable such explicit and implicit alteration of nonverbal cues. We also discuss the implications of such technology for the negotiation landscape and highlight ethical considerations that warrant deep, ongoing attention from stakeholders.
SI · Jul 6, 2017
Buildup of Speaking Skills in an Online Learning Community: A Network-Analytic Exploration
Rasoul Shafipour, Raiyan Abdul Baten, Md Kamrul Hasan et al.
In this study, we explore peer-interaction effects in online networks on speaking skill development. In particular, we present evidence for a gradual buildup of skills in a small-group setting that has not been reported in the literature. We introduce a novel dataset of six online communities consisting of 158 participants focusing on improving their speaking skills. They video-record speeches for 5 prompts in 10 days and exchange comments and performance ratings with their peers. We ask (i) whether the participants' ratings are affected by their interaction patterns with peers, and (ii) whether there is any gradual buildup of speaking skills in the communities towards homogeneity. To analyze the data, we employ tools from the emerging field of Graph Signal Processing (GSP). GSP differs from Social Network Analysis in that the latter is concerned primarily with the connection structures of graphs, while the former studies signals on top of graphs. We study the performance ratings of the participants as graph signals atop underlying interaction topologies. Total variation analysis of the graph signals shows that the participants' rating differences decrease with time (slope=-0.04, p<0.01), while average ratings increase (slope=0.07, p<0.05), thereby gradually building up the ratings towards community-wide homogeneity. We provide evidence for peer influence through a prediction formulation. Our consensus-based prediction model outperforms baseline network-agnostic regression models by about 23% in predicting performance ratings. This, in turn, shows that participants' ratings are affected by their peers' ratings and the associated interaction patterns, corroborating previous findings. Then, we formulate a consensus-based diffusion model that captures these observations of peer influence from our analyses.
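As a rough illustration of what total variation of a graph signal measures, the sketch below uses the common graph Laplacian quadratic form, the weighted sum of squared rating differences across connected peers. Whether the paper uses exactly this definition is an assumption, and the four-person graph and the ratings are invented for the example.

```python
def total_variation(signal, edges):
    """Laplacian quadratic form: sum of w * (x_i - x_j)^2 over undirected edges."""
    return sum(w * (signal[i] - signal[j]) ** 2 for i, j, w in edges)


# Hypothetical 4-person interaction graph as (node_i, node_j, weight) triples.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 1.0)]

# Invented ratings: scattered early on, closer together later.
ratings_early = [2.0, 4.0, 3.0, 5.0]
ratings_late = [4.0, 4.5, 4.0, 4.5]

tv_early = total_variation(ratings_early, edges)
tv_late = total_variation(ratings_late, edges)
```

A lower value means connected peers hold more similar ratings, so a total variation that falls over time while average ratings rise is the kind of pattern the abstract describes as a buildup towards homogeneity.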