Anna I. Thoma

CL
3 papers
3 citations
Novelty: 48%
AI Score: 43

3 Papers

36.6 HC May 27
Fostering human learning is crucial for boosting human-AI synergy

Julian Berger, Jason W. Burton, Ralph Hertwig et al.

The collaboration between humans and artificial intelligence (AI) holds the promise of achieving superior outcomes compared to either acting alone, a phenomenon called human-AI synergy. Nevertheless, our understanding of the conditions that facilitate such human-AI synergy when humans are advised by AI remains limited. A recent meta-analysis showed that, on average, human-AI combinations do not outperform the better individual agent. We argue that this pessimistic conclusion arises from insufficient attention to human learning in the experimental designs. To substantiate this claim, we re-analyzed all 74 studies included in the original meta-analysis, yielding two new findings. First, most previous research overlooked design features that foster human learning, such as providing outcome feedback to participants. Second, our re-analysis demonstrated that studies providing outcome feedback show tentatively higher synergy than those without outcome feedback. Crucially, feedback paired with AI explanations tends to yield positive synergy, while explanations without feedback were linked to negative synergy, indicating that explanations increase synergy only when humans can learn to verify the AI's reliability through feedback. We conclude that the current literature underestimates the potential of human-AI collaboration because it predominantly relies on paradigms that do not facilitate human learning, thus hindering humans from effectively adapting their collaboration strategies. We therefore advocate for a paradigm shift in human-AI interaction research that explicitly addresses human learning and thus enhances our understanding of and support for successful human-AI collaboration.

73.1 SI May 26
Mapping the gender attrition gap in academic psychology

Xinyi Zhao, Anna I. Thoma, Ralph Hertwig et al.

Women comprise the majority of students and early-career scholars in psychology, yet they are less likely to remain active in research over time. This pattern raises a central question: At what stages of academic careers do women disproportionately leave academia, and what factors drive their attrition? Using large-scale bibliometric data tracking 78,216 psychologists who began publishing between 2000 and 2014, we examine gender differences in research career attrition operationalized through publishing activity across the full trajectory from entry onward. Although women accounted for more than 60% of new entrants, they experienced higher attrition rates than men, with the gender gap peaking approximately five years after first publication. Early-career performance, particularly first-authored publications, was the strongest predictor of subsequent retention, whereas last-authored publications were most closely associated with continued activity at later career stages. Collaboration patterns and institutional context also shaped career persistence, though less strongly than publication indicators. Notably, gender differences in research attrition persisted even after accounting for these career determinants, especially during early career stages. These findings suggest that gender inequality in psychology is driven less by recruitment than by differential retention over time. Addressing early-career vulnerability may therefore be essential to achieving equitable representation in senior academic leadership within the discipline.

26.2 CL May 8
Post-training makes large language models less human-like

Marcel Binz, Elif Akata, Abdullah Almaatouq et al.

Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training, the stage that turns base models into useful assistants, consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction, a popular technique for eliciting human-like behavior by conditioning models on participant-specific information, does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful assistants also make them less accurate models of human behavior.