Xin Sui

CL
h-index81
3papers
53citations
Novelty45%
AI Score42

3 Papers

26.2CLMay 8
Post-training makes large language models less human-like

Marcel Binz, Elif Akata, Abdullah Almaatouq et al.

Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction -- a popular technique for eliciting human-like behavior by conditioning models on participant-specific information -- does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful assistants also make them less accurate models of human behavior.

LGOct 26, 2024
Centaur: a foundation model of human cognition

Marcel Binz, Elif Akata, Matthias Bethge et al. · princeton

Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. A first step in this direction is to create a model that can predict human behavior in a wide range of settings. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model's internal representations become more aligned with human neural activity after finetuning. Taken together, our results demonstrate that it is possible to discover computational models that capture human behavior across a wide range of domains. We believe that such models provide tremendous potential for guiding the development of cognitive theories and present a case study to demonstrate this.

SPNov 2, 2024
Longitudinal Wrist PPG Analysis for Reliable Hypertension Risk Screening Using Deep Learning

Hui Lin, Jiyang Li, Ramy Hussein et al.

Hypertension is a leading risk factor for cardiovascular diseases. Traditional blood pressure monitoring methods are cumbersome and inadequate for continuous tracking, prompting the development of PPG-based cuffless blood pressure monitoring wearables. This study leverages deep learning models, including ResNet and Transformer, to analyze wrist PPG data collected with a smartwatch for efficient hypertension risk screening, eliminating the need for handcrafted PPG features. Using the Home Blood Pressure Monitoring (HBPM) longitudinal dataset of 448 subjects and five-fold cross-validation, our model was trained on over 68k spot-check instances from 358 subjects and tested on real-world continuous recordings of 90 subjects. The compact ResNet model with 0.124M parameters performed significantly better than traditional machine learning methods, demonstrating its effectiveness in distinguishing between healthy and abnormal cases in real-world scenarios.