Hiro Taiyo Hamada

CL
h-index5
3papers
Novelty40%
AI Score36

3 Papers

19.6CLApr 13
Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs?

Yuto Harada, Hiro Taiyo Hamada

Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. While LLMs can exhibit behaviors consistent with these constructs, it remains unclear where and how they are represented inside the model and how they relate to behavioral outputs. To address this gap, we focus on questionnaire-operationalized Big Five concepts, analyze the formation and localization of their internal representations, and use interventions to examine how these representations relate to behavioral outputs. In our experiment, we first use probing to examine where Big Five information emerges across model depth. We then identify neurons that respond selectively to each Big Five concept and test whether enhancing or suppressing their activations can bias latent representations and label generation in intended directions. We find that Big Five information becomes rapidly decodable in early layers and remains detectable through the final layers, while concept-selective neurons are most prevalent in mid layers and exhibit limited overlap across domains. Interventions on these neurons consistently shift probe readouts toward targeted concepts, with targeted success rates exceeding 0.8 for some concepts, indicating that the model's internal separation of Big Five personality traits can be causally steered. At the label-generation level, the same interventions often bias generated label distributions in the intended directions, but the effects are weaker, more concept-dependent, and often accompanied by cross-trait spillover, indicating that comparable control over generated labels is difficult even with interventions on a large fraction of concept-selective neurons. Overall, our findings reveal a gap between representational control and behavioral control in LLMs.

LGJun 29, 2025
Measuring How LLMs Internalize Human Psychological Concepts: A preliminary analysis

Hiro Taiyo Hamada, Ippei Fujisawa, Genji Kawakita et al.

Large Language Models (LLMs) such as ChatGPT have shown remarkable abilities in producing human-like text. However, it is unclear how accurately these models internalize concepts that shape human thought and behavior. Here, we developed a quantitative framework to assess concept alignment between LLMs and human psychological dimensions using 43 standardized psychological questionnaires, selected for their established validity in measuring distinct psychological constructs. Our method evaluates how accurately language models reconstruct and classify questionnaire items through pairwise similarity analysis. We compared resulting cluster structures with the original categorical labels using hierarchical clustering. A GPT-4 model achieved superior classification accuracy (66.2\%), significantly outperforming GPT-3.5 (55.9\%) and BERT (48.1\%), all exceeding random baseline performance (31.9\%). We also demonstrated that the estimated semantic similarity from GPT-4 is associated with Pearson's correlation coefficients of human responses in multiple psychological questionnaires. This framework provides a novel approach to evaluate the alignment of the human-LLM concept and identify potential representational biases. Our findings demonstrate that modern LLMs can approximate human psychological constructs with measurable accuracy, offering insights for developing more interpretable AI systems.

CYFeb 26, 2022
AI agents for facilitating social interactions and wellbeing

Hiro Taiyo Hamada, Ryota Kanai

Wellbeing AI has been becoming a new trend in individuals' mental health, organizational health, and flourishing our societies. Various applications of wellbeing AI have been introduced to our daily lives. While social relationships within groups are a critical factor for wellbeing, the development of wellbeing AI for social interactions remains relatively scarce. In this paper, we provide an overview of the mediative role of AI-augmented agents for social interactions. First, we discuss the two-dimensional framework for classifying wellbeing AI: individual/group and analysis/intervention. Furthermore, wellbeing AI touches on intervening social relationships between human-human interactions since positive social relationships are key to human wellbeing. This intervention may raise technical and ethical challenges. We discuss opportunities and challenges of the relational approach with wellbeing AI to promote wellbeing in our societies.