CL | AI | May 13, 2024

MacBehaviour: An R package for behavioural experimentation on large language models

arXiv:2405.07495v1 | 5 citations | h-index: 7 | Has Code | Behavior Research Methods
Originality: Synthesis-oriented
AI Analysis

This tool simplifies and standardizes LLM behavioral studies for researchers, but it is incremental as it builds on existing experimental frameworks.

The authors developed an R package called MacBehaviour to streamline behavioral experimentation on large language models (LLMs), enabling interaction with over 60 models and demonstrating its utility by replicating sound-gender associations in three LLMs, showing they exhibit human-like tendencies.

There has been increasing interest in investigating the behaviours of large language models (LLMs) and LLM-powered chatbots by treating an LLM as a participant in a psychological experiment. We therefore developed an R package called "MacBehaviour" that provides a single interface to more than 60 language models (e.g., OpenAI's GPT family, the Claude family, Gemini, the Llama family, and open-source models) and streamlines the process of running LLM behaviour experiments. The package offers a comprehensive set of functions designed for LLM experiments, covering experiment design, stimulus presentation, manipulation of model behaviour, and logging of responses and token probabilities. To demonstrate the utility and effectiveness of "MacBehaviour," we conducted three validation experiments on three LLMs (GPT-3.5, Llama-2 7B, and Vicuna-1.5 13B) to replicate sound-gender association in LLMs. The results consistently showed that these models exhibit human-like tendencies to infer gender from novel personal names based on their phonology, as previously demonstrated (Cai et al., 2023). In summary, "MacBehaviour" is an R package for machine behaviour studies that offers a user-friendly interface and comprehensive features to simplify and standardize the experimental process.

Code Implementations: 1 repo
