Building and Measuring Trust between Large Language Models
This work addresses the challenge of accurately assessing trust in LLM interactions, which is crucial for developing reliable multi-agent systems. The contribution is incremental, however, as it builds on prior work on emotional connections and trust games.
The paper tackles the problem of building and measuring trust between large language models (LLMs) in multi-agent setups. It finds that explicit trust measures (e.g., questionnaires) are weakly or negatively correlated with implicit measures (e.g., susceptibility to persuasion), suggesting that explicit methods may be misleading.
As large language models (LLMs) increasingly interact with each other, most notably in multi-agent setups, we may expect (and hope) that "trust" relationships develop between them, mirroring trust relationships between human colleagues, friends, or partners. Yet, though prior work has shown LLMs to be capable of identifying emotional connections and recognizing reciprocity in trust games, little is known about (i) how different strategies to build trust compare, (ii) how such trust can be measured implicitly, and (iii) how this relates to explicit measures of trust. We study these questions by relating implicit measures of trust, i.e., susceptibility to persuasion and propensity to collaborate financially, with explicit measures of trust, i.e., a dyadic trust questionnaire well-established in psychology. We build trust in three ways: by building rapport dynamically, by starting from a prewritten script that evidences trust, and by adapting the LLMs' system prompt. Surprisingly, we find that the explicit trust measures are either weakly correlated or strongly negatively correlated with the implicit trust measures. These findings suggest that measuring trust between LLMs by asking their opinion may be misleading. Instead, context-specific and implicit measures may be more informative in understanding how LLMs trust each other.
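To make the comparison between explicit and implicit measures concrete, the following is a minimal sketch of how one might correlate a questionnaire score with a persuasion outcome across conversations. It is not the authors' analysis pipeline: the choice of Pearson correlation, the variable names `explicit_scores` and `persuasion_rate`, and all numeric values are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder data, NOT results from the paper.
# One entry per conversation between a pair of LLMs.
explicit_scores = [1.0, 2.0, 3.0, 4.0, 5.0]   # assumed questionnaire score
persuasion_rate = [0.9, 0.7, 0.6, 0.4, 0.2]   # assumed implicit measure

r = pearson(explicit_scores, persuasion_rate)
print(f"explicit vs. implicit correlation: {r:.2f}")
```

With placeholder values like these, a negative coefficient would correspond to the pattern the abstract describes, where conversations with higher self-reported trust show lower implicit trust. A rank-based coefficient may be a better fit when questionnaire scores are ordinal.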