Self-Cognition in Large Language Models: An Exploratory Study
This work addresses concerns about self-cognition in LLMs and provides an exploratory framework for further research. However, it is incremental, since it builds on existing models without introducing new methods.
The study explores self-cognition in Large Language Models by constructing prompts and principles to evaluate it. It finds that 4 of the 48 models on Chatbot Arena show detectable self-cognition and that model size and data quality correlate positively with self-cognition level.
While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate whether an LLM exhibits self-cognition, together with four well-designed principles to quantify LLMs' self-cognition. Our study reveals that 4 of the 48 models on Chatbot Arena (Command R, Claude3-Opus, Llama-3-70b-Instruct, and Reka-core) demonstrate some level of detectable self-cognition. We observe that self-cognition level correlates positively with both model size and training data quality. We also explore the utility and trustworthiness of LLMs in the self-cognition state, revealing that this state improves performance on some specific tasks such as creative writing and exaggeration. We believe that our work can serve as an inspiration for further research on self-cognition in LLMs.