CL | Oct 24, 2023

Unnatural language processing: How do language models handle machine-generated prompts?

arXiv:2310.15829v1 | 134 citations | h-index: 10
Originality: Incremental advance
AI Analysis

This research addresses a fundamental question in AI interpretability that is relevant to both researchers and practitioners. It suggests that models may not rely on linguistic circuits when processing non-natural inputs, although the work is incremental in that it builds on existing prompt optimization research.

The study investigated how language models respond to machine-generated prompts that lack natural-language structure. It found that these prompts trigger different internal processing patterns than human-written prompts, including different perplexities and activation profiles, even when the outputs are similar.

Language model prompt optimization research has shown that semantically and grammatically well-formed manually crafted prompts are routinely outperformed by automatically generated token sequences with no apparent meaning or syntactic structure, including sequences of vectors from a model's embedding space. We use machine-generated prompts to probe how models respond to input that is not composed of natural language expressions. We study the behavior of models of different sizes in multiple semantic tasks in response to both continuous and discrete machine-generated prompts, and compare it to the behavior in response to human-generated natural-language prompts. Even when producing a similar output, machine-generated and human prompts trigger different response patterns through the network processing pathways, including different perplexities, different attention and output entropy distributions, and different unit activation profiles. We provide preliminary insight into the nature of the units activated by different prompt types, suggesting that only natural language prompts recruit a genuinely linguistic circuit.
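
As a rough illustration of two of the quantities the abstract compares across prompt types, the sketch below computes sequence perplexity and the entropy of an output distribution. It is not the authors' code: the function names and toy inputs are hypothetical, and in practice the per-token log-probabilities and logits would come from the model under study.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def perplexity(token_log_probs):
    """exp of the negative mean log-probability the model assigns to each token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical toy values, not results from the paper.
prompt_log_probs = [math.log(0.25)] * 4   # every prompt token gets p = 0.25
next_token_logits = [0.0, 0.0, 0.0, 0.0]  # flat scores over a 4-word vocabulary

print(perplexity(prompt_log_probs))
print(entropy(softmax(next_token_logits)))
```

Under this definition, a prompt whose tokens the model finds unlikely has a higher perplexity, and a flatter next-token distribution has a higher output entropy.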

