Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs
This work addresses the need for better controllability in LLMs for applications that require specific textual attributes, though it is incremental, building on prior prompting and representation engineering methods.
The paper tackles fine-grained control over multiple concepts in LLM output, such as humor and persuasiveness, and finds that performance often drops in dual-concept settings, revealing a limitation of naive prompting-based control.
Large Language Models (LLMs) offer strong generative capabilities, but many applications require explicit and \textit{fine-grained} control over specific textual concepts, such as humor, persuasiveness, or formality. Prior approaches in prompting and representation engineering can provide coarse or single-attribute control, but systematic evaluation of multi-attribute settings remains limited. We introduce an evaluation framework for fine-grained controllability in both single- and dual-concept scenarios, focusing on linguistically distinct concept pairs (e.g., persuasiveness vs.~humor). Surprisingly, across multiple LLMs and generative tasks, we find that performance often drops in the dual-concept setting, even though the chosen concepts should in principle be separable. This reveals a fundamental limitation of naive prompting-based control: models struggle with compositionality even when concepts are intuitively independent. Our framework provides systematic evidence of this gap and offers a principled way to measure how well future methods achieve multi-concept control.