Doppelgänger's Watch: A Split Objective Approach to Large Language Models
This work addresses the challenge of separating supervision from helpfulness in LLMs, but the contribution is incremental: it builds on existing architectures and has no proven impact.
The paper tackles the problem of generation supervision in large language models by proposing a bicameral architecture in which a Doppelgänger module supervises token generation and predicts supervision scores. No experimental results or concrete numbers are provided; they are deferred to a future publication.
In this paper, we investigate the problem of "generation supervision" in large language models and present a novel bicameral architecture that separates supervision signals from the model's core capability, helpfulness. Doppelgänger, a new module that runs in parallel with the underlying language model, supervises the generation of each token and learns to concurrently predict the supervision score(s) of the sequence up to and including that token. In this work, we present the theoretical findings and leave the report on experimental results to a forthcoming publication.
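As a rough illustration of the data flow described above, and not the architecture from the paper, the following toy sketch runs two branches side by side: one produces next-token logits, the other emits one supervision score per prefix. Everything in it is an assumption made for illustration: the class name `ToyBicameral`, the tiny recurrent update, the randomly initialized (untrained) weights, the single sigmoid score in (0, 1), and the choice to give the two branches fully separate parameters and state.

```python
import math
import random


def _rand_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyBicameral:
    """Hypothetical toy model with two parallel branches.

    The 'lm' branch stands in for the underlying language model.
    The 'dg' branch stands in for the Doppelganger module.
    The branches share no weights and no state (an assumption).
    """

    def __init__(self, vocab_size=8, hidden=4, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.hidden = hidden
        # Language-model branch: embedding table and output projection.
        self.lm_embed = _rand_matrix(vocab_size, hidden, rng)
        self.lm_out = _rand_matrix(hidden, vocab_size, rng)
        # Doppelganger branch: its own embedding table and a scalar score head.
        self.dg_embed = _rand_matrix(vocab_size, hidden, rng)
        self.dg_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def forward(self, tokens):
        """Return (next-token logits per step, supervision score per prefix)."""
        lm_state = [0.0] * self.hidden
        dg_state = [0.0] * self.hidden
        logits_per_step, scores_per_step = [], []
        for tok in tokens:
            # LM branch: fold the token into its state, then project to the vocabulary.
            lm_state = [math.tanh(s + e) for s, e in zip(lm_state, self.lm_embed[tok])]
            logits_per_step.append([
                sum(lm_state[i] * self.lm_out[i][v] for i in range(self.hidden))
                for v in range(self.vocab_size)
            ])
            # Doppelganger branch: fold the same token into a separate state,
            # then score the prefix up to and including this token.
            dg_state = [math.tanh(s + e) for s, e in zip(dg_state, self.dg_embed[tok])]
            z = sum(s * w for s, w in zip(dg_state, self.dg_out))
            scores_per_step.append(1.0 / (1.0 + math.exp(-z)))
        return logits_per_step, scores_per_step


model = ToyBicameral()
logits, scores = model.forward([1, 5, 2, 7])
print(len(logits), len(scores))  # one logit vector and one score per token
```

Because each score depends only on the tokens seen so far, the score at position t is unchanged by later tokens, which is the property that lets a per-token score describe "the sequence up to and including that token". How the two branches would be trained against separate objectives is not specified here and is left out of the sketch.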