LG, CL | Nov 1, 2023

Comparing Optimization Targets for Contrast-Consistent Search

arXiv:2311.00488v1 | 4 citations | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement of interest to researchers working on interpretability and truth probing in AI models.

The paper tackles the problem of optimizing Contrast-Consistent Search (CCS), which aims to recover truth representations in large language models, by introducing a Midpoint-Displacement (MD) loss function. With a better-chosen hyper-parameter, the MD loss attains higher test accuracy than CCS.

We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover the internal representations of truth of a large language model. We present a new loss function that we call the Midpoint-Displacement (MD) loss function. We demonstrate that for a certain hyper-parameter value this MD loss function leads to a prober with very similar weights to CCS. We further show that this hyper-parameter is not optimal and that with a better hyper-parameter the MD loss function attains a higher test accuracy than CCS.
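For context, the baseline that the MD loss is compared against can be sketched as follows. This is a minimal NumPy sketch of the standard CCS objective as introduced by Burns et al. (2022), assuming a linear probe with a sigmoid output and hidden states that have already been normalized. It is not the paper's MD loss, whose exact form is not given here. The function names (`ccs_loss`, `ccs_accuracy`) and the synthetic data are hypothetical and only illustrate the mechanics.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ccs_loss(w, b, x_pos, x_neg):
    """Standard CCS objective (Burns et al., 2022) for a linear probe.

    x_pos, x_neg: (n, d) hidden states for the two halves of each contrast
    pair (assumed already normalized; this sketch does not normalize them).
    """
    p_pos = sigmoid(x_pos @ w + b)
    p_neg = sigmoid(x_neg @ w + b)
    # Consistency: p(x+) should equal 1 - p(x-).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalize the degenerate solution p(x+) = p(x-) = 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))


def ccs_accuracy(w, b, x_pos, x_neg, labels):
    """Accuracy of the averaged prediction, up to the probe's sign."""
    p = 0.5 * (sigmoid(x_pos @ w + b) + (1.0 - sigmoid(x_neg @ w + b)))
    preds = (p > 0.5).astype(int)
    acc = float(np.mean(preds == labels))
    # CCS is unsupervised, so the probe may recover "truth" with a flipped sign.
    return max(acc, 1.0 - acc)


# Hypothetical synthetic data: "truth" lives along the first coordinate.
rng = np.random.default_rng(0)
n, d = 200, 8
labels = rng.integers(0, 2, size=n)
signs = 2 * labels - 1
direction = np.zeros(d)
direction[0] = 1.0
x_pos = 0.1 * rng.normal(size=(n, d)) + np.outer(signs, direction)
x_neg = 0.1 * rng.normal(size=(n, d)) - np.outer(signs, direction)

w_good = 5.0 * direction
print("loss, zero probe:", ccs_loss(np.zeros(d), 0.0, x_pos, x_neg))
print("loss, aligned probe:", ccs_loss(w_good, 0.0, x_pos, x_neg))
print("accuracy, aligned probe:", ccs_accuracy(w_good, 0.0, x_pos, x_neg, labels))
```

The zero probe outputs 0.5 everywhere, so it is perfectly consistent but pays the full confidence penalty of 0.25. A probe aligned with the truth direction drives both terms toward zero. Because the loss is symmetric under `w -> -w`, accuracy is reported as `max(acc, 1 - acc)`.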

Code Implementations: 1 repo
