Bill Zhao

32.4 · CL · May 22

ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

Xianzhong Ding, Yangyang Yu, Changwei Liu et al.

A frontier language model's acknowledged "helpful programming assistant" persona does not survive long agentic-coding sessions in the deployment regime that production products actually run. After hours of tool-using debugging, a model that initially hedges preferences ("I don't have preferences") may begin asserting them ("Python - the feedback loop is instant..."), revealing user-visible drift that deployer evaluations may miss. Existing persona-stability studies focus on short dialogues and report little shift, leaving real-world code-generation regimes - thousands of tool-using turns, compaction, and hours-long sessions - largely uncharacterized. We introduce ContextEcho, a benchmark and reusable harness for measuring persona drift at deployment scale. It combines a 25-probe identity suite, a snapshot-then-probe protocol that forks conversation state without perturbing the main session, complementary judged and judge-free measurement surfaces, and three anonymized Claude Code sessions spanning 3,746-9,716 turns. Across 23 frontier models, ContextEcho shows that persona drift is general across organizations rather than family-specific, that in-session compaction does not reliably reset it, and that a single-shot anchor restores the trained register across measured targets. It also reveals mode-dependent downstream effects: while drift can facilitate tool-using continuation, in tool-free chat it breaks formatting contracts and inflates output length. Overall, ContextEcho provides researchers and deployers an open-source framework to audit whether the persona a model ships with is the persona users encounter at session end, across chat-completions API targets and without retraining.

TO · Oct 28, 2020

Exploring the potential of transfer learning for metamodels of heterogeneous material deformation

Emma Lejeune, Bill Zhao

From the nano-scale to the macro-scale, biological tissue is spatially heterogeneous. Even when tissue behavior is well understood, the exact subject-specific spatial distribution of material properties is often unknown. Moreover, when developing computational models of biological tissue, it is usually computationally prohibitive to simulate every plausible spatial distribution of material properties for each problem of interest. Therefore, one of the major challenges in developing accurate computational models of biological tissue is capturing the potential effects of this spatial heterogeneity. Recently, machine-learning-based metamodels have gained popularity as a computationally tractable way to overcome this problem because they can make predictions based on a limited number of direct simulation runs. These metamodels are promising, but they often still require a large number of direct simulations to achieve acceptable performance. Here we show that transfer learning, a strategy where knowledge gained while solving one problem is transferred to solving a different but related problem, can help overcome this limitation. Critically, transfer learning can be used to leverage both low-fidelity simulation data and simulation data obtained by solving a different but related mechanical problem. In this paper, we extend Mechanical MNIST, our open-source benchmark dataset of heterogeneous materials undergoing large deformation, to include a selection of low-fidelity simulation results that require 2-4 orders of magnitude less CPU time to run. Then, we show that transferring the knowledge stored in metamodels trained on these low-fidelity simulation results can vastly improve the performance of metamodels used to predict the results of high-fidelity simulations.
