62.2 LG May 21 Code
Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
Jeanmely Rojas Nunez, Viraj Sawant, Nathan Allen et al.
Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy \cite{shenfeld2025rl}. We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning, and use it to compare RL and SFT on Qwen2.5-3B-Instruct adapted to scientific question answering. We find a clear mechanistic trade-off: SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting. Our code is available at https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability.
SY Mar 3, 2017
An intracardiac electrogram model to bridge virtual hearts and implantable cardiac devices
Weiwei Ai, Nitish Patel, Partha Roop et al.
Virtual heart models have been proposed to enhance the safety of implantable cardiac devices through closed-loop validation. To communicate with a virtual heart, devices have so far been driven by cardiac signals at specific sites, so only the action potentials at these sites are sensed. A real device implanted in the heart, however, senses a complex combination of near-field and far-field extracellular potential signals, and many device functions, such as blanking periods and refractory periods, are designed to handle these unexpected signals. To represent these signals, we develop an intracardiac electrogram (IEGM) model as an interface between the virtual heart and the device. The model captures not only the local excitation but also far-field signals and pacing afterpotentials. Moreover, the sensing controller can specify unipolar or bipolar electrogram (EGM) sensing configurations and introduce various oversensing and undersensing modes. The simulation results show that the model is able to reproduce clinically observed sensing problems, which significantly extends the capabilities of the virtual heart model in the context of device validation.
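The difference between the unipolar and bipolar sensing configurations mentioned above can be illustrated with a toy calculation. This is only a sketch and not the authors' IEGM model: the pulse shapes, amplitudes, and timings are invented for illustration, and the function names (`gaussian_pulse`, `electrode_potentials`, `unipolar`, `bipolar`) are hypothetical. It assumes that a far-field signal reaches both electrodes with nearly equal amplitude, while local excitation is much stronger at the tip than at the ring.

```python
import math


def gaussian_pulse(t, center, width, amplitude):
    """Smooth pulse standing in for a deflection in the electrogram."""
    return amplitude * math.exp(-((t - center) / width) ** 2)


def electrode_potentials(t):
    """Return (tip, ring) potentials at time t (seconds).

    All numbers are illustrative assumptions, not measured values.
    """
    # Local (near-field) excitation: strong at the tip, weak at the ring.
    near_tip = gaussian_pulse(t, center=0.10, width=0.005, amplitude=5.0)
    near_ring = gaussian_pulse(t, center=0.10, width=0.005, amplitude=1.0)
    # Far-field activity: assumed identical on both electrodes.
    far = gaussian_pulse(t, center=0.30, width=0.02, amplitude=2.0)
    return near_tip + far, near_ring + far


def unipolar(t):
    """Tip potential against a distant reference, taken here as 0."""
    tip, _ = electrode_potentials(t)
    return tip


def bipolar(t):
    """Tip potential minus ring potential."""
    tip, ring = electrode_potentials(t)
    return tip - ring


if __name__ == "__main__":
    for label, t in (("local excitation", 0.10), ("far-field event", 0.30)):
        print(f"{label}: unipolar={unipolar(t):.3f}, bipolar={bipolar(t):.3f}")
```

Under these assumptions the far-field event is visible in the unipolar signal but cancels in the bipolar one, which is one reason far-field oversensing depends on the sensing configuration.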
14.9 FL Mar 26
Synchronous Signal Temporal Logic for Decidable Verification of Cyber-Physical Systems
Partha Roop, Sobhan Chatterjee, Avinash Malik et al.
Many cyber-physical systems (CPS) operate in safety-critical environments, where correct execution, reliability and trustworthiness are essential. Signal Temporal Logic (STL) provides a formal framework for checking safety-critical CPS. However, static verification of STL is undecidable in general; the exception is run-time-based verification, which has its own limitations. We propose Synchronous Signal Temporal Logic (SSTL), a decidable fragment of STL that admits static verification of safety and liveness properties. In SSTL, we assume that a signal is sampled at fixed discrete steps, called ticks, and then propose a hypothesis, called the Signal Invariance Hypothesis (SIH), which is inspired by a similar hypothesis for synchronous programs. We define the syntax and semantics of SSTL and show that SIH is a necessary and sufficient condition for equivalence between an STL formula and its SSTL counterpart. By translating SSTL to LTL_P (LTL defined over predicates), we enable decidable model checking using the SPIN model checker. We demonstrate the approach on a 33-node human heart model and other case studies.
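To make the idea of sampling a signal at ticks concrete, the following sketch evaluates bounded "always" and "eventually" operators over a tick-sampled trace. It is not the SSTL semantics defined in the paper, and it checks a single finite trace rather than performing static model checking. The function names (`sample`, `always`, `eventually`) and the choice to measure time bounds in ticks are assumptions made for illustration.

```python
from typing import Callable, List


def sample(signal: Callable[[float], float], tick: float, n_ticks: int) -> List[float]:
    """Sample a continuous-time signal at times 0, tick, 2*tick, ..."""
    return [signal(k * tick) for k in range(n_ticks)]


def always(trace: List[bool], start: int, lo: int, hi: int) -> bool:
    """Bounded 'always': the predicate holds at every tick in [start+lo, start+hi].

    A window that runs past the end of the trace is treated as a failure
    (a conservative choice made for this sketch).
    """
    window = trace[start + lo : start + hi + 1]
    return len(window) == hi - lo + 1 and all(window)


def eventually(trace: List[bool], start: int, lo: int, hi: int) -> bool:
    """Bounded 'eventually': the predicate holds at some tick in [start+lo, start+hi]."""
    return any(trace[start + lo : start + hi + 1])


# Example: a ramp signal x(t) = t, sampled every 0.5 time units for 10 ticks.
values = sample(lambda t: t, tick=0.5, n_ticks=10)
above_one = [x >= 1.0 for x in values]  # predicate evaluated at each tick

if __name__ == "__main__":
    print(always(above_one, start=2, lo=0, hi=3))      # x >= 1 throughout ticks 2..5
    print(always(above_one, start=0, lo=0, hi=3))      # fails: x < 1 at ticks 0 and 1
    print(eventually(above_one, start=0, lo=0, hi=3))  # x >= 1 is reached by tick 2
```

The sketch only looks at the signal at tick boundaries, so it can miss behaviour between ticks; an assumption such as the Signal Invariance Hypothesis described above is what would justify treating the sampled view as equivalent to the continuous one.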