When Determinants Are Not Enough: Private Rare Switching
For researchers in private linear bandits and RL, this note provides a fix for a known bottleneck in rare switching under differential privacy.
The paper addresses the failure of determinant-based rare switching in private linear bandits due to noise breaking monotonicity, and introduces a generalized Rayleigh quotient-based rule that restores logarithmic policy updates and confidence-width comparison up to a constant factor.
In this note, I would like to share a small research moment in which Codex helped me find the right way to adapt rare switching to the private setting. The standard determinant-based update rule in linear bandits and RL works beautifully because the design matrix grows monotonically. But once Gaussian noise is added for privacy, this monotonicity can fail, and the usual analysis no longer goes through. The key reason is that determinant growth controls volume, while the regret analysis needs control of the worst direction. To address this, Codex came up with a different rare-switching rule based on the generalized Rayleigh quotient, which restores logarithmic policy updates and the desired confidence-width comparison up to a constant factor. I present my manually cleaned-up version of the proof here, along with some personal reflections on this example.
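To make the volume-versus-worst-direction point concrete, here is a minimal NumPy sketch. It is an illustration under my own assumptions, not necessarily the exact rule analyzed in this note: the function names, the threshold `C = 2`, and the toy matrices are all hypothetical. It compares a determinant test with a test on the largest generalized Rayleigh quotient, max over x of (x^T A x)/(x^T B x), on a pair of matrices where noise has shrunk one direction while another has grown.

```python
import numpy as np


def det_rule(V_now, V_last, C=2.0):
    """Determinant test: switch when the volume has grown by a factor C."""
    _, logdet_now = np.linalg.slogdet(V_now)
    _, logdet_last = np.linalg.slogdet(V_last)
    return bool(logdet_now - logdet_last > np.log(C))


def max_rayleigh_quotient(A, B):
    """Largest value of (x^T A x) / (x^T B x) over x != 0.

    Assumes A is symmetric and B is symmetric positive definite.
    With B = L L^T, this is the top eigenvalue of L^{-1} A L^{-T}.
    """
    L = np.linalg.cholesky(B)
    Y = np.linalg.solve(L, A)        # L^{-1} A
    M = np.linalg.solve(L, Y.T)      # L^{-1} A L^{-T}
    M = (M + M.T) / 2.0              # remove round-off asymmetry
    return float(np.linalg.eigvalsh(M)[-1])


def rayleigh_rule(V_now, V_last, C=2.0):
    """Hypothetical worst-direction test: switch when some direction grew by C."""
    return max_rayleigh_quotient(V_now, V_last) > C


# Toy (hypothetical) matrices: one direction grew 4x, while noise shrank the other.
V_last = np.diag([1.0, 1.0])
V_now = np.diag([4.0, 0.3])

det_fires = det_rule(V_now, V_last)            # volume ratio is 4 * 0.3 = 1.2
rayleigh_fires = rayleigh_rule(V_now, V_last)  # worst-direction ratio is 4
print(det_fires, rayleigh_fires)
```

In this toy case the determinant ratio is only 1.2, so the determinant test stays silent even though one direction has grown by a factor of 4. Without monotonicity, a small change in volume no longer bounds the change along every direction, which is exactly the control the confidence-width comparison needs.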