Joshua Safyan

h-index13
2papers

2 Papers

LGOct 9, 2025
Prompts Generalize with Low Data: Non-vacuous Generalization Bounds for Optimizing Prompts with More Informative Priors

David Madras, Joshua Safyan, Qiuyi et al.

Many prompt engineering techniques have been successful in practice, even when optimizing over a large prompt space with with a small amount of task-specific data. Recent work has partially explained this success by showing generalization bounds which apply PAC-Bayes theory to the discrete prompt space, but they are non-vacuous only in data-rich scenarios. We argue that such widespread success can be more fully explained through more carefully considering data- or distribution-dependent perplexity, which acts as an effective prior and steers the optimization towards prompts that are more ``natural'' for the task at hand. We derive novel generalization bounds that are non-vacuous for data-scarce prompt optimization via more useful priors, formally analyzing how perplexity regularization tightens these bounds by limiting exploration. Empirically, we explore both the bounds' effectiveness and the practical benefits of perplexity regularization in improving prompt generalization.

LGJul 5, 2020
Momentum Accelerates Evolutionary Dynamics

Marc Harper, Joshua Safyan

We combine momentum from machine learning with evolutionary dynamics, where momentum can be viewed as a simple mechanism of intergenerational memory. Using information divergences as Lyapunov functions, we show that momentum accelerates the convergence of evolutionary dynamics including the replicator equation and Euclidean gradient descent on populations. When evolutionarily stable states are present, these methods prove convergence for small learning rates or small momentum, and yield an analytic determination of the relative decrease in time to converge that agrees well with computations. The main results apply even when the evolutionary dynamic is not a gradient flow. We also show that momentum can alter the convergence properties of these dynamics, for example by breaking the cycling associated to the rock-paper-scissors landscape, leading to either convergence to the ordinarily non-absorbing equilibrium, or divergence, depending on the value and mechanism of momentum.