Mitigating Bias in Locally Constrained Decoding via Tractable Proposals
For practitioners who need LLM outputs to conform to constraints such as a JSON schema, this work provides a more efficient and less biased sampling method.
This work addresses the bias of locally constrained decoding (LCD) for LLMs by introducing globally constrained decoding (GCD) and probabilistic GCD (P-GCD) proposals for sequential Monte Carlo (SMC) sampling. Experiments show that (P-)GCD converges to the target distribution faster than LCD proposals and with significantly fewer particles.
Generations from large language models often fail to conform to desired constraints such as a JSON schema. Existing locally constrained decoding (LCD) approaches enforce constraints by myopically masking out invalid next tokens, which biases sampling and degrades performance. Recent work uses sequential Monte Carlo (SMC) methods to mitigate this bias, but designing effective proposal distributions and potential functions remains a key challenge. In this work, we propose a generic approach to constructing proposals and potentials for SMC sampling from $p_{\mathrm{lm}}(\cdot \mid \mathrm{constraint})$. First, we show that constraints specified as finite automata can be tensorized for efficient execution on GPUs, which we use to construct globally constrained decoding (GCD) proposals. In addition, leveraging the fact that tensorized finite automata share the same circuit structure as hidden Markov models, we circuit-multiply the two to obtain probabilistic GCD (P-GCD) proposals, which encode both logical and probabilistic information about the target distribution. We evaluate (P-)GCD on function calling, keyword-based generation, and SQL generation. Experiments show that, under the same SMC sampling setup, (P-)GCD converges to the target distribution faster than LCD proposals and with significantly fewer particles.
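To make the bias of local masking concrete, the following is a minimal sketch on a toy problem; it is not the paper's implementation. The two-token vocabulary, the toy language model `p_next`, the two-state automaton `DELTA` (accepting sequences that end in "b"), and all function names are assumptions introduced for illustration. The sketch compares the exact target $p_{\mathrm{lm}}(\cdot \mid \mathrm{constraint})$ with the distribution induced by masking and renormalizing at each step, then shows that the standard importance weight $p_{\mathrm{lm}}(x)/q(x)$, the kind of correction SMC applies, recovers the target.

```python
from itertools import product

VOCAB = ["a", "b"]
LENGTH = 2  # all sequences have exactly two tokens in this toy setup

def p_next(prefix):
    """Hypothetical toy LM: next-token probabilities given the prefix."""
    if prefix.endswith("a"):
        return {"a": 0.9, "b": 0.1}
    return {"a": 0.5, "b": 0.5}

# Toy constraint as a DFA: state 1 is reached iff the last token was "b".
DELTA = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
ACCEPT = {1}

def accepts(seq):
    state = 0
    for tok in seq:
        state = DELTA[(state, tok)]
    return state in ACCEPT

def can_finish(state, steps_left):
    """True if some continuation of exactly steps_left tokens is accepted."""
    if steps_left == 0:
        return state in ACCEPT
    return any(can_finish(DELTA[(state, t)], steps_left - 1) for t in VOCAB)

def p_lm(seq):
    """Unconstrained LM probability of the full sequence."""
    prob = 1.0
    for i, tok in enumerate(seq):
        prob *= p_next(seq[:i])[tok]
    return prob

def q_lcd(seq):
    """Probability of seq under local masking with per-step renormalization."""
    prob, state = 1.0, 0
    for i, tok in enumerate(seq):
        probs = p_next(seq[:i])
        left = LENGTH - i - 1
        # Keep only tokens from which an accepted completion still exists.
        allowed = {t: probs[t] for t in VOCAB
                   if can_finish(DELTA[(state, t)], left)}
        if tok not in allowed:
            return 0.0
        prob *= allowed[tok] / sum(allowed.values())
        state = DELTA[(state, tok)]
    return prob

seqs = ["".join(s) for s in product(VOCAB, repeat=LENGTH)]
valid = [s for s in seqs if accepts(s)]

# Exact target: LM probability restricted to valid sequences, renormalized.
Z = sum(p_lm(s) for s in valid)
target = {s: p_lm(s) / Z for s in valid}

# Distribution actually sampled by local masking.
lcd = {s: q_lcd(s) for s in valid}

# Importance weights p_lm(x) / q(x) and the reweighted distribution.
weights = {s: p_lm(s) / lcd[s] for s in valid}
total = sum(lcd[s] * weights[s] for s in valid)
corrected = {s: lcd[s] * weights[s] / total for s in valid}

for s in valid:
    print(s, round(target[s], 4), round(lcd[s], 4), round(corrected[s], 4))
```

In this toy case the target assigns probability 1/6 to "ab" and 5/6 to "bb", while local masking samples each with probability 1/2: the mask cannot see that starting with "a" makes the required final "b" unlikely under the LM. Reweighting fixes the distribution, but a proposal far from the target yields uneven weights, which is why a better-informed proposal would be expected to need fewer particles.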