Probing for Incremental Parse States in Autoregressive Language Models
This work addresses how language models process syntax, a question of interest to NLP researchers. It is an incremental contribution: it extends existing probing work to an incremental setting.
The study investigates whether autoregressive language models learn to maintain implicit incremental syntactic structures. It finds that probes can predict model preferences on ambiguous sentence prefixes and can be used to intervene causally on model representations to steer behavior, suggesting that such inferences underlie next-word predictions.
Next-word predictions from autoregressive neural language models show remarkable sensitivity to syntax. This work evaluates the extent to which this behavior arises from a learned ability to maintain implicit representations of incremental syntactic structures. We extend work in syntactic probing to the incremental setting and present several probes for extracting incomplete syntactic structure (operationalized through parse states from a stack-based parser) from autoregressive language models. We find that our probes can be used to predict model preferences on ambiguous sentence prefixes and to causally intervene on model representations to steer model behavior. This suggests that implicit incremental syntactic inferences underlie next-word predictions in autoregressive neural language models.
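To make "parse states from a stack-based parser" concrete, the sketch below records the stack contents after each word of a sentence is consumed. It assumes an arc-standard dependency parser driven by gold head annotations, which is an illustrative stand-in: the abstract does not say which stack-based parser or state encoding is used, and the function name `arc_standard_states` and its input format are hypothetical.

```python
def arc_standard_states(words, heads):
    """Return the parser stack after each word prefix (illustrative sketch).

    Assumptions (not taken from the paper): an arc-standard transition
    system, a projective gold tree, and heads[i] giving the 1-based index
    of the head of word i+1, with 0 marking the root.
    """
    n = len(words)
    # pending[t] counts the dependents token t has not yet collected.
    pending = [0] * (n + 1)
    for h in heads:
        pending[h] += 1

    stack = [0]  # index 0 is an artificial ROOT token
    states = []
    for i in range(1, n + 1):
        stack.append(i)  # SHIFT the next word onto the stack
        # Apply reductions until no arc can be built from the top two items.
        while len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if below != 0 and heads[below - 1] == top:
                # LEFT-ARC: the second item is a dependent of the top item.
                stack.pop(-2)
                pending[top] -= 1
            elif heads[top - 1] == below and pending[top] == 0:
                # RIGHT-ARC: the top item is complete; attach it to the item below.
                stack.pop()
                pending[below] -= 1
            else:
                break
        # Snapshot of the incremental parse state for the prefix words[:i].
        states.append([words[j - 1] if j else "ROOT" for j in stack])
    return states


states = arc_standard_states(["the", "dog", "barks"], [2, 3, 0])
for prefix_len, state in enumerate(states, start=1):
    print(prefix_len, state)
```

Under these assumptions, each snapshot is the kind of target an incremental probe could be trained to recover from the model's hidden state at the corresponding token. For an ambiguous prefix, different candidate analyses would yield different stacks at the same position.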