LG, AI | Oct 12, 2022

Predictive Querying for Autoregressive Neural Sequence Models

Alex Boyd, Sam Showalter, Stephan Mandt, Padhraic Smyth

arXiv:2210.06464v3 | 5.85 citations | h-index: 77 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in applying neural sequence models to domains such as user modeling and medicine, where complex probabilistic queries are needed. Its contribution is an incremental improvement on existing methods.

The paper tackles the problem of efficiently answering complex probabilistic queries in neural autoregressive sequence models such as RNNs and transformers. It introduces a typology of queries and new estimation methods, and demonstrates tractable query answering across diverse datasets and models, with clear cost-accuracy tradeoffs between methods.

In reasoning about sequential events it is natural to pose probabilistic queries such as "when will event A occur next" or "what is the probability of A occurring before B", with applications in areas such as user modeling, medicine, and finance. However, with machine learning shifting towards neural autoregressive models such as RNNs and transformers, probabilistic querying has been largely restricted to simple cases such as next-event prediction. This is in part due to the fact that future querying involves marginalization over large path spaces, which is not straightforward to do efficiently in such models. In this paper we introduce a general typology for predictive queries in neural autoregressive sequence models and show that such queries can be systematically represented by sets of elementary building blocks. We leverage this typology to develop new query estimation methods based on beam search, importance sampling, and hybrids. Across four large-scale sequence datasets from different application domains, as well as for the GPT-2 language model, we demonstrate the ability to make query answering tractable for arbitrary queries in exponentially-large predictive path-spaces, and find clear differences in cost-accuracy tradeoffs between search and sampling methods.
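
To make the idea of marginalizing over future paths concrete, here is a minimal sketch of one query type mentioned above: "what is the probability that event A first occurs exactly k steps from now". It is an illustration under stated assumptions, not the authors' implementation. A small first-order Markov chain (the hypothetical matrix `P` and function `next_dist`) stands in for a neural autoregressive model, and the estimator uses a simple importance-sampling proposal that only samples paths avoiding the target event, reweighting each path by the probability mass the proposal excluded. All names here are invented for the example.

```python
import random

# Hypothetical toy model: a first-order Markov chain over tokens {0, 1, 2},
# standing in for a neural autoregressive model's next-token distribution.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
]


def next_dist(history):
    """Return p(next token | history). Here it depends only on the last token."""
    return P[history[-1]]


def exact_hitting_prob(start, target, k):
    """Exact p(first occurrence of `target` is at step k | last token = start).

    Feasible only because the toy model is Markov: we track the probability
    mass of each last token over paths that have avoided `target` so far.
    """
    mass = {start: 1.0}
    for _ in range(k - 1):
        new_mass = {}
        for s, m in mass.items():
            for t, p in enumerate(next_dist([s])):
                if t != target:
                    new_mass[t] = new_mass.get(t, 0.0) + m * p
        mass = new_mass
    return sum(m * next_dist([s])[target] for s, m in mass.items())


def sampled_hitting_prob(start, target, k, n_samples, rng):
    """Importance-sampling estimate of the same query.

    Proposal: at each of the first k-1 steps, sample from the model's
    distribution restricted to tokens other than `target`. The importance
    weight is the product of the retained mass (1 - p(target | history)).
    """
    total = 0.0
    for _ in range(n_samples):
        history = [start]
        weight = 1.0
        for _ in range(k - 1):
            probs = next_dist(history)
            allowed = [t for t in range(len(probs)) if t != target]
            allowed_p = [probs[t] for t in allowed]
            retained = sum(allowed_p)
            if retained == 0.0:
                weight = 0.0  # every continuation hits the target early
                break
            weight *= retained
            history.append(rng.choices(allowed, weights=allowed_p)[0])
        total += weight * next_dist(history)[target]
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:  ", exact_hitting_prob(0, 2, 4))
    print("sampled:", sampled_hitting_prob(0, 2, 4, 20000, rng))
```

For a real neural model the exact computation is unavailable, since the number of paths grows exponentially with k; only the sampled estimator (or a search-based alternative such as beam search) would carry over, with `next_dist` replaced by a forward pass of the model.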
