Vaclav Cvicek

AI
h-index: 1
4 papers
1,781 citations
Novelty: 51%
AI Score: 34

4 Papers

AI · Mar 28, 2022
LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models

Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek et al. · deepmind

Machine learning models such as Transformers or LSTMs struggle with tasks that are compositional in nature, such as those involving reasoning/inference. Although many datasets exist to evaluate compositional generalization, options for evaluating inference abilities are more limited. This paper presents LogicInference, a new dataset to evaluate the ability of models to perform logical inference. The dataset focuses on inference using propositional logic and a small subset of first-order logic, represented both in semi-formal logical notation and in natural language. We also report initial results using a collection of machine learning models to establish a baseline on this dataset.
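
To make "inference using propositional logic" concrete, here is a minimal sketch of a truth-table entailment check. It is not the dataset's format or generation code; the function `entails`, the helper `implies`, and the variable names `p` and `q` are hypothetical and chosen only for illustration.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # Material implication: "a -> b" is false only when a is true and b is false.
    return (not a) or b

def entails(premises, conclusion, variables):
    """Return True if every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # found a counterexample assignment
    return True

# Modus ponens: from "p -> q" and "p", infer "q" (a valid inference).
valid = entails(
    [lambda e: implies(e["p"], e["q"]), lambda e: e["p"]],
    lambda e: e["q"],
    ["p", "q"],
)

# Affirming the consequent: from "p -> q" and "q", infer "p" (not valid).
invalid = entails(
    [lambda e: implies(e["p"], e["q"]), lambda e: e["q"]],
    lambda e: e["p"],
    ["p", "q"],
)

print(valid, invalid)
```

A seq2seq model evaluated on inference has to produce the valid conclusion without enumerating assignments this way, which is part of what makes the task a test of compositional reasoning.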

LG · Apr 8, 2025
SPIRe: Boosting LLM Inference Throughput with Speculative Decoding

Sanjit Neelam, Daniel Heinlein, Vaclav Cvicek et al.

Speculative decoding (SD) has been shown to reduce the latency of autoregressive decoding (AD) by 2-3x for small batch sizes. However, increasing throughput and therefore reducing the cost per token requires decoding with large batch sizes. Recent work shows that SD can accelerate decoding with large batch sizes too if the context is sufficiently long and the draft model's KV cache is sparse. We introduce SPIRe, a draft model that combines static sparse attention, pruned initialization, and feedback memory to increase the modeled throughput of speculative decoding by over 100% compared to speculation with a much smaller draft model and by over 35% compared to the strong baseline of sparse self-speculation. Our approach is particularly effective when context lengths vary significantly across requests.
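
The draft-then-verify loop behind speculative decoding can be sketched as follows. This is generic greedy speculative decoding, not SPIRe itself: `target_model` and `draft_model` are hypothetical toy functions standing in for a large and a small language model, and the verification loop stands in for what would be a single batched forward pass of the target model in a real system.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token prefix to the next (greedy) token

def target_model(prefix: List[int]) -> int:
    # Hypothetical "large" model: an arbitrary deterministic toy rule.
    return (sum(prefix) * 31 + len(prefix)) % 7

def draft_model(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except at every third position.
    guess = target_model(prefix)
    return (guess + 1) % 7 if len(prefix) % 3 == 0 else guess

def autoregressive_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks the proposal (counted as one pass).
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], target_passes

prompt = [1, 2, 3]
reference = autoregressive_decode(target_model, prompt, 20)
output, passes = speculative_decode(target_model, draft_model, prompt, 20, k=4)
print(output == reference, passes)
```

Under greedy decoding the output matches plain autoregressive decoding with the target model, while the number of target passes drops whenever the draft model guesses correctly. How cheap the draft model's proposals are is what a draft architecture like the one described above aims to improve.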

AI · Aug 9, 2021
Making Transformers Solve Compositional Tasks

Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek et al.

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in a diverse set of compositional tasks, and that achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS), and a string edit operation composition benchmark (PCFG).

LG · Apr 17, 2020
ETC: Encoding Long and Structured Inputs in Transformers

Joshua Ainslie, Santiago Ontañón, Chris Alberti et al.

Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. To scale attention to longer inputs, we introduce a novel global-local attention mechanism between global tokens and regular input tokens. We also show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training objective allows ETC to encode structured inputs. We achieve state-of-the-art results on four natural language datasets requiring long and/or structured inputs.
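
One way to picture global-local attention is as a boolean mask over query-key pairs. The sketch below is an illustration under stated assumptions, not the ETC implementation: the abstract only says that attention runs between global tokens and regular input tokens, so restricting regular tokens to a fixed `radius` of neighbours, and the names `global_local_mask`, `n_global`, `n_long`, and `radius`, are assumptions made here.

```python
from typing import List

def global_local_mask(n_global: int, n_long: int, radius: int) -> List[List[bool]]:
    """Build mask[q][k] = True where query position q may attend to key position k.

    Positions 0..n_global-1 are global tokens; the rest are regular input tokens.
    """
    n = n_global + n_long
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if q < n_global or k < n_global:
                # Any pair involving a global token is allowed.
                mask[q][k] = True
            else:
                # Regular tokens see each other only within a local window (assumed).
                mask[q][k] = abs(q - k) <= radius
    return mask

def count_allowed(mask: List[List[bool]]) -> int:
    return sum(sum(row) for row in mask)

mask = global_local_mask(n_global=2, n_long=6, radius=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
print(count_allowed(mask), "of", len(mask) ** 2, "pairs allowed")
```

With `n_global` and `radius` held fixed, the number of allowed pairs grows linearly with the number of regular tokens rather than quadratically, which is the property that lets this style of attention scale to longer inputs.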