Jing Zhang

13.2CLNov 16, 2024Code

SAM Decoding: Speculative Decoding via Suffix Automaton

Yuxuan Hu, Ke Wang, Xiaokang Zhang et al.

Speculative decoding (SD) has been demonstrated as an effective technique for lossless LLM inference acceleration. Retrieval-based SD methods, one kind of model-free method, have yielded promising speedup, but they often rely on incomplete retrieval resources, inefficient retrieval methods, and are constrained to certain domains. This paper presents a novel retrieval-based speculative decoding method that adapts suffix automaton (SAM) for efficient and accurate draft generation by utilizing common text corpus and dynamic text sequence. Unlike existing $n$-gram matching methods, SAM-Decoding finds the exact longest suffix match, achieving an average time complexity of O(1) per generation step of SAM update and suffix retrieval. It can also integrate with existing methods, adaptively selecting a draft generation strategy based on match length to generalize to broader domains. Extensive experiments on Spec-Bench show that our method is $18\%+$ faster than other retrieval-based SD methods. Additionally, when combined with advanced EAGLE-2, it provides an additional speedup of $3.28\%$ -- $11.13\%$ across various-sized LLM backbones. Our code is available at our \href{https://github.com/hyx1999/SAM-Decoding}{repository}.

13.8AIApr 21, 2020Code

LineaRE: Simple but Powerful Knowledge Graph Embedding for Link Prediction

Yanhui Peng, Jing Zhang

The task of link prediction for knowledge graphs is to predict missing relationships between entities. Knowledge graph embedding, which aims to represent entities and relations of a knowledge graph as low dimensional vectors in a continuous vector space, has achieved promising predictive performance. If an embedding model can cover different types of connectivity patterns and mapping properties of relations as many as possible, it will potentially bring more benefits for link prediction tasks. In this paper, we propose a novel embedding model, namely LineaRE, which is capable of modeling four connectivity patterns (i.e., symmetry, antisymmetry, inversion, and composition) and four mapping properties (i.e., one-to-one, one-to-many, many-to-one, and many-to-many) of relations. Specifically, we regard knowledge graph embedding as a simple linear regression task, where a relation is modeled as a linear function of two low-dimensional vector-presented entities with two weight vectors and a bias vector. Since the vectors are defined in a real number space and the scoring function of the model is linear, our model is simple and scalable to large knowledge graphs. Experimental results on multiple widely used real-world datasets show that the proposed LineaRE model significantly outperforms existing state-of-the-art models for link prediction tasks.

Jing Zhang

2 Papers