Extraction of Templates from Phrases Using Sequence Binary Decision Diagrams
This work addresses template extraction for natural language processing tasks. It offers an incremental improvement: a new construction algorithm that forms general, compact representations from a small amount of data.
The paper tackles the problem of extracting templates like 'regard X as Y' from phrases. It introduces an unsupervised method based on a novel relaxed variant of the Sequence Binary Decision Diagram (SeqBDD), which compresses sequences into a compact structure and thereby induces templates from tagged text. Experiments report high-quality extraction of verb+preposition templates from corpora and of phrasal templates from social media messages.
The extraction of templates such as "regard X as Y" from a set of related phrases requires the identification of their internal structures. This paper presents an unsupervised approach for extracting templates on-the-fly from only tagged text by using a novel relaxed variant of the Sequence Binary Decision Diagram (SeqBDD). A SeqBDD can compress a set of sequences into a graphical structure that is equivalent to a minimal DFA but more compact and better suited to the task of template extraction. The main contribution of this paper is a relaxed form of the SeqBDD construction algorithm that enables it to form general representations from a small amount of data. Compressing the shared structures in the text during Relaxed SeqBDD construction naturally induces the templates we wish to extract. Experiments show that the method is capable of high-quality extraction on two tasks: verb+preposition templates from corpora, and phrasal templates from short social media messages.
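To make the compression idea concrete, the following is a minimal Python sketch of a standard SeqBDD-style structure, not the relaxed construction that the paper contributes. It assumes the usual formulation: each node holds a label, a 0-child (alternatives to the label) and a 1-child (what follows the label), and identical nodes are shared through a unique table. The class name `SeqBDD`, its methods, and the example phrases are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Terminals: ZERO is the empty set, ONE is the set holding only the empty sequence.
ZERO, ONE = 0, 1


class SeqBDD:
    """Toy SeqBDD (illustrative only): nodes are (label, zero_child, one_child)."""

    def __init__(self) -> None:
        # Slots 0 and 1 are reserved for the two terminals.
        self.nodes: List[Optional[Tuple[str, int, int]]] = [None, None]
        self.unique: Dict[Tuple[str, int, int], int] = {}

    def make_node(self, label: str, zero_child: int, one_child: int) -> int:
        # Zero-suppression: a node whose 1-child is the empty set is redundant.
        if one_child == ZERO:
            return zero_child
        key = (label, zero_child, one_child)
        # Node sharing: identical sub-structures are stored only once.
        if key not in self.unique:
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def build(self, sequences: Iterable[Sequence[str]]) -> int:
        return self._build(frozenset(tuple(s) for s in sequences))

    def _build(self, seqs: frozenset) -> int:
        if not seqs:
            return ZERO
        if seqs == frozenset({()}):
            return ONE
        # Split on the smallest head symbol so labels are ordered along 0-edges.
        head = min(s[0] for s in seqs if s)
        one_child = self._build(frozenset(s[1:] for s in seqs if s and s[0] == head))
        zero_child = self._build(frozenset(s for s in seqs if not s or s[0] != head))
        return self.make_node(head, zero_child, one_child)

    def contains(self, root: int, seq: Sequence[str]) -> bool:
        node, i = root, 0
        while node not in (ZERO, ONE):
            label, zero_child, one_child = self.nodes[node]
            if i < len(seq) and seq[i] == label:
                node, i = one_child, i + 1  # consume the matching symbol
            else:
                node = zero_child  # try the next alternative
        return node == ONE and i == len(seq)

    def size(self) -> int:
        return len(self.nodes) - 2  # internal nodes only


phrases = [p.split() for p in ("regard X as Y", "see X as Y", "view X as Y")]
bdd = SeqBDD()
root = bdd.build(phrases)
print(bdd.size())                                # 6
print(bdd.contains(root, "see X as Y".split()))  # True
```

In this toy example the three phrases share the suffix "X as Y", so the structure needs 6 internal nodes where an unshared trie would need 12. The shared sub-structure hanging off the three verbs is the kind of regularity a template such as "regard X as Y" captures; how the relaxed construction generalizes beyond exact sharing is not reproduced here.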