Weak Semi-Markov CRFs for NP Chunking in Informal Text
This work addresses efficient NP chunking for informal text. The contribution is incremental: it builds on existing semi-CRF methods and focuses on speed improvements.
The paper tackles noun phrase chunking in informal text by introducing a new annotated corpus of 76,490 noun phrases from SMS messages and by exploring graphical models, including a novel semi-CRF variant that achieves accuracy similar to conventional semi-CRFs at a significantly lower running time.
This paper introduces a new annotated corpus based on an existing informal text corpus, the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages, annotated by university students. We then explore several graphical models for the task of noun phrase chunking, including a novel variant of the semi-Markov conditional random field (semi-CRF). Empirical evaluations on the new dataset show that the new variant yields accuracy similar to the conventional semi-CRF while running significantly faster.
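To make the running-time comparison concrete, the following is a minimal sketch of Viterbi decoding for a conventional semi-CRF, which scores labeled segments rather than single tokens. It is not the paper's implementation, and it does not cover the new variant, whose details are not given in this summary. The names `semi_markov_viterbi`, `toy_score`, and `max_len`, the label set, and the hand-written scores are all assumptions made for illustration; a trained model would supply the segment scores.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)
ScoreFn = Callable[[int, int, str, Optional[str]], float]


def semi_markov_viterbi(n: int, labels: List[str], max_len: int,
                        score: ScoreFn) -> List[Segment]:
    """Return the highest-scoring segmentation of n tokens.

    score(start, end, label, prev_label) is a hypothetical segment scorer;
    prev_label is None for the first segment.
    """
    if n == 0:
        return []
    neg_inf = float("-inf")
    # best[i][y]: best score of a segmentation of tokens[:i] ending in label y
    best: List[Dict[str, float]] = [
        {y: neg_inf for y in labels} for _ in range(n + 1)
    ]
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {y: None for y in labels} for _ in range(n + 1)
    ]
    for end in range(1, n + 1):
        for y in labels:
            # The extra loop over segment lengths is what separates this
            # from linear-chain decoding.
            for length in range(1, min(max_len, end) + 1):
                start = end - length
                if start == 0:
                    cand = score(0, end, y, None)
                    if cand > best[end][y]:
                        best[end][y] = cand
                        back[end][y] = (0, None)
                    continue
                for prev in labels:
                    if best[start][prev] == neg_inf:
                        continue
                    cand = best[start][prev] + score(start, end, y, prev)
                    if cand > best[end][y]:
                        best[end][y] = cand
                        back[end][y] = (start, prev)
    # Follow back-pointers from the best final label.
    label: Optional[str] = max(labels, key=lambda l: best[n][l])
    segments: List[Segment] = []
    end = n
    while end > 0:
        start, prev = back[end][label]
        segments.append((start, end, label))
        end, label = start, prev
    segments.reverse()
    return segments


def toy_score(start: int, end: int, label: str,
              prev_label: Optional[str]) -> float:
    """Hand-written scores (an assumption): reward one NP span, short O spans."""
    if label == "NP":
        return 3.0 if (start, end) == (1, 4) else -1.0
    return 0.5 if end - start == 1 else -1.0


tokens = ["see", "my", "new", "phone", "later"]
segments = semi_markov_viterbi(len(tokens), ["NP", "O"], 3, toy_score)
print(segments)
```

With n tokens, maximum segment length L, and label set Y, this decoder takes O(n · L · |Y|²) time, compared with O(n · |Y|²) for a linear-chain CRF. The factor of L is the overhead that a faster semi-CRF variant would aim to reduce.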