CL AI DL IR SI | Nov 21, 2018

Resource Mention Extraction for MOOC Discussion Forums

Ya-Hui An, Liangming Pan, Min-Yen Kan, Qiang Dong, Yan Fu

arXiv:1811.08853v1 | 0.2

Originality: Incremental advance

AI Analysis

This work aims to make discussion and search in MOOC forums easier for learners and educators. Its contribution is incremental, since it builds on existing sequence tagging techniques.

The paper tackles the automatic identification and hyperlinking of learning resource mentions in MOOC discussion forums, where resources are currently mentioned in free text without links. It frames this as a novel task and contributes a labeled dataset (FoRM). The proposed method, which incorporates character-level and thread context information into an LSTM-CRF model, notably improves on baseline models, with significant gains on challenging instances.

In discussions hosted on discussion forums for MOOCs, references to online learning resources are often of central importance. They contextualize the discussion, anchoring the discussion participants' presentation of the issues and their understanding. However, they are usually mentioned in free text, without appropriate hyperlinking to their associated resource. Automated learning resource mention hyperlinking and categorization will facilitate discussion and searching within MOOC forums, and also benefit the contextualization of such resources across disparate views. We propose the novel problem of learning resource mention identification in MOOC forums. As this is a novel task with no publicly available data, we first contribute a large-scale labeled dataset, dubbed the Forum Resource Mention (FoRM) dataset, to facilitate our current research and future research on this task. We then formulate this task as a sequence tagging problem and investigate solution architectures to address the problem. Importantly, we identify two major challenges that hinder the application of sequence tagging models to the task: (1) the diversity of resource mention expression, and (2) long-range contextual dependencies. We address these challenges by incorporating character-level and thread context information into an LSTM-CRF model. First, we incorporate a character encoder to address the out-of-vocabulary problem caused by the diversity of mention expressions. Second, to address the context dependency challenge, we encode thread contexts using an RNN-based context encoder, and apply the attention mechanism to selectively leverage useful context information during sequence tagging. Experiments on FoRM show that the proposed method improves the baseline deep sequence tagging models notably, significantly bettering performance on instances that exemplify the two challenges.
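
To make the sequence tagging formulation concrete, the sketch below converts labeled mention spans into per-token tags and back. It is an illustration only: the abstract does not state which tagging scheme the authors use, so the BIO scheme, the `RESOURCE` label, the example post, and the function names `spans_to_bio` and `bio_to_spans` are all assumptions. The LSTM-CRF model itself is not shown; it would predict a tag sequence like the one produced here.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, label) over token indices.
# BIO scheme and label names are assumptions, not taken from the paper.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn labeled mention spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the mention
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover mention spans from a BIO tag sequence.

    An I- tag that does not continue an open span of the same label
    is ignored rather than treated as the start of a mention.
    """
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical forum post with one resource mention ("week 2 quiz").
tokens = "I am stuck on the week 2 quiz".split()
tags = spans_to_bio(tokens, [(5, 8, "RESOURCE")])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Once a tagger predicts such a sequence, `bio_to_spans` gives the token ranges that a downstream step could hyperlink to the matching resource.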
