An Investigation into the Pedagogical Features of Documents
This work addresses the need for automated tools in education and NLP that assess document utility, but it is incremental, since it focuses on foundational corpus creation and baseline methods.
The paper tackles the problem of computationally characterizing the learning utility of technical documents by introducing pedagogical roles as an intermediary concept. Its contributions are the first annotated corpus for this purpose and baseline techniques for predicting those roles.
Characterizing the content of a technical document in terms of its learning utility can be useful for applications related to education, such as generating reading lists from large collections of documents. We refer to this learning utility as the "pedagogical value" of the document to the learner. While pedagogical value is an important concept that has been studied extensively within the education domain, there has been little work exploring it from a computational perspective, i.e., that of natural language processing (NLP). To allow a computational exploration of this concept, we introduce the notion of "pedagogical roles" of documents (e.g., Tutorial and Survey) as an intermediary component for the study of pedagogical value. Given the lack of available corpora for our exploration, we create the first annotated corpus of pedagogical roles and use it to test baseline techniques for automatic prediction of such roles.
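The abstract does not say which baseline techniques were tested. To make the prediction task concrete, the sketch below frames role prediction as text classification and implements one common baseline, multinomial Naive Bayes over bag-of-words tokens with add-one smoothing. Several choices here are assumptions for illustration only and are not taken from the paper. The class name `NaiveBayesRoleClassifier` is hypothetical. Each document is treated as having a single role, whereas real documents may serve several roles at once. The training snippets are invented toy strings, not data from the annotated corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesRoleClassifier:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing.

    Assumes exactly one pedagogical role per document (single-label).
    """

    def __init__(self):
        self.class_counts = Counter()          # number of documents per role
        self.word_counts = defaultdict(Counter)  # token counts per role
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples for illustration only; not from the paper's corpus.
train_docs = [
    "step by step tutorial: install the library and run your first example",
    "in this tutorial we walk through a hands-on example step by step",
    "we survey prior work and categorize existing approaches in the literature",
    "this survey reviews and compares methods proposed in recent literature",
]
train_labels = ["Tutorial", "Tutorial", "Survey", "Survey"]

clf = NaiveBayesRoleClassifier().fit(train_docs, train_labels)
print(clf.predict("a step by step example for beginners"))
print(clf.predict("we survey the literature on existing methods"))
```

A baseline this simple only captures lexical cues such as "step by step" or "literature". Distinguishing roles reliably would likely need richer features and a multi-label formulation, which is part of why an annotated corpus is a prerequisite for studying the task.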