The Teacher-Student Chatroom Corpus
The corpus provides a new resource for researchers studying language learning and teaching, though the contribution is incremental: it focuses on data collection rather than novel methods.
The paper introduces the Teacher-Student Chatroom Corpus (TSCC), a dataset of over 100 one-to-one online English lessons with 13.5K conversational turns and 133K words, designed to capture interactive and informal language for research use.
The Teacher-Student Chatroom Corpus (TSCC) is a collection of written conversations captured during one-to-one lessons between teachers and learners of English. The lessons took place in an online chatroom and therefore involve more interactive, immediate and informal language than might be found in asynchronous exchanges such as email correspondence. Because the lessons were one-to-one, the teacher was able to focus exclusively on the linguistic abilities and errors of the student, and to offer personalised exercises, scaffolding and correction. The TSCC contains more than one hundred lessons between two teachers and eight students, amounting to 13.5K conversational turns and 133K words. It is freely available for research use. We describe the corpus design, the data collection procedure and the annotations added to the text. We then present preliminary descriptive analyses of the data and consider possible uses of the TSCC.