MOOCdb: Developing Standards and Systems to Support MOOC Data Science
This work addresses the inconsistent data formats that researchers and developers in online education face, and thereby facilitates data science progress in MOOCs. The contribution is incremental in that it builds on existing data-modeling concepts.
The authors tackled the lack of standardized data models for MOOC data science by developing MOOCdb, a platform-agnostic shared data model that captures student interactions (observing, submitting, collaborating, and giving feedback) and maps data from Coursera and edX, enabling collaborative frameworks without sharing raw data.
We present a shared data model for enabling data science in Massive Open Online Courses (MOOCs). The model captures students' interactions with the online platform. It is platform agnostic and is based on a small set of core actions that students take on an online learning platform. Students usually interact with the platform in four different modes: observing, submitting, collaborating, and giving feedback. In observing mode, students are simply browsing the online platform: watching videos, reading course material or books, or viewing forums. In submitting mode, students submit information to the platform. This includes submissions to quizzes, homework, or any other assessment module. In collaborating mode, students interact with other students or instructors on forums, collaboratively edit a wiki, or chat on Google Hangouts or similar venues. With these basic definitions of activities, and a data model to store events pertaining to them, we create a common terminology for mapping Coursera and edX data into the shared data model. This shared data model, called MOOCdb, becomes the foundation for a number of collaborative frameworks that enable progress in data science without the need to share the data.
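The idea of mapping platform-specific logs onto a small set of shared interaction modes can be illustrated with a minimal Python sketch. This is not the MOOCdb schema itself: the `Event` fields, the raw record layout, and the action names in `ACTION_TO_MODE` (including `survey_response` for the feedback mode) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """The four interaction modes named in the abstract."""
    OBSERVING = "observing"
    SUBMITTING = "submitting"
    COLLABORATING = "collaborating"
    FEEDBACK = "feedback"


@dataclass(frozen=True)
class Event:
    """A platform-agnostic event record (hypothetical fields)."""
    user_id: str
    mode: Mode
    resource: str
    timestamp: float


# Hypothetical platform-specific action names mapped to the shared modes.
# Real Coursera or edX logs use their own vocabularies.
ACTION_TO_MODE = {
    "play_video": Mode.OBSERVING,
    "page_view": Mode.OBSERVING,
    "quiz_submit": Mode.SUBMITTING,
    "forum_post": Mode.COLLABORATING,
    "wiki_edit": Mode.COLLABORATING,
    "survey_response": Mode.FEEDBACK,
}


def to_shared_event(raw: dict) -> Event:
    """Translate one raw log record (assumed layout) into a shared Event."""
    return Event(
        user_id=str(raw["user"]),
        mode=ACTION_TO_MODE[raw["action"]],
        resource=raw["resource"],
        timestamp=float(raw["time"]),
    )
```

Once every platform's logs pass through a translation step of this kind, analysis code can be written against `Event` alone and shared between groups, while the raw data stays with its owner.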