How learners produce data from text in classifying clickbait
This work examines student learning in data science education, specifically in text classification tasks. Its contribution is incremental: it offers observational insights rather than novel methods or broad impact.
The study investigated how undergraduate students reason about text as data when classifying headlines as clickbait or news. It found that a task-based interview method engaged participants in thinking at both the human-perception and computer-extraction levels, with three types of features (function, content, and form) emerging from the activities.
Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise in representing features of text and in how students link text represented as character strings to the identification of features that embed connections with underlying phenomena. To observe how students reason with text data in scenarios designed to elicit certain aspects of the domain, we conducted task-based interviews using a structured protocol with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data using a motivating task: classifying headlines as "clickbait" or "news". Three types of features (function, content, and form) surfaced, the majority from the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged the participants in thinking at both the human-perception level and the computer-extraction level, and in conceptualizing connections between them.
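To make the computer-extraction level concrete, the following is a minimal sketch of how a headline, stored as a character string, might be turned into a row of features. It is an illustration only and not the protocol or tooling used in the study: the function name `extract_features`, the particular features, and the example headlines are all hypothetical, chosen to suggest surface properties of a headline that a program can compute directly.

```python
import re


def extract_features(headline: str) -> dict:
    """Turn one headline string into a dictionary of simple features.

    The features are illustrative assumptions, not those reported by
    the study's participants.
    """
    # Split the string into word-like tokens (letters, digits, apostrophes).
    words = re.findall(r"[A-Za-z0-9']+", headline)
    lowered = [w.lower() for w in words]
    return {
        # Number of word-like tokens in the headline.
        "word_count": len(words),
        # True when the first token is a number, as in list-style headlines.
        "starts_with_number": bool(words) and words[0].isdigit(),
        # Punctuation that can be read directly off the character string.
        "has_question_mark": "?" in headline,
        "has_exclamation": "!" in headline,
        # True when the headline addresses the reader directly.
        "mentions_you": any(w in {"you", "your", "you're"} for w in lowered),
    }


# Made-up headlines, used only to show the shape of the output.
for text in ["10 Things You Won't Believe About Cats!",
             "Senate passes budget bill"]:
    print(text, extract_features(text))
```

Each of these features is something a person might notice when reading a headline, restated as a rule a computer can apply to the raw string, which is the kind of connection between the two levels that the interviews were designed to surface.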