Pattern Dependence Detection using n-TARP Clustering
This work addresses the challenge of analyzing high-dimensional data collected from a small number of subjects, particularly in educational contexts. The contribution is incremental in that it builds on existing dependency detection methods.
The paper tackles the problem of detecting dependencies between a high-dimensional observed variable and a one-dimensional outcome in small-sample settings, such as educational data. It proposes a method to quantify and validate pattern dependencies, and demonstrates the method by finding valid dependencies between student skills and grades in a signal processing class.
Consider an experiment involving a potentially small number of subjects. Some random variables are observed on each subject: a high-dimensional one called the "observed" random variable, and a one-dimensional one called the "outcome" random variable. We are interested in the dependencies between the observed random variable and the outcome random variable. We propose a method to quantify and validate the dependencies of the outcome random variable on the various patterns contained in the observed random variable. Different degrees of relationship are explored (linear, quadratic, cubic, ...). This work is motivated by the need to analyze educational data, which often involves high-dimensional data representing a small number of students. Thus our implementation is designed for a small number of subjects; however, it can be easily modified to handle a very large dataset. As an illustration, the proposed method is used to study the influence of certain skills on the course grade of students in a signal processing class. A valid dependency of the grade on the different skill patterns is observed in the data.
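The abstract does not spell out the algorithm, so the following is only a minimal sketch of how such a pipeline might look, under several assumptions that are ours rather than the authors'. We assume that n-TARP means: draw n random projection vectors, project the observed variable to one dimension, split the projected values into two groups with the best 1-D two-means threshold, and keep the projection with the smallest normalized within-group sum of squares. We further assume, purely for illustration, that a dependency is "validated" by a permutation test on the difference in mean outcome between the two groups. The function names (two_means_threshold, ntarp, permutation_pvalue) and all parameters are hypothetical, the data are synthetic, and only linear projections are used; the higher-degree relationships mentioned in the abstract are not covered.

```python
import numpy as np


def two_means_threshold(z):
    """Exhaustive best split of 1-D values into two groups.

    Returns (threshold, normalized within-group sum of squares).
    """
    s = np.sort(z)
    total = np.sum((s - s.mean()) ** 2)
    best_w, best_t = np.inf, s[0]
    for i in range(1, len(s)):
        left, right = s[:i], s[i:]
        w = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if w < best_w:
            best_w, best_t = w, 0.5 * (s[i - 1] + s[i])
    w_norm = best_w / total if total > 0 else 1.0
    return best_t, w_norm


def ntarp(X, n=100, seed=None):
    """Assumed n-TARP: keep the random projection with the tightest two-group split."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n):
        v = rng.standard_normal(X.shape[1])
        t, w = two_means_threshold(X @ v)
        if best is None or w < best[2]:
            best = (v, t, w)
    return best  # (projection vector, threshold, normalized within-group SS)


def permutation_pvalue(y, labels, n_perm=2000, seed=None):
    """Illustrative validity check (an assumption, not the paper's test):
    permutation test on the absolute gap in mean outcome between groups."""
    rng = np.random.default_rng(seed)

    def gap(lab):
        return abs(y[lab].mean() - y[~lab].mean())

    observed = gap(labels)
    hits = sum(gap(rng.permutation(labels)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


# Synthetic example: 40 subjects, 20 observed features, two planted patterns.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 20)
X = rng.standard_normal((40, 20)) + 10.0 * group[:, None]
y = 5.0 * group + rng.standard_normal(40)  # outcome depends on the pattern

v, t, w = ntarp(X, n=100, seed=1)
labels = (X @ v) > t
p_value = permutation_pvalue(y, labels, seed=2)
print(f"normalized within-group SS: {w:.3f}, permutation p-value: {p_value:.4f}")
```

With so few subjects, the exhaustive threshold search is cheap; for a very large dataset, one would likely replace it with a cumulative-sum formulation or a standard k-means routine.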