LG · Oct 1, 2021

Cluster Analysis on Jester Dataset: A Review

arXiv:2110.02740v1

Originality: Synthesis-oriented

AI Analysis

This is an incremental review that addresses data preparation issues for unsupervised learning on a specific dataset; it is relevant to researchers working with incomplete joke-rating data.

The paper reviews and validates what it describes as the most recent and probably the only existing work on cluster analysis of the Jester dataset, which contains missing joke ratings. It addresses the data preparation challenges involved and suggests corrections and future improvements.

Unsupervised machine learning paradigms are often the only methodology available for a pattern recognition task in which no target labels or annotations are present. In such scenarios, data preparation is a crucial step that largely determines how well the unsupervised paradigm performs. When data is insufficient, or missing from many instances of a dataset, data preparation becomes a challenge in itself. One such case study is the Jester dataset, which contains ratings given by joke readers to a specified set of 100 jokes, with many of those ratings missing. To perform cluster analysis on such a dataset, the data preparation step should fill the missing ratings with appropriate values, after which an unsupervised ML paradigm can cluster the completed data. In this study, the most recent and probably the only work that performs cluster analysis on the Jester dataset of jokes is reviewed and validated, with corrections and future scope.
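The two-stage pipeline described in the abstract (fill missing ratings, then cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the reviewed work's method: per-joke mean imputation and k-means with farthest-first initialization are swapped-in choices, the function names `impute_joke_means`, `sq_dist`, and `kmeans` are hypothetical, missing ratings are assumed to be represented as `None`, ratings are assumed to lie on a -10 to +10 scale, and the six-user, four-joke matrix is made up for demonstration.

```python
from typing import List, Optional

# Assumption: None marks a joke that a user did not rate. Converting the raw
# dataset's missing-value encoding to None is assumed to happen upstream.
Rating = Optional[float]


def impute_joke_means(ratings: List[List[Rating]]) -> List[List[float]]:
    """Fill each missing rating with the mean observed rating of that joke."""
    n_jokes = len(ratings[0])
    means = []
    for j in range(n_jokes):
        observed = [row[j] for row in ratings if row[j] is not None]
        # If nobody rated joke j, fall back to 0.0 (midpoint of an assumed -10..+10 scale).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in ratings]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Plain k-means (Lloyd's algorithm) with deterministic farthest-first initialization."""
    # Initialization: start from the first user, then repeatedly add the user
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each user joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no user changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: rows are users, columns are jokes, None = not rated.
ratings = [
    [8.0, 7.5, None, -6.0],
    [9.0, None, -7.0, -5.5],
    [7.0, 8.0, -8.0, None],
    [-7.0, -6.5, 8.0, 9.0],
    [None, -8.0, 7.5, 8.5],
    [-6.0, -7.0, None, 7.0],
]
filled = impute_joke_means(ratings)
labels = kmeans(filled, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Mean imputation is the simplest filling strategy, but it pulls every filled value toward the joke's average and can blur cluster boundaries, which is why the choice of filling strategy matters as much as the clustering step itself.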
