LG | Feb 7, 2022

Effects of Parametric and Non-Parametric Methods on High Dimensional Sparse Matrix Representations

Sayali Tambe, Raunak Joshi, Abhishek Gupta, Nandan Kanvinde, Vidya Chitre

arXiv:2202.02894v1 | 4.63 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides an incremental analysis for researchers and practitioners in machine learning by comparing standard methods on text representations, but does not address a specific real-world problem.

The paper investigates how parametric (Linear Discriminant Analysis, Naive Bayes) and non-parametric (Decision Tree, Support Vector Machines) machine learning methods perform on high-dimensional sparse matrix representations derived from text data using TF-IDF, across dimensions ranging from 50 to 5000, and reports detailed metrics for each algorithm and dimension.

Textual data carry semantics that can be encoded as representations for machine learning algorithms. These representations take the form of an interpretable, high-dimensional sparse matrix that is given as input to the learning algorithms. Since learning methods are broadly classified as parametric and non-parametric, in this paper we examine the effects of these two types of algorithms on high-dimensional sparse matrix representations. To derive the representations from the text data, we use TF-IDF and justify this choice in the paper. We form representations of 50, 100, 500, 1000 and 5000 dimensions, over which we perform classification using Linear Discriminant Analysis and Naive Bayes as parametric learning methods, and Decision Tree and Support Vector Machines as non-parametric learning methods. We then report metrics for each representation dimensionality and detail the effect of every algorithm.
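The experimental setup described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code. The toy corpus, the helper names `make_toy_corpus` and `evaluate`, the small dimension values, and the specific model variants (`MultinomialNB`, a linear-kernel `SVC`) are all assumptions made for the example. The abstract names only the algorithm families and dimensions of 50 to 5000.

```python
import random

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_toy_corpus(n_docs=80, seed=0):
    """Build a tiny two-class corpus (hypothetical stand-in for real text data)."""
    rng = random.Random(seed)
    pools = {
        0: "goal match team score league coach player referee stadium season".split(),
        1: "kernel compiler server network database memory thread cache packet disk".split(),
    }
    shared = "the new report today says".split()
    texts, labels = [], []
    for i in range(n_docs):
        label = i % 2
        words = rng.choices(pools[label], k=6) + rng.choices(shared, k=3)
        rng.shuffle(words)
        texts.append(" ".join(words))
        labels.append(label)
    return texts, labels


def evaluate(texts, labels, dims):
    """Return {(dimension, model_name): test accuracy} for each TF-IDF size."""
    train_txt, test_txt, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    results = {}
    for d in dims:
        # max_features caps the vocabulary, i.e. the representation dimension.
        vectorizer = TfidfVectorizer(max_features=d)
        X_train = vectorizer.fit_transform(train_txt)  # sparse matrix
        X_test = vectorizer.transform(test_txt)
        models = {
            "LDA": LinearDiscriminantAnalysis(),               # parametric
            "NaiveBayes": MultinomialNB(),                     # parametric (assumed variant)
            "DecisionTree": DecisionTreeClassifier(random_state=0),
            "SVM": SVC(kernel="linear"),                       # assumed kernel
        }
        for name, model in models.items():
            # scikit-learn's LDA needs a dense array; the others accept sparse input.
            dense = name == "LDA"
            Xtr = X_train.toarray() if dense else X_train
            Xte = X_test.toarray() if dense else X_test
            model.fit(Xtr, y_train)
            results[(d, name)] = accuracy_score(y_test, model.predict(Xte))
    return results


if __name__ == "__main__":
    corpus_texts, corpus_labels = make_toy_corpus()
    # The paper uses 50, 100, 500, 1000 and 5000; the toy vocabulary is far smaller.
    for key, acc in sorted(evaluate(corpus_texts, corpus_labels, dims=(5, 10, 20)).items()):
        print(key, round(acc, 3))
```

On a real corpus, `dims` would be set to `(50, 100, 500, 1000, 5000)`, and accuracy could be supplemented with other metrics such as precision, recall, and F1. Note that converting a 5000-dimensional sparse matrix to a dense array for LDA may be memory-intensive for large document collections.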
