LG ML Dec 18, 2019

Comparison of Classification Methods for Very High-Dimensional Data in Sparse Random Projection Representation

arXiv:1912.08616v1 1.82 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of handling big data with millions of features for machine learning practitioners, but it is incremental as it compares existing methods rather than introducing a fundamentally new approach.

The paper addresses the classification of very high-dimensional sparse data by comparing non-iterative and iterative methods. It finds that non-iterative methods can find larger, more accurate models than iterative methods in different application scenarios, based on evaluations on two tasks involving millions of samples and features.

The big data trend has inspired feature-driven learning tasks, which cannot be handled by conventional machine learning models. Unstructured data produces very large binary matrices with millions of columns when converted to vector form. However, such data is often sparse, and hence can be made manageable through the use of sparse random projections. This work studies efficient non-iterative and iterative methods suitable for such data, evaluating the results on two representative machine learning tasks with millions of samples and features. An efficient Jaccard kernel is introduced as an alternative to the sparse random projection. Findings indicate that non-iterative methods can find larger, more accurate models than iterative methods in different application scenarios.
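To make the two representations named in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names (`projection_row`, `project`, `jaccard`), the `density` and `seed` parameters, and the per-feature seeding trick are all assumptions chosen for illustration. Each sample is stored as the set of its active (nonzero) feature indices, which is a natural encoding for a sparse binary row. The projection uses the standard sparse random matrix whose entries are ±1/sqrt(density · k) with probability `density` and 0 otherwise, and the similarity is the textbook Jaccard index.

```python
import math
import random


def projection_row(feature, k, density, seed=0):
    """Nonzero entries of the random-matrix row for one feature, as {column: value}.

    The row is regenerated from a per-feature seed, so the full matrix
    (millions of rows) never has to be stored. This is an illustrative choice.
    """
    rng = random.Random(seed * 1_000_003 + feature)
    scale = 1.0 / math.sqrt(density * k)
    row = {}
    for col in range(k):
        if rng.random() < density:
            # Each nonzero entry is +scale or -scale with equal probability.
            row[col] = scale if rng.random() < 0.5 else -scale
    return row


def project(active_features, k, density=0.1, seed=0):
    """Project a sparse binary sample (iterable of active indices) to k dimensions."""
    out = [0.0] * k
    for feature in active_features:
        # A binary feature contributes its whole matrix row to the output.
        for col, val in projection_row(feature, k, density, seed).items():
            out[col] += val
    return out


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets of active feature indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty samples are treated as identical by convention.
    return len(a & b) / union if union else 1.0


# Two samples from a feature space with millions of columns,
# each with only a handful of active features.
x = {3, 1_500_000, 2_999_999}
y = {3, 1_500_000, 42}

x_low = project(x, k=64)
y_low = project(y, k=64)
similarity = jaccard(x, y)
```

The cost of both operations depends on the number of active features per sample rather than on the total number of columns, which is what makes either representation practical for sparse data of this size.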
