IR, LG · Jun 6, 2014

Machine learning approach for text and document mining

arXiv:1406.1580v1266 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses text classification for researchers and developers, but it appears incremental as it applies an existing method without new data or major innovations.

The paper tackles text categorization by using a KNN-based machine learning approach to classify documents and return the most relevant ones, but it does not provide concrete numerical results.

Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into categories drawn from a predefined set. If each document belongs to exactly one category, it is a single-label classification task; otherwise, it is a multi-label classification task. TC uses several tools from Information Retrieval (IR) and Machine Learning (ML) and has received much attention in recent years from both academic researchers and industry developers. In this paper, we first categorize the documents using a KNN-based machine learning approach and then return the most relevant documents.
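To make the two-step idea concrete (classify with KNN, then return the most relevant documents), here is a minimal sketch in plain Python. The abstract does not specify the document representation, similarity measure, or voting rule, so the TF-IDF weighting, cosine similarity, and similarity-weighted vote below are assumptions chosen for illustration. The function names and the toy corpus `TRAIN` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def build_idf(token_lists):
    # Smoothed inverse document frequency over the training documents.
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are ignored.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_classify(query, train, k=3):
    """Return (predicted_label, relevant_docs) for a query string.

    train is a list of (text, label) pairs (single-label setting).
    relevant_docs are the neighbours carrying the predicted label,
    ordered from most to least similar.
    """
    token_lists = [tokenize(text) for text, _ in train]
    idf = build_idf(token_lists)
    vectors = [tfidf_vector(tokens, idf) for tokens in token_lists]
    q = tfidf_vector(tokenize(query), idf)

    # Rank all training documents by similarity and keep the k nearest.
    ranked = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    neighbours = ranked[:k]

    # Assumption: each neighbour votes with a weight equal to its similarity.
    votes = Counter()
    for sim, i in neighbours:
        votes[train[i][1]] += sim
    label = votes.most_common(1)[0][0]

    relevant = [train[i][0] for _, i in neighbours if train[i][1] == label]
    return label, relevant


# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the football players", "sports"),
    ("parliament passed the new budget", "politics"),
    ("the minister announced an election", "politics"),
    ("voters backed the budget in the election", "politics"),
]

label, relevant = knn_classify("football goal", TRAIN, k=3)
print(label)
for doc in relevant:
    print(" ", doc)
```

In this sketch the query is first assigned a category by its nearest neighbours, and the neighbours that share that category are then returned as the relevant documents. A multi-label variant could instead return every label whose vote share exceeds a threshold; that extension is likewise an assumption, not something the abstract describes.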

