IR · DS · Nov 29, 2018

Incremental Sparse TFIDF & Incremental Similarity with Bipartite Graphs

arXiv:1811.11746v1
Originality: Synthesis-oriented
AI Analysis

This work addresses incremental text stream analysis for researchers and practitioners, but its contribution appears modest: it builds on existing concepts without clear novelty.

The authors tackled the problem of analyzing text streams by implementing Incremental Sparse TF-IDF and Incremental Cosine Similarity using bipartite graphs to efficiently update document-word relationships, but no concrete results or numbers are reported.

In this report, we experimented with several concepts for text stream analysis. We tested an implementation of Incremental Sparse TF-IDF (IS-TFIDF) and Incremental Cosine Similarity (ICS) built on bipartite graphs. In these graphs, one type of node represents documents and the other represents words, so when a word arrives in the stream, its neighbors in the graph are exactly the documents affected by that arrival. With this information, we can leverage optimized algorithms from graph-based applications. The idea is similar to using hash tables or other data structures for fast in-memory access to information.
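The document-word graph described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `BipartiteTfidf`, its methods, and the particular weighting (term frequency normalized by document length, IDF as the log of document count over document frequency) are assumptions made for illustration. Weights are computed on demand rather than cached, so the set returned by `add_word` only indicates which documents a caching implementation would need to refresh.

```python
import math
from collections import defaultdict


class BipartiteTfidf:
    """Hypothetical sketch: a document-word bipartite graph kept as two adjacency maps."""

    def __init__(self):
        # Document side: doc id -> {word: raw count}
        self.doc_words = defaultdict(lambda: defaultdict(int))
        # Word side: word -> set of doc ids containing it
        self.word_docs = defaultdict(set)

    def add_word(self, doc, word):
        """Register one word arrival and return the word's neighbors in the graph."""
        self.doc_words[doc][word] += 1
        self.word_docs[word].add(doc)
        # Note: a brand-new document also changes the document count used by
        # the IDF; this sketch sidesteps that by computing weights on demand.
        return set(self.word_docs[word])

    def tfidf(self, doc, word):
        """TF-IDF weight of `word` in `doc`, or 0.0 if there is no edge."""
        counts = self.doc_words.get(doc)
        if not counts or word not in counts:
            return 0.0
        tf = counts[word] / sum(counts.values())
        idf = math.log(len(self.doc_words) / len(self.word_docs[word]))
        return tf * idf

    def _norm(self, doc):
        words = self.doc_words.get(doc, {})
        return math.sqrt(sum(self.tfidf(doc, w) ** 2 for w in words))

    def cosine(self, a, b):
        """Cosine similarity; only words shared by both documents contribute."""
        shared = set(self.doc_words.get(a, {})) & set(self.doc_words.get(b, {}))
        dot = sum(self.tfidf(a, w) * self.tfidf(b, w) for w in shared)
        norm_a, norm_b = self._norm(a), self._norm(b)
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)


# Usage: feed (document, word) pairs as they arrive in the stream.
g = BipartiteTfidf()
for w in ["graph", "stream"]:
    g.add_word("d1", w)
for w in ["graph", "text"]:
    g.add_word("d2", w)
affected = g.add_word("d3", "stream")  # documents that contain "stream"
similarity = g.cosine("d1", "d3")
```

Keeping both adjacency maps makes the lookup from a word to its documents a single dictionary access, which is the hash-table-like behavior the abstract alludes to.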
