IR · Sep 28, 2021

Explainable Point-Based Document Visualizations

Primož Godec, Nikola Ðukić, Ajda Pretnar, Vesna Tanko, Lan Žagar, Blaž Zupan

arXiv:2110.00462v1 · 2.0

Originality: Incremental advance

AI Analysis

This work addresses the interpretability of document visualizations for researchers and analysts. It is incremental, however: it applies existing keyword extraction methods to a known bottleneck.

The paper tackles the problem of interpreting point-based document visualizations by labeling clusters of documents with keywords obtained from keyword extraction methods. On a dataset of abstracts of articles on longevity, YAKE! and TF-IDF outperformed the other techniques.

Two-dimensional data maps can visually reveal information about the relations between data instances. Popular techniques for constructing data maps are t-SNE and UMAP. The resulting point-based visualizations, however, convey information only once they are interpreted. Here, we consider a set of abstracts of articles on longevity to argue for using keyword extraction methods to label clusters of documents in the map. Among the considered approaches, the recently proposed YAKE! obtained the best results. Surprisingly, a classical TF-IDF term ranking outperformed graph- and embedding-based techniques.
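To make the TF-IDF baseline concrete, here is a minimal sketch of labeling document clusters by TF-IDF term ranking. It is not the authors' implementation. It assumes one common variant in which each cluster's documents are merged into a single pseudo-document, so that the IDF factor rewards terms that occur in few clusters. The function names `tokenize` and `tfidf_cluster_labels` and the toy documents are hypothetical, chosen only for illustration; a real pipeline would also remove stopwords and typically use a library vectorizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (hypothetical, simplistic
    # tokenizer; no stopword removal or stemming).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_cluster_labels(docs, clusters, top_k=3):
    """Label each cluster with its top_k terms by TF-IDF.

    Assumption: all documents of a cluster are merged into one
    pseudo-document, and IDF is computed over clusters, not documents.
    """
    # Accumulate token counts per cluster.
    cluster_counts = {}
    for doc, c in zip(docs, clusters):
        cluster_counts.setdefault(c, Counter()).update(tokenize(doc))

    n = len(cluster_counts)
    # Cluster frequency: in how many clusters each term appears.
    df = Counter()
    for counts in cluster_counts.values():
        df.update(counts.keys())

    labels = {}
    for c, counts in cluster_counts.items():
        total = sum(counts.values())
        # Relative term frequency times log inverse cluster frequency.
        scores = {
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        # Terms present in every cluster score 0 and carry no label signal.
        labels[c] = [t for t in ranked if scores[t] > 0][:top_k]
    return labels


# Hypothetical toy data: two clusters of short "abstracts".
docs = [
    "caloric restriction extends lifespan in mice",
    "caloric restriction and lifespan in yeast",
    "telomere length and aging in humans",
    "telomere shortening during aging",
]
clusters = [0, 0, 1, 1]

print(tfidf_cluster_labels(docs, clusters, top_k=2))
# prints {0: ['caloric', 'lifespan'], 1: ['aging', 'telomere']}
```

Words shared by all clusters, such as "in" and "and" here, get an IDF of zero and are never chosen as labels. This is the property that makes TF-IDF a reasonable baseline for distinguishing one region of the map from another.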

