CL IR | Jul 14, 2014

Toward Network-based Keyword Extraction from Multitopic Web Documents

Sabina Šišović, Sanda Martinčić-Ipšić, Ana Meštrović

arXiv:1407.3636v1

Originality: Synthesis-oriented

AI Analysis

This work addresses keyword extraction for web documents, but it appears incremental as it builds on existing network-based methods with a new combination of measures.

The paper tackles keyword extraction from multitopic web documents by representing texts as co-occurrence networks and testing centrality measures for ranking keyword candidates. It reports promising results with the selectivity measure and proposes extracting word pairs based on in/out selectivity and weight measures combined with filtering.

In this paper we analyse the selectivity measure, calculated from a complex network, for the task of automatic keyword extraction. Texts collected from different web sources (portals, forums) are represented as directed and weighted co-occurrence complex networks of words. Words are nodes, and a link is established between two nodes if the words co-occur directly within a sentence. We test different centrality measures for ranking nodes as keyword candidates. Promising results are achieved using the selectivity measure. We then propose an approach that extracts word pairs according to the values of the in/out selectivity and weight measures, combined with filtering.
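The network construction described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: selectivity is taken to be a node's strength divided by its degree (the average weight per link), computed separately for incoming and outgoing links; tokenization is a plain lowercase whitespace split; and the function names `build_cooccurrence_network` and `in_out_selectivity` are hypothetical. It is not the authors' implementation, and it omits the filtering step and the word-pair extraction.

```python
from collections import defaultdict


def build_cooccurrence_network(sentences):
    """Build a directed, weighted co-occurrence network.

    Returns a dict mapping (source_word, target_word) to the number of
    times target_word directly follows source_word within a sentence.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    weights = defaultdict(int)
    for sentence in sentences:
        words = sentence.lower().split()
        # Link each word to the word that immediately follows it.
        for source, target in zip(words, words[1:]):
            weights[(source, target)] += 1
    return dict(weights)


def in_out_selectivity(weights):
    """Return {word: (in_selectivity, out_selectivity)}.

    Assumed definition: selectivity = strength / degree, where strength
    is the sum of link weights and degree is the number of distinct links.
    A node with no links in a given direction gets 0.0 for that direction.
    """
    in_strength, out_strength = defaultdict(int), defaultdict(int)
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    for (source, target), weight in weights.items():
        out_strength[source] += weight
        out_degree[source] += 1
        in_strength[target] += weight
        in_degree[target] += 1

    nodes = set(in_degree) | set(out_degree)
    result = {}
    for node in nodes:
        k_in = in_degree.get(node, 0)
        k_out = out_degree.get(node, 0)
        sel_in = in_strength.get(node, 0) / k_in if k_in else 0.0
        sel_out = out_strength.get(node, 0) / k_out if k_out else 0.0
        result[node] = (sel_in, sel_out)
    return result
```

Calling `in_out_selectivity(build_cooccurrence_network(sentences))` on a list of sentence strings yields one `(in, out)` pair per word. Sorting words by either value gives a candidate ranking in the spirit of the approach described above.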
