A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction. (2016)
- Record Type:
- Journal Article
- Title:
- A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction. (2016)
- Main Title:
- A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction
- Authors:
- Kumar, Niraj
Srinathan, Kannan
Varma, Vasudeva
- Abstract:
- In this paper, we present a novel N-gram (N ≥ 1) filtration technique for keyphrase extraction. To filter the sophisticated candidate keyphrases (N-grams), we introduce the combined use of: 1) a statistical feature (obtained from the weighted betweenness centrality scores of words, a measure generally used to identify border nodes/edges in community detection techniques); 2) co-location strength (calculated using nearest-neighbour DBpedia texts). We also introduce the use of an N-gram (N ≥ 1) graph, which reduces the bias effect of shorter N-grams in the ranking process and preserves the semantics of words (phraseness) based upon local context. To capture the theme of the document and to reduce the effect of noisy terms in the ranking process, we apply an information-theoretic framework for key-player detection on the proposed N-gram graph. Our experimental results show that the devised system performs better than the current state-of-the-art unsupervised systems and comparably to or better than supervised systems.
- Is Part Of:
- International journal of data mining, modelling and management. Volume 8:Number 2(2016)
- Journal:
- International journal of data mining, modelling and management
- Issue:
- Volume 8:Number 2(2016)
- Issue Display:
- Volume 8, Issue 2 (2016)
- Year:
- 2016
- Volume:
- 8
- Issue:
- 2
- Issue Sort Value:
- 2016-0008-0002-0000
- Page Start:
- 124
- Page End:
- 143
- Publication Date:
- 2016
- Subjects:
- keyphrase extraction -- weighted betweenness centrality -- N-gram graph -- normalised pointwise mutual information -- NPMI -- key phrases -- N-gram filtration -- statistical features -- co-location -- semantics -- document themes -- information theory
Data mining -- Periodicals
Information science -- Periodicals
Databases -- Periodicals
005.7
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijdmmm
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1759-1163
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 7813.xml