Probability-based text clustering algorithm by alternately repeating two operations. (June 2013)
- Record Type:
- Journal Article
- Title:
- Probability-based text clustering algorithm by alternately repeating two operations. (June 2013)
- Main Title:
- Probability-based text clustering algorithm by alternately repeating two operations
- Authors:
- Liu, Ming
Liu, Yuanchao
Liu, Bingquan
Lin, Lei
- Abstract:
- Owing to the rapid advance of internet technology, users face a large amount of raw data from the World Wide Web every day, most of it in text format. This creates a strong demand among internet users for efficient text analysis techniques. Since clustering is unsupervised and requires no prior knowledge, it is widely adopted to help analyse textual data. Unfortunately, to the best of our knowledge, almost all the clustering algorithms proposed so far fail to handle large-scale text collections. To classify large-scale text collections precisely, this paper proposes a novel probability-based text clustering algorithm by alternately repeating two operations (abbreviated as PTCART). The algorithm simply repeats two operations, (a) feature set construction and (b) text partition, until the optimal partition is reached. Its convergence is also validated. Experimental results demonstrate that, compared with several popular text clustering algorithms, PTCART achieves excellent performance.
- Is Part Of:
- Journal of information science. Volume 39:Number 3(2013)
- Journal:
- Journal of information science
- Issue:
- Volume 39:Number 3(2013)
- Issue Display:
- Volume 39, Issue 3 (2013)
- Year:
- 2013
- Volume:
- 39
- Issue:
- 3
- Issue Sort Value:
- 2013-0039-0003-0000
- Page Start:
- 372
- Page End:
- 383
- Publication Date:
- 2013-06
- Subjects:
- feature set construction -- filtration of noisy texts -- probability based text clustering -- relation calculation -- text partition
Information science -- Periodicals
Information science
Periodicals
020.5
- Journal URLs:
- http://jis.sagepub.com/archive/
http://www.ingenta.com/journals/browse/bks/jis?mode=direct
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
http://firstsearch.oclc.org/journal=0165-5515;screen=info;ECOIP
- DOI:
- 10.1177/0165551512470054
- Languages:
- English
- ISSNs:
- 0165-5515
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25785.xml
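The abstract describes PTCART as alternately repeating (a) feature set construction and (b) text partition until the partition stops changing. The record does not give the paper's actual formulas, so the following is only a minimal, hypothetical sketch of that general alternating pattern, not the authors' method. Everything in it is an assumption made for illustration: the function names `build_feature_sets`, `partition` and `cluster`, the add-one smoothed estimate of P(term | cluster), keeping the `top_n` most probable terms as each cluster's feature set, scoring a text by the probability mass of its terms that fall in a cluster's feature set, round-robin initialisation, and the `max_iter` cap.

```python
from collections import Counter


def build_feature_sets(docs, labels, k, top_n):
    """(a) Hypothetical feature set construction: for each cluster, estimate an
    add-one smoothed P(term | cluster) and keep the top_n most probable terms."""
    vocab = sorted({t for d in docs for t in d})
    feature_sets = []
    for c in range(k):
        counts = Counter(t for d, lab in zip(docs, labels) if lab == c for t in d)
        total = sum(counts.values()) + len(vocab)
        probs = {t: (counts[t] + 1) / total for t in vocab}
        # Sort by descending probability, breaking ties alphabetically.
        ranked = sorted(vocab, key=lambda t: (-probs[t], t))
        feature_sets.append({t: probs[t] for t in ranked[:top_n]})
    return feature_sets


def partition(docs, feature_sets):
    """(b) Hypothetical text partition: assign each text to the cluster whose
    feature set covers the most probability mass of the text's terms.
    Ties go to the lowest cluster index (max returns the first maximum)."""
    labels = []
    for d in docs:
        scores = [sum(fs.get(t, 0.0) for t in d) for fs in feature_sets]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


def cluster(docs, k, top_n=3, max_iter=100):
    """Alternate (a) and (b) until the partition no longer changes,
    or until max_iter rounds have run (an assumed safety cap)."""
    labels = [i % k for i in range(len(docs))]  # round-robin start
    for it in range(1, max_iter + 1):
        feature_sets = build_feature_sets(docs, labels, k, top_n)
        new_labels = partition(docs, feature_sets)
        if new_labels == labels:
            return labels, it
        labels = new_labels
    return labels, max_iter


if __name__ == "__main__":
    texts = [
        ["apple", "banana", "fruit"],
        ["banana", "fruit", "mango"],
        ["cpu", "gpu", "chip"],
        ["gpu", "chip", "silicon"],
    ]
    labels, rounds = cluster(texts, k=2)
    print(labels, rounds)
```

On this toy input the round-robin start mixes the two topics, and the alternation regroups the two fruit texts together and the two hardware texts together before the partition stabilises. The stopping rule (stop when a round leaves every label unchanged) mirrors the abstract's "until the optimal partition is reached" only loosely; the paper's own convergence argument is not reproduced here.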