A quantitative analysis of the temporal effects on automatic text classification. (7th August 2015)
- Record Type:
- Journal Article
- Title:
- A quantitative analysis of the temporal effects on automatic text classification. (7th August 2015)
- Main Title:
- A quantitative analysis of the temporal effects on automatic text classification
- Authors:
- Salles, Thiago
Rocha, Leonardo
Gonçalves, Marcos André
Almeida, Jussara M.
Mourão, Fernando
Meira, Wagner
Viegas, Felipe
- Abstract:
- Automatic text classification (TC) continues to be a relevant research topic and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes. We then quantify, using a series of full factorial design experiments, the impact of these effects on four well‐known TC algorithms. We show that these temporal effects affect each analyzed data set differently and that they restrict the performance of each considered TC algorithm to different extents. The reported quantitative analyses, which are the original contributions of this article, provide valuable new insights to better understand the behavior of TC algorithms when faced with nonstatic (temporal) data distributions and highlight important requirements for the proposal of more accurate classification models.
- Is Part Of:
- Journal of the Association for Information Science and Technology. Volume 67:Number 7(2016:Jul.)
- Journal:
- Journal of the Association for Information Science and Technology
- Issue:
- Volume 67:Number 7(2016:Jul.)
- Issue Display:
- Volume 67, Issue 7 (2016)
- Year:
- 2016
- Volume:
- 67
- Issue:
- 7
- Issue Sort Value:
- 2016-0067-0007-0000
- Page Start:
- 1639
- Page End:
- 1667
- Publication Date:
- 2015-08-07
- Subjects:
- classification
Information science -- Periodicals
Information technology -- Periodicals
020.5
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%292330-1643
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/asi.23452
- Languages:
- English
- ISSNs:
- 2330-1635
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4704.325000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2505.xml