Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years. (1st June 2016)
- Record Type:
- Journal Article
- Title:
- Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years. (1st June 2016)
- Main Title:
- Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years
- Authors:
- Wagner, Mathias
Vicinus, Benjamin
Muthra, Sherieda T.
Richards, Tereza A.
Linder, Roland
Frick, Vilma Oliveira
Groh, Andreas
Rubie, Claudia
Weichert, Frank
- Abstract:
- Background: The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing, which is neither unitary, transcending social situation, nor defined by a timeless idea, is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining. Methods: A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features. Results: The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation was coherent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable. Conclusions: The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable. Highlights: This study evaluates whether publication dates should be considered in text mining. Concurrence of chemokine & cancer terms may correspond to expression in tumor cells. Laboratory findings are coherent with variability of results in analyzed literature. Concurrence increased at abstract & sentence level. Sentence complexity is stable. Concurrent references to chemokines and cancer increased over time.
- Is Part Of:
- Computers in biology and medicine. Volume 73(2016)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 73(2016)
- Issue Display:
- Volume 73, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 73
- Issue:
- 2016
- Issue Sort Value:
- 2016-0073-2016-0000
- Page Start:
- 173
- Page End:
- 185
- Publication Date:
- 2016-06-01
- Subjects:
- Nomenclature -- Systems biology
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2016.03.016
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 721.xml