Representing and extracting lung cancer study metadata: Study objective and study design. (1st March 2015)
- Record Type:
- Journal Article
- Title:
- Representing and extracting lung cancer study metadata: Study objective and study design. (1st March 2015)
- Main Title:
- Representing and extracting lung cancer study metadata: Study objective and study design
- Authors:
- Garcia-Gathright, Jean I.
Oh, Andrea
Abarca, Phillip A.
Han, Mary
Sago, William
Spiegel, Marshall L.
Wolf, Brian
Garon, Edward B.
Bui, Alex A.T.
Aberle, Denise R.
- Abstract:
- This paper describes the information retrieval step in Casama (Contextualized Semantic Maps), a project that summarizes and contextualizes current research papers on driver mutations in non-small cell lung cancer. Casama's representation of lung cancer studies aims to capture elements that will assist an end-user in retrieving studies and, importantly, judging their strength. This paper focuses on two types of study metadata: study objective and study design. 430 abstracts on EGFR and ALK mutations in lung cancer were annotated manually. Casama's support vector machine (SVM) automatically classified the abstracts by study objective with as much as 129% higher F-scores compared to PubMed's built-in filters. A second SVM classified the abstracts by epidemiological study design, suggesting strength of evidence at a more granular level than in previous work. The classification results and the top features determined by the classifiers suggest that this scheme would be generalizable to other mutations in lung cancer, as well as studies on driver mutations in other cancer domains.
Highlights: We propose to improve retrieval by representing and extracting study metadata. Multiple expert readers produced a gold standard of 430 abstracts on lung cancer. Automatic classification performed better than or comparable to PubMed's filters. Study design classification was robust to differences in vocabulary across corpora. Top-ranked features were not domain-specific and could generalize to other domains.
- Is Part Of:
- Computers in biology and medicine. Volume 58(2015)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 58(2015)
- Issue Display:
- Volume 58, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 58
- Issue:
- 2015
- Issue Sort Value:
- 2015-0058-2015-0000
- Page Start:
- 63
- Page End:
- 72
- Publication Date:
- 2015-03-01
- Subjects:
- Automatic summarization -- Quality of evidence -- Information retrieval
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2015.01.004
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14531.xml