The utility of web mining for epidemiological research: studying the association between parity and cancer risk. (27th November 2015)
- Record Type:
- Journal Article
- Title:
- The utility of web mining for epidemiological research: studying the association between parity and cancer risk. (27th November 2015)
- Main Title:
- The utility of web mining for epidemiological research: studying the association between parity and cancer risk
- Authors:
- Tourassi, Georgia
Yoon, Hong-Jun
Xu, Songhua
Han, Xuesong
- Abstract:
- Background: The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored. Methods: Using advanced web crawling and tailored information extraction procedures, the authors automatically collected and analyzed the text content of 79 394 online obituary articles published between 1998 and 2014. The collected data included 51 911 cancer (27 330 breast; 9470 lung; 6496 pancreatic; 6342 ovarian; 2273 colon) and 27 483 non-cancer cases. With the derived information, the authors replicated a case-control study design to investigate the association between parity (i.e., childbearing) and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies. Results: Parity was found to be associated with a significantly reduced risk of breast cancer (OR = 0.78, 95% CI, 0.75-0.82), pancreatic cancer (OR = 0.78, 95% CI, 0.72-0.83), colon cancer (OR = 0.67, 95% CI, 0.60-0.74), and ovarian cancer (OR = 0.58, 95% CI, 0.54-0.62). Marginal association was found for lung cancer risk (OR = 0.87, 95% CI, 0.81-0.92). The linear trend between increased parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than the other cancers included in the analysis. Conclusion: This large web-mining study on parity and cancer risk produced findings very similar to those reported with traditional observational studies. It may be used as a promising strategy to generate study hypotheses for guiding and prioritizing future epidemiological studies.
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 23:Number 3(2016:May)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 23:Number 3(2016:May)
- Issue Display:
- Volume 23, Issue 3 (2016)
- Year:
- 2016
- Volume:
- 23
- Issue:
- 3
- Issue Sort Value:
- 2016-0023-0003-0000
- Page Start:
- 588
- Page End:
- 595
- Publication Date:
- 2015-11-27
- Subjects:
- digital epidemiology -- web mining -- cancer risk -- parity
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
- Journal URLs:
- http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
- DOI:
- 10.1093/jamia/ocv141
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15654.xml