Induced lexico-syntactic patterns improve information extraction from online medical forums. (26th June 2014)
- Record Type:
- Journal Article
- Title:
- Induced lexico-syntactic patterns improve information extraction from online medical forums. (26th June 2014)
- Main Title:
- Induced lexico-syntactic patterns improve information extraction from online medical forums
- Authors:
- Gupta, Sonal
MacLean, Diana L
Heer, Jeffrey
Manning, Christopher D - Abstract:
- Abstract: Objective To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Background and significance Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. Materials and methods We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org ). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Results Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58–70% F1 score and SC terms with 66–76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Conclusions Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique forAbstract: Objective To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Background and significance Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. Materials and methods We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org ). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Results Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58–70% F1 score and SC terms with 66–76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Conclusions Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. … (more)
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 21:Number 5(2014:Sep.)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 21:Number 5(2014:Sep.)
- Issue Display:
- Volume 21, Issue 5 (2014)
- Year:
- 2014
- Volume:
- 21
- Issue:
- 5
- Issue Sort Value:
- 2014-0021-0005-0000
- Page Start:
- 902
- Page End:
- 909
- Publication Date:
- 2014-06-26
- Subjects:
- text mining -- online health forums -- medical entites extraction -- natural language processing
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285 - Journal URLs:
- http://jamia.bmj.com/ ↗
http://www.jamia.org ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76 ↗
http://www.sciencedirect.com/science/journal/10675027 ↗
http://jamia.oxfordjournals.org/ ↗
http://www.oxfordjournals.org/en/ ↗ - DOI:
- 10.1136/amiajnl-2014-002669 ↗
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 17588.xml