Deep neural networks ensemble for detecting medication mentions in tweets. (27th September 2019)
- Record Type:
- Journal Article
- Title:
- Deep neural networks ensemble for detecting medication mentions in tweets. (27th September 2019)
- Main Title:
- Deep neural networks ensemble for detecting medication mentions in tweets
- Authors:
- Weissenbacher, Davy
Sarker, Abeed
Klein, Ari
O'Connor, Karen
Magge, Arjun
Gonzalez-Hernandez, Graciela - Abstract:
- Abstract: Objective: Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them. Materials and Methods: We present Kusuri, an Ensemble Learning classifier able to identify tweets mentioning drug products and dietary supplements. Kusuri (薬, "medication" in Japanese) is composed of 2 modules: first, 4 different classifiers (lexicon based, spelling variant based, pattern based, and a weakly trained neural network) are applied in parallel to discover tweets potentially containing medication names; second, an ensemble of deep neural networks encoding morphological, semantic, and long-range dependencies of important words in the tweets makes the final decision. Results: On a class-balanced (50-50) corpus of 15 005 tweets, Kusuri demonstrated performances close to human annotators with an F1 score of 93.7%, the best score achieved thus far on this corpus. On a corpus made of all tweets posted by 112 Twitter users (98 959 tweets, with only 0.26% mentioning medications), Kusuri obtained an F1 score of 78.8%. To the best of our knowledge, Kusuri is the first system to achieve this score on suchAbstract: Objective: Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them. Materials and Methods: We present Kusuri, an Ensemble Learning classifier able to identify tweets mentioning drug products and dietary supplements. Kusuri (薬, "medication" in Japanese) is composed of 2 modules: first, 4 different classifiers (lexicon based, spelling variant based, pattern based, and a weakly trained neural network) are applied in parallel to discover tweets potentially containing medication names; second, an ensemble of deep neural networks encoding morphological, semantic, and long-range dependencies of important words in the tweets makes the final decision. Results: On a class-balanced (50-50) corpus of 15 005 tweets, Kusuri demonstrated performances close to human annotators with an F1 score of 93.7%, the best score achieved thus far on this corpus. On a corpus made of all tweets posted by 112 Twitter users (98 959 tweets, with only 0.26% mentioning medications), Kusuri obtained an F1 score of 78.8%. To the best of our knowledge, Kusuri is the first system to achieve this score on such an extremely imbalanced dataset. Conclusions: The system identifies tweets mentioning drug names with performance high enough to ensure its usefulness, and is ready to be integrated in pharmacovigilance, toxicovigilance, or more generally, public health pipelines that depend on medication name mentions. … (more)
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 26:Number 12(2019)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 26:Number 12(2019)
- Issue Display:
- Volume 26, Issue 12 (2019)
- Year:
- 2019
- Volume:
- 26
- Issue:
- 12
- Issue Sort Value:
- 2019-0026-0012-0000
- Page Start:
- 1618
- Page End:
- 1626
- Publication Date:
- 2019-09-27
- Subjects:
- social media -- pharmacovigilance -- drug name detection -- ensemble learning -- text classification
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285 - Journal URLs:
- http://jamia.bmj.com/ ↗
http://www.jamia.org ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76 ↗
http://www.sciencedirect.com/science/journal/10675027 ↗
http://jamia.oxfordjournals.org/ ↗
http://www.oxfordjournals.org/en/ ↗ - DOI:
- 10.1093/jamia/ocz156 ↗
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 15713.xml