Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. (9th March 2015)
- Record Type:
- Journal Article
- Title:
- Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. (9th March 2015)
- Main Title:
- Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features
- Authors:
- Nikfarjam, Azadeh
Sarker, Abeed
O'Connor, Karen
Ginn, Rachel
Gonzalez, Graciela - Abstract:
- Abstract: Objective Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. Methods We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. Results ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F -measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. Conclusion It is possible to extract complex medicalAbstract: Objective Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. Methods We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. Results ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F -measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. Conclusion It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. … (more)
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 22:Number 3(2015:May)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 22:Number 3(2015:May)
- Issue Display:
- Volume 22, Issue 3 (2015)
- Year:
- 2015
- Volume:
- 22
- Issue:
- 3
- Issue Sort Value:
- 2015-0022-0003-0000
- Page Start:
- 671
- Page End:
- 681
- Publication Date:
- 2015-03-09
- Subjects:
- adverse drug reaction -- ADR -- social media mining -- pharmacovigilance -- natural language processing -- machine learning -- deep learning word embeddings
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285 - Journal URLs:
- http://jamia.bmj.com/ ↗
http://www.jamia.org ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76 ↗
http://www.sciencedirect.com/science/journal/10675027 ↗
http://jamia.oxfordjournals.org/ ↗
http://www.oxfordjournals.org/en/ ↗ - DOI:
- 10.1093/jamia/ocu041 ↗
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 15138.xml