Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. (17th July 2021)

Record Type:: Journal Article
Title:: Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. (17th July 2021)
Main Title:: Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition
Authors:: Li, Jianfu
Zhou, Yujia
Jiang, Xiaoqian
Natarajan, Karthik
Pakhomov, Serguei Vs
Liu, Hongfang
Xu, Hua
Abstract:: Objective: Developing clinical natural language processing systems often requires access to many clinical documents, which are not widely available to the public due to privacy and security concerns. To address this challenge, we propose to develop methods to generate synthetic clinical notes and evaluate their utility in real clinical natural language processing tasks. Materials and Methods: We implemented 4 state-of-the-art text generation models, namely CharRNN, SeqGAN, GPT-2, and CTRL, to generate clinical text for the History and Present Illness section. We then manually annotated clinical entities for randomly selected 500 History and Present Illness notes generated from the best-performing algorithm. To compare the utility of natural and synthetic corpora, we trained named entity recognition (NER) models from all 3 corpora and evaluated their performance on 2 independent natural corpora. Results: Our evaluation shows GPT-2 achieved the best BLEU (bilingual evaluation understudy) score (with a BLEU-2 of 0.92). NER models trained on synthetic corpus generated by GPT-2 showed slightly better performance on 2 independent corpora: strict F1 scores of 0.709 and 0.748, respectively, when compared with the NER models trained on natural corpus (F1 scores of 0.706 and 0.737, respectively), indicating the good utility of synthetic corpora in clinical NER model development. In addition, we also demonstrated that an augmented method that combines both natural and …
Is Part Of:: Journal of the American Medical Informatics Association. Volume 28:Number 10(2021)
Journal:: Journal of the American Medical Informatics Association
Issue:: Volume 28:Number 10(2021)
Issue Display:: Volume 28, Issue 10 (2021)
Year:: 2021
Volume:: 28
Issue:: 10
Issue Sort Value:: 2021-0028-0010-0000
Page Start:: 2193
Page End:: 2201
Publication Date:: 2021-07-17
Subjects:: natural language processing -- neural language model -- text generation -- clinical notes -- named entity recognition
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
Journal URLs:: http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
DOI:: 10.1093/jamia/ocab112
Languages:: English
ISSNs:: 1067-5027
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
Ingest File:: 19026.xml