Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes. Issue 1 (3rd January 2019)
- Record Type:
- Journal Article
- Title:
- Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes. Issue 1 (3rd January 2019)
- Main Title:
- Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes
- Authors:
- Guan, Meijian
Cho, Samuel
Petro, Robin
Zhang, Wei
Pasche, Boris
Topaloglu, Umit
- Abstract:
- Abstract: Objectives: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods: We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who have undergone a clinical next generation sequencing (NGS) testing in Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN) namely, gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi) were applied to classify documents to the treatment-change and no-treatment-change groups. Further, we compared the performances of RNNs to 5 machine learning algorithms including Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random forest, and Logistic Regression. Results: Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes. …
- Is Part Of:
- JAMIA open. Volume 2:Issue 1(2019)
- Journal:
- JAMIA open
- Issue:
- Volume 2:Issue 1(2019)
- Issue Display:
- Volume 2, Issue 1 (2019)
- Year:
- 2019
- Volume:
- 2
- Issue:
- 1
- Issue Sort Value:
- 2019-0002-0001-0000
- Page Start:
- 139
- Page End:
- 149
- Publication Date:
- 2019-01-03
- Subjects:
- machine learning -- natural language processing -- electronic health records -- cancer -- genomics
Medical informatics -- Periodicals
610.285
- Journal URLs:
- http://www.oxfordjournals.org/
https://academic.oup.com/jamiaopen
- DOI:
- 10.1093/jamiaopen/ooy061
- Languages:
- English
- ISSNs:
- 2574-2531
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12004.xml