Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. (December 2022)
- Record Type:
- Journal Article
- Title:
- Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. (December 2022)
- Main Title:
- Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting
- Authors:
- Kotevski, Damian P.
Smee, Robert I.
Field, Matthew
Nemes, Yvonne N.
Broadley, Kathryn
Vajdic, Claire M.
- Abstract:
- Highlights: Electronic radiation oncology records in an Australian setting were anonymised. 13 personally identifiable entities were anonymised using Microsoft Presidio. Presidio scored a strict and relaxed F1-score of 0.8471 and 0.8980, respectively. Presidio can be utilised for safe use and sharing of certain cancer data within Australia.
Abstract: Background: Electronic medical records (EMRs) contain valuable information for clinical research; however, the presence of personally identifying information (PII) restricts their use. Anonymisation of PII from EMRs enables clinical information to be shared for research purposes. Since there is limited research relating to the anonymisation of Australian EMRs, the performance of Microsoft Presidio with customisation on clinical documents from an Australian radiation oncology information system (OIS) was evaluated. Methods: A random sample of 300 unstructured free-text clinical documents was extracted from the Prince of Wales Cancer Centre OIS on patients diagnosed with cancer of the head and neck between 2000 and 2017. Anonymisation of clinical text was performed using Microsoft Presidio, implemented in the Python programming language. Each clinical document was manually compared pre- and post-anonymisation for the identification and redaction of 13 PII. Model performance was evaluated using three classification criteria (correct, partial, and missed classification) to determine recall, precision, and F1-score. These three metrics were calculated under relaxed conditions, where partial classifications were considered correct, and under strict conditions, where only correct classifications were considered correct. Results: A total of 8,713 PII were identified, of which 7,026 (81%) were classified as correct, 850 (10%) as partial, and 837 (9%) as missed. There were 245 instances of incorrect classifications. Evaluation of the model demonstrated an average precision of 0.8921, recall (strict) of 0.8064, F1-score (strict) of 0.8471, recall (relaxed) of 0.9039, and F1-score (relaxed) of 0.8980. Conclusion: This is the first example of an open-source anonymisation model to be customised and tested on clinical documents from an Australian radiation oncology EMR. These findings support the use of Presidio for the safe use and sharing of cancer data within Australia for certain PII; however, additional checks are required to ensure person names are successfully anonymised.
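The strict and relaxed metrics quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' evaluation code: it assumes the standard definitions (recall as hits divided by total PII, F1 as the harmonic mean of precision and recall), and it takes the average precision of 0.8921 as reported, since the abstract does not give enough detail to recompute it from the aggregate counts. All variable and function names are hypothetical.

```python
# Counts reported in the abstract's Results section.
TOTAL_PII = 8713
CORRECT = 7026
PARTIAL = 850
MISSED = 837

# Assumption: the reported average precision is used as-is; the abstract
# does not describe how it was averaged, so it is not recomputed here.
REPORTED_PRECISION = 0.8921


def recall(hits: int, total: int) -> float:
    """Fraction of all PII instances that count as successfully identified."""
    return hits / total


def f1_score(precision: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall_value / (precision + recall_value)


# Strict: only fully correct classifications count as hits.
recall_strict = recall(CORRECT, TOTAL_PII)
# Relaxed: partial classifications also count as hits.
recall_relaxed = recall(CORRECT + PARTIAL, TOTAL_PII)

f1_strict = f1_score(REPORTED_PRECISION, recall_strict)
f1_relaxed = f1_score(REPORTED_PRECISION, recall_relaxed)

print(f"recall (strict):  {recall_strict:.4f}")
print(f"recall (relaxed): {recall_relaxed:.4f}")
print(f"F1 (strict):      {f1_strict:.4f}")
print(f"F1 (relaxed):     {f1_relaxed:.4f}")
```

Under these assumptions the computed recall and F1 values agree with the figures reported in the abstract to four decimal places.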
- Is Part Of:
- International journal of medical informatics. Volume 168(2022)
- Journal:
- International journal of medical informatics
- Issue:
- Volume 168(2022)
- Issue Display:
- Volume 168, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 168
- Issue:
- 2022
- Issue Sort Value:
- 2022-0168-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-12
- Subjects:
- Electronic medical records -- Oncology information systems -- Personally identifiable information -- Data sharing -- Anonymisation -- Microsoft Presidio
Medical informatics -- Periodicals
Information science -- Periodicals
Computers -- Periodicals
Medical technology -- Periodicals
Medical Informatics -- Periodicals
Technology, Medical -- Periodicals
Computers
Information science
Medical informatics
Medical technology
Electronic journals
Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/13865056
http://www.clinicalkey.com/dura/browse/journalIssue/13865056
http://www.clinicalkey.com.au/dura/browse/journalIssue/13865056
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ijmedinf.2022.104880
- Languages:
- English
- ISSNs:
- 1386-5056
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.345250
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24220.xml