The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight. (7th August 2019)

Record Type:: Journal Article
Title:: The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight. (7th August 2019)
Main Title:: The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight
Authors:: Carrell, David S
Cronkite, David J
Li, Muqun (Rachel)
Nyemba, Steve
Malin, Bradley A
Aberdeen, John S
Hirschman, Lynette
Abstract:: Objective: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus. Materials and Methods: We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from a large, integrated health care delivery system in Washington State. These notes were deidentified by a machine-learned PII tagger and HIPS resynthesis. A malicious attacker obtained the corpus and performed a parrot attack intending to expose leaked PII in it. Specifically, the attacker mimicked the defender's process by manually annotating all PII-like content in half of the released corpus, training a PII tagger on these data, and using the trained model to tag the remaining encounter notes. The attacker hypothesized that untagged identifiers would be leaked PII, discoverable by manual review. We evaluated the attacker's success using measures of leak-detection rate and accuracy. Results: The attacker correctly hypothesized that 211 (68%) of 310 actual PII leaks in the corpus were leaks, and wrongly hypothesized that 191 resynthesized PII instances were also leaks. One-third of actual leaks remained …
Is Part Of:: Journal of the American Medical Informatics Association. Volume 26:Number 12(2019)
Journal:: Journal of the American Medical Informatics Association
Issue:: Volume 26:Number 12(2019)
Issue Display:: Volume 26, Issue 12 (2019)
Year:: 2019
Volume:: 26
Issue:: 12
Issue Sort Value:: 2019-0026-0012-0000
Page Start:: 1536
Page End:: 1544
Publication Date:: 2019-08-07
Subjects:: deidentification -- patient privacy -- machine learning -- natural language processing, patient data privacy
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
Journal URLs:: http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
DOI:: 10.1093/jamia/ocz114
Languages:: English
ISSNs:: 1067-5027
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
Ingest File:: 15713.xml