An efficient framework for semantically-correlated term detection and sanitization in clinical documents. (May 2022)
- Record Type:
- Journal Article
- Title:
- An efficient framework for semantically-correlated term detection and sanitization in clinical documents. (May 2022)
- Main Title:
- An efficient framework for semantically-correlated term detection and sanitization in clinical documents
- Authors:
- Moqurrab, Syed Atif
Anjum, Adeel
Tariq, Noshina
Srivastava, Gautam
- Abstract:
- In clinical documents, privacy and confidentiality protection are the two main challenges before sharing or publishing data. According to the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), even a few terms can cause privacy threats. In retrospect, confidentiality threats are not fully explored due to the complex nature as well as massive number of clinical terms and phrases. Current approaches use information theoretic-based techniques to detect and sanitize risky semantically-correlated terms. However, they have language ambiguity and non-monotonic behavior, coupled with the fact that pre-trained classifiers and human-tagging are required to construct classifiers. This paper offers a generic and adaptable method for protecting risky terms in clinical data using word embedding (Word2Vec and BERT) for risky term detection and comparative analysis. Our methodology uses WordNet taxonomy to minimize a document's semantic and utility loss by substituting privacy-preserving generalization for disclosive words and by eliminating manual data tagging. The results show significant protection and utility-preservation, compared to information-theoretic approaches.
- Is Part Of:
- Computers & electrical engineering. Volume 100(2022)
- Journal:
- Computers & electrical engineering
- Issue:
- Volume 100(2022)
- Issue Display:
- Volume 100, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 100
- Issue:
- 2022
- Issue Sort Value:
- 2022-0100-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-05
- Subjects:
- Machine learning -- Data privacy -- Unsupervised learning -- Semantically-correlated terms -- Detection -- Sanitization -- Utility-preservation -- Clinical documents -- Clinical data privacy -- Word embedding
Computer engineering -- Periodicals
Electrical engineering -- Periodicals
Electrical engineering -- Data processing -- Periodicals
Ordinateurs -- Conception et construction -- Périodiques
Électrotechnique -- Périodiques
Électrotechnique -- Informatique -- Périodiques
Computer engineering
Electrical engineering
Electrical engineering -- Data processing
Periodicals
Electronic journals
621.302854
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00457906/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compeleceng.2022.107985
- Languages:
- English
- ISSNs:
- 0045-7906
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.680000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21754.xml