O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population. (14th March 2023)
- Record Type:
- Journal Article
- Title:
- O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population. (14th March 2023)
- Main Title:
- O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population
- Authors:
- Basinas, Ioannis
Thompson, Paul
Xie, Qianqian
Annaniadou, Sophia
Ge, Calvin
Kuijpers, Eelco
Tinnerberg, Hakan
Stockholm, Zara Ann
Kirkeleit, Jorunn
Galea, Karen S
Brinchmann, Bendik
Cramer, Christine
Taher, Evana Amir
Schlunssen, Vivi
Tonger, Martie van
- Abstract:
- Workplaces are dynamic environments, in which temporal changes in conditions and exposures frequently occur. Such changes are rarely captured by existing Job Exposure Matrices (JEMs), which are typically developed using information available at a certain point in time. As such, they are unable to take into account potential future changes, which could negatively impact the reliability of JEMs when used outside their development period. Moreover, the process of developing JEMs for emerging or new exposure factors is a laborious, time-consuming process. Within the Exposome Project for Health and Occupational Research (EPHOR; https://www.ephor-project.eu/), we have been exploring the use of Natural Language Processing (NLP) as a vehicle for streamlining the update of existing JEMs and the development of new JEMs. Specifically, we will develop named entity recognition (NER) tools to automatically detect mentions of exposure-related concepts in literature, thus increasing the efficiency of locating relevant information for JEM update and development.
Accordingly, we have developed a novel annotated corpus, i.e., 50 literature articles concerning workplace exposure to diesel exhaust, in which exposure assessment experts used guidelines to annotate all mentions of six different named entity categories (substance, occupation, industry/workplace, job task/activity, measurement device and sample type) occurring in the abstract, methods and results sections. The corpus will be used to train machine learning NER algorithms. Each article was annotated independently by two experts, and Inter-Annotator Agreement (IAA) scores were calculated to assess annotation quality. Exact matching scores (requiring agreement of semantic category and exact annotation span) ranged from 0.38 to 0.79 F1 for individual categories (average: 0.56). Relaxed matching scores (requiring agreement of category and partially overlapping spans) ranged from 0.63 to 0.87 F1 (average: 0.72). These results suggest that annotation quality is sufficient for machine learning. We will present the annotation scheme, guidelines and preliminary analysis of the results.
- Is Part Of:
- Occupational and environmental medicine. Volume 80(2023)Supplement 1
- Journal:
- Occupational and environmental medicine
- Issue:
- Volume 80(2023)Supplement 1
- Issue Display:
- Volume 80, Issue 1 (2023)
- Year:
- 2023
- Volume:
- 80
- Issue:
- 1
- Issue Sort Value:
- 2023-0080-0001-0000
- Page Start:
- A8
- Page End:
- A8
- Publication Date:
- 2023-03-14
- Subjects:
- Medicine, Industrial -- Periodicals
Environmental health -- Periodicals
616.980305
- Journal URLs:
- http://oem.bmj.com/
http://www.jstor.org/journals/13510711.html
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=172&action=archive
http://www.bmj.com/archive
- DOI:
- 10.1136/OEM-2023-EPICOH.19
- Languages:
- English
- ISSNs:
- 1351-0711
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26970.xml