Application of Whole‐Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium. Issue 9 (8th June 2020)
- Record Type:
- Journal Article
- Title:
- Application of Whole‐Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium. Issue 9 (8th June 2020)
- Main Title:
- Application of Whole‐Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium
- Authors:
- Munck, Nanna
Njage, Patrick Murigu Kamau
Leekitcharoenphon, Pimlapas
Litrup, Eva
Hald, Tine
- Abstract:
- Prevention of the emergence and spread of foodborne diseases is an important prerequisite for the improvement of public health. Source attribution models link sporadic human cases of a specific illness to food sources and animal reservoirs. With the next generation sequencing technology, it is possible to develop novel source attribution models. We investigated the potential of machine learning to predict the animal reservoir from which a bacterial strain isolated from a human salmonellosis case originated based on whole‐genome sequencing. Machine learning methods recognize patterns in large and complex data sets and use this knowledge to build models. The model learns patterns associated with genetic variations in bacteria isolated from the different animal reservoirs. We selected different machine learning algorithms to predict sources of human salmonellosis cases and trained the model with Danish Salmonella Typhimurium isolates sampled from broilers (n = 34), cattle (n = 2), ducks (n = 11), layers (n = 4), and pigs (n = 159). Using cgMLST as input features, the model yielded an average accuracy of 0.783 (95% CI: 0.77–0.80) in the source prediction for the random forest and 0.933 (95% CI: 0.92–0.94) for the logit boost algorithm. Logit boost algorithm was most accurate (valid accuracy: 92%, CI: 0.8706–0.9579) and predicted the origin of 81% of the domestic sporadic human salmonellosis cases. The most important source was Danish produced pigs (53%) followed by imported pigs (16%), imported broilers (6%), imported ducks (2%), Danish produced layers (2%), Danish produced cattle and imported cattle (<1%) while 18% was not predicted. Machine learning has potential for improving source attribution modeling based on sequence data. Results of such models can inform risk managers to identify and prioritize food safety interventions.
- Is Part Of:
- Risk analysis. Volume 40:Issue 9(2020)
- Journal:
- Risk analysis
- Issue:
- Volume 40:Issue 9(2020)
- Issue Display:
- Volume 40, Issue 9 (2020)
- Year:
- 2020
- Volume:
- 40
- Issue:
- 9
- Issue Sort Value:
- 2020-0040-0009-0000
- Page Start:
- 1693
- Page End:
- 1705
- Publication Date:
- 2020-06-08
- Subjects:
- Machine learning -- source attribution -- whole genome sequencing
Technology -- Risk assessment -- Periodicals
658.403
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1539-6924
http://www.blackwellpublishers.co.uk/Online
http://www.blackwellpublishing.com/journal.asp?ref=0272-4332
http://www.ingenta.com/journals/browse/bpl/risk
http://www.wkap.nl/jrnltoc.htm/0272-4332
http://onlinelibrary.wiley.com/
http://firstsearch.oclc.org
http://firstsearch.oclc.org/journal=0272-4332;screen=info;ECOIP
- DOI:
- 10.1111/risa.13510
- Languages:
- English
- ISSNs:
- 0272-4332
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 7972.583000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21679.xml