A data-driven method for detecting and diagnosing causes of water quality contamination in a dataset with a high rate of missing values. (October 2020)
- Record Type:
- Journal Article
- Title:
- A data-driven method for detecting and diagnosing causes of water quality contamination in a dataset with a high rate of missing values. (October 2020)
- Main Title:
- A data-driven method for detecting and diagnosing causes of water quality contamination in a dataset with a high rate of missing values
- Authors:
- Ngouna, Raymond Houé
Ratolojanahary, Romy
Medjaher, Kamal
Dauriac, Fabien
Sebilo, Mathieu
Junca-Bourié, Jean
- Abstract:
- Democratization of sensing devices in industrial systems has made it possible to collect a large amount of data of different types, which has led to the necessity of handling complex analyses for knowledge extraction. The field of water resources is one of those areas that have drawn the attention of decision-makers seeking to preserve human health and safety. Recent advances in Artificial Intelligence, particularly in the domain of Machine Learning, have opened the potential to leverage massive data to better address the relationship between water quality and human activities. However, a high rate of missing data and heterogeneity of the measurements are scientific issues that cannot be solved by standard methods, especially when no prior knowledge on the label of each observation is provided. In this article, Prognostics and Health Management was implemented to detect and diagnose anomalies in water quality datasets, taking into account the uncertainties induced by the above-mentioned issues. Fuzzy c-means was used to identify the different water quality classes, while Random Forest was applied to determine the most influential parameters with respect to potential contamination of water resources in the southwest of France. The results suggest that multiple imputation methods can handle the missingness issue, while the use of decision rules based on well-known water quality standards can solve the problem regarding the lack of labelled observations. In addition, two potential sources of contamination (atrazine and nitrate) were identified and then validated by hydrogeology experts, prior to further online deployment of the proposed model.
Highlights: Anomaly detection method able to handle a high rate of missing values. Definition of a decision rule to handle the lack of prior knowledge on the raw samples. Hybridization with a Genetic Algorithm to optimize the choice of hyperparameters. Recommendations for the data collection strategy to reduce underlying costs.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 95(2020)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 95(2020)
- Issue Display:
- Volume 95, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 95
- Issue:
- 2020
- Issue Sort Value:
- 2020-0095-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-10
- Subjects:
- Intelligent fault detection -- Diagnostics -- Water quality -- Uncertainty -- Unsupervised learning -- Fuzzy c-means -- Random forest
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2020.103822
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14012.xml