Water end-use consumption in low-income households: Evaluation of the impact of preprocessing on the construction of a classification model. (15th December 2021)
- Record Type:
- Journal Article
- Title:
- Water end-use consumption in low-income households: Evaluation of the impact of preprocessing on the construction of a classification model. (15th December 2021)
- Main Title:
- Water end-use consumption in low-income households: Evaluation of the impact of preprocessing on the construction of a classification model
- Authors:
- Oliveira-Esquerre, Karla
Mello, Mariza
Botelho, Gabriella
Deng, Zikang
Koushanfar, Farinaz
Kiperstok, Asher
- Abstract:
- Highlights: Water consumption variability affects the preprocessing of time series features. Random Forest and 1NN with the ERP measure show similar performances. Errors linked to the preprocessing method must be known in order to select the model.
Abstract: The challenge of transforming massive water flow data into disaggregated smart information according to water end uses is an issue that has motivated many researchers. This challenge is even more difficult in low-income regions owing to the high variability of data, because the predominant hydraulic devices are controlled by globe valves and therefore offer users many activation possibilities. Devices with standardized flow rates, such as washing machines or dishwashers, are exceptions. A common practice is to apply commercial software that classifies events at the end-use level and then to develop a personalized classification model with enhanced alignment with the database. If the preprocessing step is not performed properly, it can affect perceived device behaviors, which may lead to incorrect conclusions. To evaluate how this variability can interfere with commercial software responses, we developed classification models using a dataset preprocessed by Trace Wizard® as training data and then applied the trained models to a test dataset consisting of events that were authenticated by individual flow sensors. Our goal was to identify the degree of difference between the two datasets. The results demonstrate that when Trace Wizard® is applied, the features of each device differ from the original water consumption flow, indicating that data variability interferes with the credibility of feedback. Additionally, preprocessing tended to increase the volume, duration, and flow rates, giving the impression that the consumption was higher than the real scenario. The constructed models were not able to overcome the distortions introduced by Trace Wizard® classification. For example, fixtures had poor matches for several houses, with statistical measures below 50%. …
- Is Part Of:
- Expert systems with applications. Volume 185(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 185(2021)
- Issue Display:
- Volume 185, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 185
- Issue:
- 2021
- Issue Sort Value:
- 2021-0185-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-12-15
- Subjects:
- Low-income water end use -- Demand management -- Random forest model -- Adaptive KNN model -- ERP measure applied to KNN -- Dataset preprocessing
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2021.115623
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18906.xml
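
Note on the "1NN with the ERP measure" named in the highlights and subjects: the record does not describe the authors' implementation, so the following is only a minimal sketch of the general technique, assuming ERP means the standard Edit distance with Real Penalty with a gap value of 0. The function names, the flow values, and the labels are hypothetical and chosen purely for illustration.

```python
def erp_distance(a, b, gap=0.0):
    """Edit distance with Real Penalty between two numeric sequences.

    A point left unmatched is charged its absolute distance to `gap`
    (assumed to be 0 here); matched points are charged |a_i - b_j|.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the ERP distance between a[:i] and b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + abs(a[i - 1] - gap)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + abs(b[j - 1] - gap)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match both points
                dp[i - 1][j] + abs(a[i - 1] - gap),           # leave a[i-1] unmatched
                dp[i][j - 1] + abs(b[j - 1] - gap),           # leave b[j-1] unmatched
            )
    return dp[n][m]


def predict_1nn(train, query, gap=0.0):
    """Return the label of the training series closest to `query` under ERP."""
    _, label = min(train, key=lambda item: erp_distance(item[0], query, gap))
    return label


# Hypothetical flow traces (arbitrary units) with made-up end-use labels.
train = [
    ([2.0, 2.0, 2.0, 2.0], "shower"),
    ([6.0, 6.0], "toilet"),
]
query = [2.1, 1.9, 2.0]
predicted = predict_1nn(train, query)
```

Because ERP tolerates sequences of different lengths, events of unequal duration can be compared directly without resampling, which is one reason such a measure suits water-use events.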