Minimum training sample size requirements for achieving high prediction accuracy with the BN model: A case study regarding seismic liquefaction. (15th December 2021)
- Record Type:
- Journal Article
- Title:
- Minimum training sample size requirements for achieving high prediction accuracy with the BN model: A case study regarding seismic liquefaction. (15th December 2021)
- Main Title:
- Minimum training sample size requirements for achieving high prediction accuracy with the BN model: A case study regarding seismic liquefaction
- Authors:
- Hu, Jilei
Zou, Wenjun
Wang, Jing
Pang, Luou
- Abstract:
- Highlights: Analyzing the factors affecting the predictive performance of the Bayesian network model. Modifying a structural entropy that can characterize the complexity of a BN structure better. The minimum training sample requirements are related to k_max^in and k_a. Constructing an empirical function of determining the minimum training sample size of the model.
Abstract: The complexity of a Bayesian network (BN) model and the number of training samples used have significant impacts on the prediction accuracy of the model regarding seismic liquefaction. The required training sample size for ensuring that a BN model has high generalization ability is a critical issue in parameter learning. To address this issue, this study analyses the relationship between the predictive performance of the BN model and the complexity of the model, training sample size, and average discrete intervals. Taking seismic liquefaction prediction as an example, 4536 statistical experiments are designed to investigate the training and testing performances of 21 different BN models under the conditions of nine different training sample size ratios (5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, and 80% of all data) and testing samples using 20% of all data. The results reveal that the learning performance of a BN is not sensitive to the training sample size but is related to the complexity of the model. The larger the sample size is, the stronger the generalization ability of the model. The minimum training sample requirements are related to the maximum in-degrees and the average discrete intervals, not the numbers of nodes and edges and the maximum out-degree of the BN structure. In addition, a modified structural entropy can characterize the complexity of a BN structure better than the existing structural entropy, but it has a worse relationship with the minimum training sample requirements than that of the maximum in-degree. To quickly determine the minimum training sample size requirements of a BN model with a predictive accuracy of 80%, a fitting function that considers the effects of the maximum in-degree and the average discrete intervals is presented, and its effectiveness is validated by two examples.
- Is Part Of:
- Expert systems with applications. Volume 185(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 185(2021)
- Issue Display:
- Volume 185, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 185
- Issue:
- 2021
- Issue Sort Value:
- 2021-0185-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-12-15
- Subjects:
- Seismic liquefaction -- Bayesian network -- Sample size -- Complexity -- Predictive performance
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2021.115702
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18929.xml