Class-imbalanced voice pathology classification: Combining hybrid sampling with optimal two-factor random forests. (15th March 2022)
- Record Type:
- Journal Article
- Title:
- Class-imbalanced voice pathology classification: Combining hybrid sampling with optimal two-factor random forests. (15th March 2022)
- Main Title:
- Class-imbalanced voice pathology classification: Combining hybrid sampling with optimal two-factor random forests
- Authors:
- Zhang, Xiaojun
Zhou, Changwei
Zhu, Xincheng
Tao, Zhi
Zhao, Heming
- Abstract:
- Highlights: Proposed fluid-solid coupling features for pathological detection. Better performance with two-factor random forests than traditional ones. Hybrid sampling combined with two-factor random forests for imbalanced set.
Abstract: Classifying imbalanced data is a common problem with pathological voice detection. Traditional classification algorithms usually assume that the number of samples in each category is similar, and that the cost of misclassification in training is equal. However, the cost of misclassifying pathological samples in pathological voice detection is higher than that of normal samples. Here, a hybrid sampling algorithm combined with optimal two-factor random forests is proposed for imbalanced classification of pathological voice detection. On the basis of two-factor random forests, it combines the synthetic minority oversampling technique (SMOTE) with the edited nearest neighbor (ENN) algorithm. SMOTE is used to increase the number of samples in a minority class. The oversampling rate of SMOTE is the out-of-bag misclassification rate of the two-factor random forests. ENN is then used to remove the noise in the majority class samples. Finally, the two-factor random forests classifies the resampled voice, and stops the iteration according to a classification evaluation index (such as the F1-macro). Binary classification and multi-classification between normal and pathological voices in the Massachusetts Eye and Ear Infirmary database demonstrate that the proposed algorithm effectively handles the problem of imbalanced pathological voice classification. Compared with a traditional sampling algorithm, the accuracy and recall of the proposed algorithm in multi-classification of voice disorders increased by 3.64% and 2.25%, respectively. …
- Is Part Of:
- Applied acoustics. Volume 190(2022)
- Journal:
- Applied acoustics
- Issue:
- Volume 190(2022)
- Issue Display:
- Volume 190, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 190
- Issue:
- 2022
- Issue Sort Value:
- 2022-0190-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-03-15
- Subjects:
- Pathological voice -- Imbalanced data classification -- Data resampling -- Random forests
Acoustical engineering -- Periodicals
Periodicals
620.2
- Journal URLs:
- http://www.sciencedirect.com/science/journal/0003682X
http://www.elsevier.com/journals
http://www.elsevier.com/homepage/elecserv.htt
- DOI:
- 10.1016/j.apacoust.2021.108618
- Languages:
- English
- ISSNs:
- 0003-682X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1571.400000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20858.xml
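
The resampling steps named in the abstract above (SMOTE on the minority class, then ENN on the majority class) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the two-factor random forests, the stopping rule on F1-macro, and the voice features are all omitted. The function names, the toy data, and the mapping from the rate to `ceil(rate * n_minority)` new samples are assumptions made for illustration; `rate` merely stands in for the out-of-bag misclassification rate that the abstract says drives the oversampling.

```python
import math
import random


def k_nearest(point, pool, k):
    """Indices of the k points in `pool` closest to `point` (Euclidean)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(minority, n_new, k=3, rng=None):
    """SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours to create n_new synthetic samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbour = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def enn_clean_majority(X, y, majority_label, k=3):
    """ENN restricted to the majority class: drop a majority sample when
    most of its k nearest neighbours carry a different label."""
    kept_X, kept_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        if label == majority_label:
            rest_X = X[:i] + X[i + 1:]
            rest_y = y[:i] + y[i + 1:]
            votes = [rest_y[j] for j in k_nearest(x, rest_X, k)]
            if 2 * votes.count(label) < len(votes):
                continue  # treated as noise and removed
        kept_X.append(x)
        kept_y.append(label)
    return kept_X, kept_y


def hybrid_resample(X, y, minority_label, majority_label, rate, k=3, seed=0):
    """One SMOTE-then-ENN pass. `rate` stands in for the out-of-bag
    misclassification rate; turning it into ceil(rate * n_minority)
    new samples is an assumption of this sketch, not the paper's rule."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = math.ceil(rate * len(minority))
    synthetic = smote(minority, n_new, k, random.Random(seed))
    X_all = X + synthetic
    y_all = y + [minority_label] * len(synthetic)
    return enn_clean_majority(X_all, y_all, majority_label, k)


# Hypothetical toy data: label 1 = minority, label 0 = majority.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],   # minority cluster
     [5.0, 5.0], [5.2, 5.1], [5.1, 4.8], [4.9, 5.3],   # majority cluster
     [5.3, 4.9], [4.8, 5.0], [0.15, 0.15]]             # last point: majority noise
y = [1] * 4 + [0] * 7
X_res, y_res = hybrid_resample(X, y, minority_label=1, majority_label=0, rate=0.5)
print(y_res.count(1), y_res.count(0))
```

In the toy data, the majority point placed inside the minority cluster is the one ENN is expected to drop, while the synthetic minority samples stay inside the minority cluster because each is a convex combination of two existing minority points. In an iterative scheme like the one the abstract describes, a pass of this kind would be repeated with an updated rate until the chosen evaluation index stops improving.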