Adjusted weight voting algorithm for random forests in handling missing values. (September 2017)
- Record Type:
- Journal Article
- Title:
- Adjusted weight voting algorithm for random forests in handling missing values. (September 2017)
- Main Title:
- Adjusted weight voting algorithm for random forests in handling missing values
- Authors:
- Xia, Jing
Zhang, Shengyu
Cai, Guolong
Li, Li
Pan, Qing
Yan, Jing
Ning, Gangmin
- Abstract:
- Highlights: A novel algorithm based on random forests with surrogate splits is proposed to address the classification problem of incomplete data without imputation. The algorithm allows each tree to cast a vote even when the voting process is interrupted by missing attributes. Experimental results on various well-known datasets show that the proposed method is robust and efficient.
Abstract: Random forests (RF) is known as an efficient classification algorithm; however, it depends on the completeness of the dataset. Conventional methods for dealing with missing values usually employ estimation and imputation approaches whose efficiency is tied to assumptions about data features. Recently, an algorithm using surrogate decisions in RF was developed. This paper proposes a random forests algorithm with modified surrogate splits (Adjusted Weight Voting Random Forest, AWVRF) that can handle incomplete data without imputation. Unlike the existing surrogate method, in AWVRF, when the primary splitting attribute and all surrogate attributes of an internal node are missing, the instance being classified is allowed to exit at the current node with a vote. The weight of that vote is then adjusted by the strength of the attributes involved, and the final decision is made by weighted voting. AWVRF does not include an imputation step, so it is independent of assumptions about data features. AWVRF is compared with mean imputation, LeoFill, knnimpute, BPCAfill and conventional RF with surrogate decisions (surrRF) using 50 repetitions of 5-fold cross-validation on 10 well-known datasets. Across a total of 22 experimental settings, AWVRF achieves the highest accuracy in 14 settings and the largest AUC in 7 settings, exhibiting its superiority over the other methods. Compared with surrRF, AWVRF is significantly more efficient while retaining good discrimination. Experimental results show that AWVRF can successfully handle classification tasks for incomplete data.
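The early-exit and weighted-voting steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' AWVRF implementation: the node structure, the toy attributes (`age`, `bp`, `chol`), the class labels, and especially the weighting rule (a vote's weight is the sum of hypothetical per-attribute strength scores over the attributes actually used on the path) are all hypothetical, because the abstract does not give the exact formula.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# An instance maps attribute names to values; a missing attribute is simply absent.
Instance = Dict[str, float]


@dataclass
class Node:
    majority: str  # majority training class at this node (used for early-exit votes)
    # Primary split first, then surrogate splits, each as (attribute, threshold).
    # Simplification: every split sends value <= threshold to the left child.
    splits: List[Tuple[str, float]] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def traverse(node: Node, x: Instance) -> Tuple[str, List[str]]:
    """Walk one tree and return (class vote, attributes used along the path)."""
    used: List[str] = []
    while not node.is_leaf():
        for attr, threshold in node.splits:
            if attr in x:  # use the primary split, else the first available surrogate
                used.append(attr)
                node = node.left if x[attr] <= threshold else node.right
                break
        else:
            # Primary and all surrogate attributes are missing:
            # stop at this internal node and vote with its majority class.
            return node.majority, used
    return node.majority, used


def weighted_vote(trees: List[Node], x: Instance, strength: Dict[str, float]) -> str:
    """Combine tree votes, weighting each by the strength of the attributes it used.

    HYPOTHETICAL weighting rule: weight = sum of strength[a] over attributes on the path.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for tree in trees:
        vote, used = traverse(tree, x)
        totals[vote] += sum(strength.get(a, 0.0) for a in used)
        counts[vote] += 1
    if any(totals.values()):
        return max(totals, key=lambda c: totals[c])
    # Every tree stopped at its root: fall back to an unweighted majority vote.
    return max(counts, key=lambda c: counts[c])


# Toy forest with invented attributes and strengths (illustration only).
tree_a = Node("healthy", [("age", 50.0), ("bp", 120.0)], Node("healthy"), Node("sick"))
tree_b = Node("sick", [("chol", 200.0)], Node("healthy"), Node("sick"))
tree_c = Node("healthy", [("bp", 130.0)], Node("healthy"), Node("sick"))
forest = [tree_a, tree_b, tree_c]
strength = {"age": 0.5, "bp": 0.3, "chol": 0.2}

print(weighted_vote(forest, {"age": 30.0, "bp": 140.0}, strength))  # healthy
```

In the toy call, two of the three trees vote `sick` (tree_b by exiting at its root because `chol` is missing and it has no surrogate), but the single `healthy` vote was guided by the strongest attribute, so the weighted decision differs from a plain majority vote.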
- Is Part Of:
- Pattern recognition. Volume 69(2017:Sep.)
- Journal:
- Pattern recognition
- Issue:
- Volume 69(2017:Sep.)
- Issue Display:
- Volume 69 (2017)
- Year:
- 2017
- Volume:
- 69
- Issue Sort Value:
- 2017-0069-0000-0000
- Page Start:
- 52
- Page End:
- 60
- Publication Date:
- 2017-09
- Subjects:
- Random forests -- Missing values -- Imputation approaches -- Surrogate decisions -- Weighted voting
Pattern perception -- Periodicals
Perception des structures -- Périodiques (French: Pattern perception -- Periodicals)
Patroonherkenning (Dutch: Pattern recognition)
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2017.04.005
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2641.xml