A reliable KNN filling approach for incomplete interval-valued data. (April 2021)
- Record Type:
- Journal Article
- Title:
- A reliable KNN filling approach for incomplete interval-valued data. (April 2021)
- Main Title:
- A reliable KNN filling approach for incomplete interval-valued data
- Authors:
- Qi, Xiaobo
Guo, Husheng
Wang, Wenjian
- Abstract:
- Abstract: Interval-valued data (IVD) is a kind of data where each feature is an interval, and embeds the uncertainty and variability information. However, the missing values (lower or upper bound, or both of them are missed) may occur in the process of data acquisition and transmission, which may lead to obstacles for data processing. To obtain good results, it is important for IVD to process (often ignore or fill) the missing values. A dataset including missing values is named as incomplete interval-valued (IIV) set here. Some ignoring and filling methods for numeric or symbolic data have been proposed, but they cannot be applied for IIV datasets directly. In this work, a reliable k-nearest neighbor approach (RKNN) for incomplete interval-valued data (IIVD) is proposed. A combining rule to determine whether a datum including missing values should be ignored or filled is designed. Those samples with the missing value for each feature will be ignored directly. It is different from existing ignoring methods that need to set the percentage of missing entries. For the rest of missing samples, they will be filled according to their K complete nearest neighbors, which can ensure the filled value more reliable. In so doing, RKNN can exclude a small number of missing samples that may increase uncertainty, and avoid the repetition of the filled values (like median or a fixed constant). The experiment results on 12 synthetic datasets and 4 real-world datasets demonstrate that the proposed method can process the incomplete interval-valued data effectively, and obtain a good classification performance simultaneously.
Highlights: RKNN may provide a rule to delete missing value samples without setting the percentage of missing entries. Filling values are closer to real missing ones and the repetition of the filled values can be avoided. High filling rate and positive filling effect could assure the completeness and reliability of the dataset.
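The ignore-or-fill idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give the distance measure or the exact filling formula, so the Euclidean distance over observed bounds, the mean-of-neighbours fill, the reading of the ignoring rule (drop a sample when every feature has a missing bound), and the name `rknn_fill` are all assumptions made for illustration.

```python
import math


def rknn_fill(samples, k=3):
    """Drop or fill incomplete interval-valued samples (illustrative sketch).

    samples: list of samples, each a list of (lower, upper) pairs;
    None marks a missing bound.
    """

    def flat(sample):
        # Flatten [(lo, hi), ...] into [lo, hi, lo, hi, ...].
        return [bound for interval in sample for bound in interval]

    # Only samples with no missing bound may serve as neighbours.
    complete = [s for s in samples if None not in flat(s)]
    result = []
    for s in samples:
        bounds = flat(s)
        if None not in bounds:
            result.append([tuple(iv) for iv in s])
            continue
        # Assumed ignoring rule: every feature has at least one missing bound.
        if all(None in iv for iv in s):
            continue
        if not complete:
            raise ValueError("no complete samples available as neighbours")
        observed = [i for i, b in enumerate(bounds) if b is not None]

        def dist(candidate):
            # Assumed distance: Euclidean over the observed bounds only.
            cb = flat(candidate)
            return math.sqrt(sum((bounds[i] - cb[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled = list(bounds)
        for i, b in enumerate(bounds):
            if b is None:
                # Assumed fill: mean of the same bound across the neighbours.
                filled[i] = sum(flat(n)[i] for n in neighbours) / len(neighbours)
        result.append([(filled[j], filled[j + 1]) for j in range(0, len(filled), 2)])
    return result
```

Because each filled bound is averaged from that sample's own nearest complete neighbours, different incomplete samples generally receive different values, which is the contrast the abstract draws with median or constant filling. The sketch does not enforce that a filled lower bound stays below its upper bound.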
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 100(2021)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 100(2021)
- Issue Display:
- Volume 100, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 100
- Issue:
- 2021
- Issue Sort Value:
- 2021-0100-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04
- Subjects:
- Interval-valued data -- Incomplete interval-valued set -- Missing value -- Combining rule
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2021.104175
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16719.xml