Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0. (May 2022)
- Record Type:
- Journal Article
- Title:
- Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0. (May 2022)
- Main Title:
- Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0
- Authors:
- Niño-Adan, Iratxe
Landa-Torres, Itziar
Portillo, Eva
Manjarres, Diana
- Abstract:
- Normalisation is a preprocessing technique widely employed in Machine Learning (ML)-based solutions for industry to equalise the features' contribution. However, few researchers have analysed the normalisation effect and its implications on the ML algorithm performance, especially on Euclidean distance-based algorithms, such as the well-known K-Nearest Neighbours and K-means. In this sense, this paper formally analyses the effect of normalisation yielding results significantly far from the state-of-the-art traditional claims. In particular, this paper shows that normalisation does not equalise the contribution of the features, with the consequent impact on the performance of the learning process for a particular problem. More concretely, this demonstration is made on K-Nearest Neighbours and K-means Euclidean distance-based ML algorithms. This paper concludes that normalisation can be viewed as an unsupervised Feature Weighting method. In this context, a new metric (Normalisation weight) for measuring the impact of normalisation on the features is presented. Likewise, an analysis of the normalisation effect on the Euclidean distance is conducted and a new metric referred to as Proportional influence that measures the features' influence on the Euclidean distance is proposed. Both metrics enable the automatic selection of the most appropriate normalisation method for a particular engineering problem, which can significantly improve both the computational cost and classification performance of K-Nearest Neighbours and K-means algorithms. The analytical conclusions are validated on well-known datasets from the UCI repository and a real-life application from the refinery industry.
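The abstract's claim that normalisation acts as an unsupervised feature weighting can be illustrated with a small sketch. This is not the paper's Normalisation weight or Proportional influence metric, whose definitions are not given in this record; it only shows, on hypothetical toy data, that dividing each feature by a scale factor is the same as weighting its squared difference by one over that factor squared, and that min-max and z-score normalisation imply different relative weights. All function and variable names are assumptions made for illustration.

```python
import math
import statistics


def min_max_params(column):
    """Shift and scale used by min-max normalisation: minimum and range."""
    return min(column), max(column) - min(column)


def z_score_params(column):
    """Shift and scale used by standardisation: mean and population std."""
    return statistics.mean(column), statistics.pstdev(column)


def normalise(data, params_fn):
    """Apply (x - shift) / scale per feature; return the data and the scales."""
    params = [params_fn(col) for col in zip(*data)]
    normed = [[(x - shift) / scale for x, (shift, scale) in zip(row, params)]
              for row in data]
    return normed, [scale for _, scale in params]


def sq_dist(a, b, weights=None):
    """Squared Euclidean distance, optionally weighted per feature."""
    weights = weights or [1.0] * len(a)
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


# Hypothetical toy data: two features measured on very different scales.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [10.0, 400.0]]

weight_ratio = {}
for name, fn in (("min-max", min_max_params), ("z-score", z_score_params)):
    normed, scales = normalise(data, fn)
    # Implied feature weights: the shift cancels in differences, so only
    # the scale survives, as a factor 1 / scale**2 on each squared term.
    weights = [1.0 / s ** 2 for s in scales]
    d_normalised = sq_dist(normed[0], normed[1])
    d_weighted = sq_dist(data[0], data[1], weights)
    assert math.isclose(d_normalised, d_weighted)
    # Weight of feature 0 relative to feature 1 under this method.
    weight_ratio[name] = weights[0] / weights[1]
    print(name, weights, weight_ratio[name])
```

The two methods give different relative weights to the same pair of features on this toy data, so nearest-neighbour rankings and cluster assignments computed on the normalised data can differ depending on which method was chosen.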
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 111(2022)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 111(2022)
- Issue Display:
- Volume 111, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 111
- Issue:
- 2022
- Issue Sort Value:
- 2022-0111-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-05
- Subjects:
- Feature normalisation -- Feature weighting -- Machine learning -- Euclidean distance -- K-nearest neighbours -- K-means
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2022.104807
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21214.xml