Active cleaning of label noise. (March 2016)
- Record Type:
- Journal Article
- Title:
- Active cleaning of label noise. (March 2016)
- Main Title:
- Active cleaning of label noise
- Authors:
- Ekambaram, Rajmadhan
Fefilatyev, Sergiy
Shreve, Matthew
Kramer, Kurt
Hall, Lawrence O.
Goldgof, Dmitry B.
Kasturi, Rangachar
- Abstract:
- Mislabeled examples in the training data can severely affect the performance of supervised classifiers. In this paper, we present an approach to remove mislabeled examples from a dataset by selecting suspicious examples as targets for inspection. We show that the large-margin and soft-margin principles used in support vector machines (SVMs) tend to capture mislabeled examples as support vectors. Experimental results on two character recognition datasets show that one-class and two-class SVMs capture around 85% and 99% of label noise examples, respectively, as their support vectors. We also propose a new method that iteratively builds two-class SVM classifiers on the non-support-vector examples of the training data; an expert then manually verifies the support vectors, in order of their classification scores, to identify mislabeled examples. Experimental results on four data sets show that this method reduces the number of examples to be reviewed and that it provides parameter independence. Thus, by (re-)examining the labels of selected support vectors, most of the noise can be removed. This can be quite advantageous when rapidly building a labeled data set. Highlights: A novel method for removing label noise from data is introduced. It significantly reduces the number of examples that must be reviewed. Support vectors of an SVM classifier can capture around 99% of label noise examples. A two-class SVM captures more label noise examples than a one-class SVM classifier. Combining one-class and two-class SVMs produces a marginal improvement.
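The iterative review procedure summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a bias-free linear SVM solved by dual coordinate descent (a swapped-in solver; this record does not say which kernel or solver the paper used), counts an example as a support vector when its dual variable exceeds `SV_TOL`, ranks support vectors by y·f(x) under a model retrained on the non-support vectors only (most negative first, an assumed reading of "classification score"), and simulates the human expert with a callback. The names `train_linear_svm`, `rank_support_vectors`, `active_clean`, `SV_TOL`, `batch_size`, and the stopping rule are all hypothetical.

```python
# Hypothetical sketch: rank SVM support vectors for manual label review.
# Assumptions (not from the paper): linear kernel, no bias term, dual
# coordinate descent solver, fixed review batch size, simple stopping rule.

SV_TOL = 1e-8  # dual variable above this counts as a support vector (assumed)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, C=1.0, epochs=2000):
    """Bias-free linear SVM (hinge loss) via dual coordinate descent.
    Returns the dual variables alpha and the primal weight vector w."""
    alpha = [0.0] * len(X)
    w = [0.0] * len(X[0])
    q = [dot(x, x) for x in X]  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i, x in enumerate(X):
            if q[i] == 0.0:
                continue
            g = y[i] * dot(w, x) - 1.0  # gradient of the dual objective
            new = min(max(alpha[i] - g / q[i], 0.0), C)  # project onto [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, x)]
    return alpha, w


def rank_support_vectors(X, y, exclude=frozenset(), C=1.0):
    """Support vectors not in `exclude`, most suspicious label first."""
    alpha, _ = train_linear_svm(X, y, C)
    sv = [i for i, a in enumerate(alpha) if a > SV_TOL]
    sv_set = set(sv)
    rest = [i for i in range(len(X)) if i not in sv_set]
    candidates = [i for i in sv if i not in exclude]
    if not rest or not candidates:
        return candidates
    # Retrain on the non-support vectors only, then score each candidate:
    # a low (negative) y * f(x) means that model disagrees with the label.
    _, w = train_linear_svm([X[i] for i in rest], [y[i] for i in rest], C)
    return sorted(candidates, key=lambda i: y[i] * dot(w, X[i]))


def active_clean(X, y, expert, batch_size=2, max_rounds=50):
    """Each round: rank, then let `expert(i)` (a stand-in for a human) label
    the top `batch_size` unreviewed support vectors. Stop when a round has
    no new candidates or its batch contains no label errors."""
    y = list(y)
    reviewed = set()
    for _ in range(max_rounds):
        batch = rank_support_vectors(X, y, exclude=reviewed)[:batch_size]
        if not batch:
            break
        corrected = 0
        for i in batch:
            reviewed.add(i)
            label = expert(i)
            if label != y[i]:
                y[i] = label
                corrected += 1
        if corrected == 0:
            break
    return y, len(reviewed)
```

On a toy 1-D set of nine points (x = ±2, ±3, ±4, ±5 with labels matching their sign, plus a second point at x = 4 wrongly labeled -1), the mislabeled point ends up among the support vectors and is ranked first; with the default `batch_size=2` the loop reviews two examples and restores every label. Retraining on the non-support vectors keeps the suspect examples out of the model that judges them, which is what makes the score informative.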
- Is Part Of:
- Pattern recognition. Volume 51(2016:Mar.)
- Journal:
- Pattern recognition
- Issue:
- Volume 51(2016:Mar.)
- Issue Display:
- Volume 51 (2016)
- Year:
- 2016
- Volume:
- 51
- Issue Sort Value:
- 2016-0051-0000-0000
- Page Start:
- 463
- Page End:
- 480
- Publication Date:
- 2016-03
- Subjects:
- Support vectors -- Label noise -- Mislabeled examples
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2015.09.020
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 59.xml