Anomaly-based annotation error detection in speech-synthesis corpora. (November 2017)
- Record Type:
- Journal Article
- Title:
- Anomaly-based annotation error detection in speech-synthesis corpora. (November 2017)
- Main Title:
- Anomaly-based annotation error detection in speech-synthesis corpora
- Authors:
- Matoušek, Jindřich
Tihelka, Daniel
- Abstract:
- Highlights: Anomaly-based detection could be used to detect word-level annotation errors. Automatically selected feature sets achieve similar results as hand-crafted ones. Training data size can be significantly reduced while keeping good results. Combination of several detectors has a potential to overcome individual detectors. Classification does not outperform anomaly detection and is more sensitive to data size.
Abstract: We investigate the problem of automatic detection of annotation errors in single-speaker read-speech corpora used for speech synthesis. For the purpose of annotation error detection, we adopt an anomaly detection framework in which correctly annotated words are considered as normal examples on which the detection methods are trained. Misannotated words are then taken as anomalous examples which do not conform to normal patterns of the trained detection models. We propose and evaluate several anomaly detection models – Gaussian distribution based detectors, Grubbs' test based detector, and one-class support vector machine based detector. Word-level feature sets including basic features derived from forced alignment and various acoustic, spectral, phonetic, and positional features are examined to find an optimal set of features for each anomaly detector. The results with F1 score being almost 89% show that anomaly detection could help detecting annotation errors in read-speech corpora for speech synthesis. Furthermore, dimensionality reduction techniques are also examined to automatically reduce the number of features used to describe the annotated words. We show that the automatically reduced feature sets achieve statistically similar results as the hand-crafted feature sets. We also conducted additional experiments to investigate both robustness of the proposed anomaly detection framework with respect to particular data sets used for development and evaluation and the influence of the number of examples needed for anomaly detection. We show that a reasonably good detection performance could be reached with using significantly fewer examples during the detector development phase. We also propose a concept of a voting detector – a combination of anomaly detectors in which each "single" detector "votes" on whether or not a testing word is annotated correctly, and the final decision is then made by aggregating the votes.
Our results show that the voting detector has a potential to overcome each of the single anomaly detectors. Furthermore, we compare the proposed anomaly detection framework to a classification-based approach (which, unlike anomaly detection, needs to use anomalous examples during training) and we show that both approaches lead to statistically comparable results when all available anomalous examples are utilized during detector/classifier development. However, when a smaller number of anomalous examples are used, the proposed anomaly detection framework clearly outperforms the classification-based approach. A final listening test showed the effectiveness of the proposed anomaly-based annotation error detection for improving the quality of synthetic speech.
- Is Part Of:
- Computer speech & language. Volume 46(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 46(2017)
- Issue Display:
- Volume 46, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 46
- Issue:
- 2017
- Issue Sort Value:
- 2017-0046-2017-0000
- Page Start:
- 1
- Page End:
- 35
- Publication Date:
- 2017-11
- Subjects:
- Annotation error detection -- Anomaly detection -- Read speech corpora -- Speech synthesis
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.04.007
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2908.xml