ALISA: An automatic lightly supervised speech segmentation and alignment tool. (January 2016)

Record Type:: Journal Article
Title:: ALISA: An automatic lightly supervised speech segmentation and alignment tool. (January 2016)
Main Title:: ALISA: An automatic lightly supervised speech segmentation and alignment tool
Authors:: Stan, A.
Mamiya, Y.
Yamagishi, J.
Bell, P.
Watts, O.
Clark, R.A.J.
King, S.
Abstract:: Highlights: ALISA can align speech with imperfect transcripts in any alphabetic language. On average, 70% of the data is being correctly aligned, with a WER of less than 0.5%. Subjective listening tests showed a slight preference for the fully supervised system. Abstract: This paper describes the ALISA tool, which implements a lightly supervised method for sentence-level alignment of speech with imperfect transcripts. Its intended use is to enable the creation of new speech corpora from a multitude of resources in a language-independent fashion, thus avoiding the need to record or transcribe speech data. The method is designed so that it requires minimum user intervention and expert knowledge, and it is able to align data in languages which employ alphabetic scripts. It comprises a GMM-based voice activity detector and a highly constrained grapheme-based speech aligner. The method is evaluated objectively against a gold standard segmentation and transcription, as well as subjectively through building and testing speech synthesis systems from the retrieved data. Results show that on average, 70% of the original data is correctly aligned, with a word error rate of less than 0.5%. In one case, subjective listening tests show a statistically significant preference for voices built on the gold transcript, but this is small and in other tests, no statistically significant differences between the systems built from the fully supervised training data and the one which …
Is Part Of:: Computer speech & language. Volume 35(2016)
Journal:: Computer speech & language
Issue:: Volume 35(2016)
Issue Display:: Volume 35, Issue 2016 (2016)
Year:: 2016
Volume:: 35
Issue:: 2016
Issue Sort Value:: 2016-0035-2016-0000
Page Start:: 116
Page End:: 133
Publication Date:: 2016-01
Subjects:: Speech segmentation -- Speech and text alignment -- Grapheme acoustic models -- Lightly supervised system -- Imperfect transcripts
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2015.06.006
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 8942.xml