Cross database audio visual speech adaptation for phonetic spoken term detection. (July 2017)
- Record Type:
- Journal Article
- Title:
- Cross database audio visual speech adaptation for phonetic spoken term detection. (July 2017)
- Main Title:
- Cross database audio visual speech adaptation for phonetic spoken term detection
- Authors:
- Kalantari, Shahram
Dean, David
Sridharan, Sridha
- Abstract:
- Highlights: We show that the use of visual information helps both phone recognition and spoken term detection accuracy. Fused HMM adaptation can be utilized to benefit from multiple databases when training audio-visual phone models. An additional audio adaptation improves cross-database training accuracy for phone recognition and spoken term detection. A post-training step can be used to update all HMM parameters and further improve phone recognition accuracy.
Abstract: Spoken term detection (STD), the process of finding all occurrences of a specified search term in a large amount of speech segments, has many applications in multimedia search and information retrieval. It is known that the use of video information in the form of lip movements can improve the performance of STD in the presence of audio noise. However, research in this direction has been hampered by the unavailability of large annotated audio-visual databases for development. We propose a novel approach to developing audio-visual spoken term detection when only a small (low-resource) audio-visual database is available for development. First, cross-database training is proposed as a novel framework using the fused hidden Markov model (HMM) technique: an audio model is trained on large, publicly available audio databases and then adapted to the visual data of the given audio-visual database. This approach is shown to perform better than the standard HMM joint-training method and also improves the performance of spoken term detection when used in the indexing stage. In another approach, the external audio models are first adapted to the audio data of the given audio-visual database and then to its visual data; this also improves both phone recognition and spoken term detection accuracy. Finally, the cross-database training technique is used as HMM initialization, and an extra parameter re-estimation step is applied to the initialized models using the Baum-Welch technique. The proposed approaches to audio-visual model training allow benefiting from both the large out-of-domain audio databases that are available and the small audio-visual database given for development, creating more accurate audio-visual models. … (more)
- Is Part Of:
- Computer speech & language. Volume 44 (2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 44 (2017)
- Issue Display:
- Volume 44, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 44
- Issue:
- 2017
- Issue Sort Value:
- 2017-0044-2017-0000
- Page Start:
- 1
- Page End:
- 21
- Publication Date:
- 2017-07
- Subjects:
- Spoken term detection -- Synchronous hidden Markov model -- Cross-database training -- Phone recognition
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.09.001
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10739.xml