Cross-covariance-based features for speech classification in film audio. (December 2015)
- Record Type:
- Journal Article
- Title:
- Cross-covariance-based features for speech classification in film audio. (December 2015)
- Main Title:
- Cross-covariance-based features for speech classification in film audio
- Authors:
- Benatan, Matt
Ng, Kia
- Abstract:
- As multimedia becomes the dominant form of entertainment through an ever-increasing range of digital formats, there has been a growing interest in obtaining information from entertainment media. Speech is one of the core resources in multimedia, providing a foundation for the extraction of semantic information. Thus, detecting speech is a critical first step for speech-based information retrieval systems. This work focuses on speech detection in one of the dominant forms of entertainment media: feature films. A novel approach for voice activity detection (VAD) in film audio is proposed. The approach uses correlation to analyze associations of Mel Frequency Cepstral Coefficient (MFCC) pairs in speech and non-speech data. This information then drives feature selection for the creation of MFCC cross-covariance feature vectors (MFCC-CCs), which are used to train a random forest classifier to solve a binary speech/non-speech classification problem on audio data from entertainment media. The classifier performance is evaluated on a number of test sets and achieves a classification accuracy of up to 94%. The approach is also compared with state-of-the-art and contemporary VAD algorithms, and demonstrates competitive results.
- Is Part Of:
- Journal of visual languages & computing. Volume 31:Part B(2016)
- Journal:
- Journal of visual languages & computing
- Issue:
- Volume 31:Part B(2016)
- Issue Display:
- Volume 31, Issue 2 (2016)
- Year:
- 2016
- Volume:
- 31
- Issue:
- 2
- Issue Sort Value:
- 2016-0031-0002-0000
- Page Start:
- 215
- Page End:
- 221
- Publication Date:
- 2015-12
- Subjects:
- Voice activity detection -- Speech detection -- Binary classification -- Film audio -- Entertainment media
Visual programming languages (Computer science) -- Periodicals
Visual programming (Computer science) -- Periodicals
Programming languages (Electronic computers) -- Semantics -- Periodicals
Langages de programmation visuelle -- Périodiques
Programmation visuelle -- Périodiques
Langages de programmation -- Sémantique -- Périodiques
Programming languages (Electronic computers) -- Semantics
Visual programming (Computer science)
Visual programming languages (Computer science)
Periodicals
Electronic journals
005
- Journal URLs:
- http://www.sciencedirect.com/science/journal/1045926X
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.jvlc.2015.10.011
- Languages:
- English
- ISSNs:
- 1045-926X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5072.495200
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1260.xml
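
Illustrative note on the abstract: the MFCC cross-covariance feature it describes can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The record does not give the window length, the lag, the normalisation, or which coefficient pairs were selected, so the function names, the `pairs` argument, and the choice of lag 0 are all hypothetical. The resulting vectors would then be passed to a classifier such as a random forest, which is not shown here.

```python
def cross_covariance(x, y, lag=0):
    """Sample cross-covariance of two equal-length sequences at a non-negative lag.

    Assumption: biased estimate (divides by the full length n); the
    record does not say which normalisation the paper uses.
    """
    n = len(x)
    if n != len(y) or not 0 <= lag < n:
        raise ValueError("need equal lengths and 0 <= lag < length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag))
    return total / n


def mfcc_cc_features(mfcc_frames, pairs, lag=0):
    """Build one feature vector from a window of MFCC frames.

    mfcc_frames: list of frames, each a list of MFCC coefficients.
    pairs: hypothetical list of (i, j) coefficient indices, standing in
           for the pairs the paper selects through correlation analysis.
    """
    # Transpose so that tracks[i] is coefficient i followed over time.
    tracks = list(zip(*mfcc_frames))
    return [cross_covariance(tracks[i], tracks[j], lag) for i, j in pairs]


# Toy window of four frames with two coefficients each (made-up values).
window = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
features = mfcc_cc_features(window, pairs=[(0, 1), (0, 0)])
```

In a full pipeline, one such vector would be computed per analysis window and labelled speech or non-speech before training; real MFCC frames would come from an audio feature extractor rather than the toy values used here.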