Human beatbox sound recognition using an automatic speech recognition toolkit. (May 2021)
- Record Type:
- Journal Article
- Title:
- Human beatbox sound recognition using an automatic speech recognition toolkit. (May 2021)
- Main Title:
- Human beatbox sound recognition using an automatic speech recognition toolkit
- Authors:
- Evain, Solène
Lecouteux, Benjamin
Schwab, Didier
Contesse, Adrien
Pinchaud, Antoine
Henrich Bernardoni, Nathalie
- Abstract:
- Highlights: A speech-dedicated automatic recognition tool such as Kaldi can be used for human-beatbox sound recognition. A large vocabulary of human-beatbox sounds can be recognized with a low error rate. Recording conditions (type of microphone and settings) do not impact recognition performance. PLP and Fbank features perform worse than MFCC for the beatbox sound recognition system.
Abstract: Human beatboxing is a vocal art making use of speech organs to produce vocal drum sounds and imitate musical instruments. Beatbox sound classification is a current challenge that can be used for automatic database annotation and music-information retrieval. In this study, a large-vocabulary human-beatbox sound recognition system was developed with an adaptation of the Kaldi toolkit, a widely used tool for automatic speech recognition. The corpus consisted of eighty boxemes, which were recorded repeatedly by two beatboxers. The sounds were annotated and transcribed for the system by means of a beatbox-specific morphographic writing system (Vocal Grammatics). The robustness of the recognition system to recording conditions was assessed on recordings made with six different microphones and settings. Decoding used monophone acoustic models trained with a classical HMM-GMM approach. Different acoustic features (MFCC, PLP, Fbank) and several parameters of the beatbox recognition system were tested: (i) the number of HMM states, (ii) the number of MFCC coefficients, (iii) the presence or absence of a pause boxeme in the right and left contexts in the lexicon and (iv) the silence probability. Our best model was obtained with the addition of a pause in the left and right contexts of each boxeme in the lexicon, a 0.8 silence probability, 22 MFCCs and three-state HMMs. In this configuration the boxeme error rate was lowered to 13.65%, and 8.6 boxemes out of 10 were correctly recognized. The recording settings did not greatly affect system performance, apart from recordings made with the closed-cup technique.
- Is Part Of:
- Biomedical signal processing and control. Volume 67(2021)
- Journal:
- Biomedical signal processing and control
- Issue:
- Volume 67(2021)
- Issue Display:
- Volume 67, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 67
- Issue:
- 2021
- Issue Sort Value:
- 2021-0067-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-05
- Subjects:
- Human beatbox -- Automatic speech recognition -- Kaldi -- Isolated sound recognition
Signal processing -- Periodicals
Biomedical engineering -- Periodicals
Signal Processing, Computer-Assisted -- Periodicals
Image Processing, Computer-Assisted -- Periodicals
Biomedical Engineering -- Periodicals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/17468094
http://www.elsevier.com/journals
http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%2329675%232006%23999989998%23626449%23FLA%23&_cdi=29675&_pubType=J&_auth=y&_acct=C000045259&_version=1&_urlVersion=0&_userid=836873&md5=664b5cf9a57fc91971a17faf20c32ec1
- DOI:
- 10.1016/j.bspc.2021.102468
- Languages:
- English
- ISSNs:
- 1746-8094
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2087.880400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24996.xml
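
The abstract reports a boxeme error rate of 13.65% but this record does not say how the metric is computed. The sketch below assumes it is defined by analogy with word error rate in speech recognition: the edit distance (substitutions, deletions, insertions) between the reference and recognized boxeme sequences, divided by the reference length. The function name `boxeme_error_rate` and the boxeme labels are hypothetical and are not taken from the paper or from Kaldi.

```python
def boxeme_error_rate(reference, hypothesis):
    """Edit distance between two boxeme sequences, divided by the reference length.

    Assumption: the metric mirrors word error rate; the record itself
    does not give the formula.
    """
    if not reference:
        raise ValueError("reference must contain at least one boxeme")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j boxemes of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference boxeme
                curr[j - 1] + 1,     # insertion of a hypothesis boxeme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical labels: one reference boxeme is missing from the hypothesis.
reference = ["kick", "hihat", "snare", "hihat"]
hypothesis = ["kick", "snare", "hihat"]
rate = boxeme_error_rate(reference, hypothesis)
```

Under this definition, one missed boxeme in a four-boxeme reference gives a rate of 0.25. Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.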