Speaker identification based on Radon transform and CNNs in the presence of different types of interference for Robotic Applications. (June 2021)
- Record Type:
- Journal Article
- Title:
- Speaker identification based on Radon transform and CNNs in the presence of different types of interference for Robotic Applications. (June 2021)
- Main Title:
- Speaker identification based on Radon transform and CNNs in the presence of different types of interference for Robotic Applications
- Authors:
- Shafik, Amira
Sedik, Ahmed
El-Rahiem, Basma Abd
El-Rabaie, El-Sayed M.
Banby, Ghada M. El
El-Samie, Fathi E. Abd
Khalaf, Ashraf A.M.
Song, Oh-Young
Iliyasu, Abdullah M.
- Abstract:
- Both automatic speaker identification (ASI) and speech recognition can now be utilized for the control of modern robots. An ASI algorithm can be implemented at the speech interface of a robot to determine the identity of the person permitted to interact with it, while speech recognition can be implemented to interpret the commands given to the robot. Making an ASI system robust is challenging in the presence of speech degradations such as noise and interference. This study presents a new approach, based on a convolutional neural network (CNN), to improve speaker identification accuracy in the presence of interference for robot control applications. First, the speech signal from the speaker is divided into segments. Each segment is transformed into a spectrogram, and the Radon transform of this spectrogram is then computed. The spectrogram resolves the speech segment into a map of power distribution over both time and frequency. Together, the spectrograms and their Radon transforms are used as inputs to a proposed CNN-based deep learning model. After the necessary refinements, the resulting optimized Radon-Deep-Learning Model (RDLM) is compared with a benchmark model. The proposed model consists of six convolutional (CNV) layers followed by six max-pooling layers, while the benchmark model consists of three CNV layers followed by three max-pooling layers. Experimental results reveal that the proposed RDLM achieves a classification accuracy of up to 97.5%, which is more than double the performance reported for some traditional speaker identification methods. …
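The feature-extraction front end summarized in the abstract (segmentation, then a spectrogram per segment, then the Radon transform of each spectrogram) can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation. The segment length (512 samples), the hop sizes, the 64-point Hann-windowed frames, the 15° angle step, and the helper names `segment`, `spectrogram`, `radon`, and `radon_features` are all hypothetical choices. The Radon transform here is a simple pixel-driven approximation in which each spectrogram cell is added to the nearest projection bin. In practice a library routine such as scikit-image's `skimage.transform.radon` would normally be used. The CNN stage itself is not shown.

```python
import cmath
import math


def segment(signal, seg_len, hop):
    """Split a 1-D signal into fixed-length, possibly overlapping segments."""
    return [signal[i:i + seg_len] for i in range(0, len(signal) - seg_len + 1, hop)]


def spectrogram(x, frame_len=64, hop=32):
    """One-sided power spectrogram |STFT|^2 with a periodic Hann window.

    Uses a naive O(N^2) DFT for clarity. Returns a list of rows:
    row k = frequency bin k, column t = frame t.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    spec = [[0.0] * len(frames) for _ in range(n_bins)]
    for t, frame in enumerate(frames):
        w = [s * h for s, h in zip(frame, window)]
        for k in range(n_bins):
            acc = sum(w[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spec[k][t] = abs(acc) ** 2
    return spec


def radon(image, angles_deg):
    """Pixel-driven discrete Radon transform (sinogram).

    Each pixel value is added to the detector bin nearest to its projection
    t = x*cos(theta) + y*sin(theta), with (x, y) measured from the image centre.
    Returns a list of rows: row = detector bin, column = angle.
    """
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    half = math.ceil(math.hypot(rows, cols) / 2)  # covers the largest possible |t|
    sinogram = [[0.0] * len(angles_deg) for _ in range(2 * half + 1)]
    for a, deg in enumerate(angles_deg):
        c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
        for r in range(rows):
            for col in range(cols):
                t = (col - cx) * c + (r - cy) * s
                sinogram[round(t) + half][a] += image[r][col]
    return sinogram


def radon_features(signal, seg_len=512, hop=256, angles_deg=range(0, 180, 15)):
    """Hypothetical front end: segment -> spectrogram -> Radon transform.

    Returns one (spectrogram, sinogram) pair per segment.
    """
    angles = list(angles_deg)
    features = []
    for seg in segment(signal, seg_len, hop):
        spec = spectrogram(seg)
        features.append((spec, radon(spec, angles)))
    return features
```

For example, `radon_features(samples)` on a 1024-sample signal yields three (spectrogram, sinogram) pairs with the default settings. According to the abstract, both members of each pair serve together as CNN inputs. Because every spectrogram cell lands in exactly one bin per angle, each sinogram column sums to the spectrogram's total power, which is a convenient sanity check.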
- Is Part Of:
- Applied acoustics. Volume 177(2021)
- Journal:
- Applied acoustics
- Issue:
- Volume 177(2021)
- Issue Display:
- Volume 177, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 177
- Issue:
- 2021
- Issue Sort Value:
- 2021-0177-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-06
- Subjects:
- Robust speaker identification -- Deep learning -- Convolution neural network -- Radon transform
Acoustical engineering -- Periodicals
Periodicals
620.2
- Journal URLs:
- http://www.sciencedirect.com/science/journal/0003682X
http://www.elsevier.com/journals
http://www.elsevier.com/homepage/elecserv.htt
- DOI:
- 10.1016/j.apacoust.2020.107665
- Languages:
- English
- ISSNs:
- 0003-682X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1571.400000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16069.xml