Forensic speaker recognition: A new method based on extracting accent and language information from short utterances. (September 2020)
- Record Type:
- Journal Article
- Title:
- Forensic speaker recognition: A new method based on extracting accent and language information from short utterances. (September 2020)
- Main Title:
- Forensic speaker recognition: A new method based on extracting accent and language information from short utterances
- Authors:
- Saleem, Sajid
Subhan, Fazli
Naseer, Noman
Bais, Abdul
Imtiaz, Ammara
- Abstract:
- This paper presents a new method for Forensic Speaker Recognition (FSR). The new method is based on extracting accent and language information from short utterances. Accent Classification (AC) and Language Identification (LI) play an important role in identifying people of different groups, communities and origins, because speaking styles and native languages differ. In a multilingual society, forensic experts use AC and LI to reduce the search space for suspect recognition to regional and ethnic groups. In this paper, we use different baseline and deep learning methods to automate this process. The baseline methods used are Gaussian Mixture Model-Universal Background Model (GMM-UBM), i-vector and Gaussian Mixture Model-Support Vector Machine (GMM-SVM). Mel-Frequency Cepstral Coefficients (MFCC) are used as speech features in the baseline methods. The deep learning methods used are Convolutional Neural Network (CNN) and Deep Neural Network (DNN). The recently proposed CNN-based methods VGGVox and GMM-CNN are used; both take speech spectrograms as input. For the DNN, the x-vectors method is used, which is based on DNN embeddings. The experimental results show that GMM-SVM demonstrates better FSR performance than the GMM-UBM and i-vector methods. The x-vectors method performs better than the GMM-CNN and VGGVox methods, and also better than the GMM-SVM method. The experimental results show that the x-vectors method demonstrates 80.4% FSR accuracy. With AC, it achieves 85.4% accuracy. With LI, its accuracy is 90.2%. By combining AC and LI, it obtains 95.1% accuracy. This shows that the proposed method based on AC and LI gives promising results.
Highlights: Classification of accent and language information plays a vital role in Forensic Speaker Recognition (FSR). A new method is proposed for FSR based on accent and language information. The proposed method is compared with six different baseline and deep learning methods. The experimental results show that the proposed method gives promising FSR results.
- Is Part Of:
- Forensic science international. Volume 34(2020)
- Journal:
- Forensic science international
- Issue:
- Volume 34(2020)
- Issue Display:
- Volume 34, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 34
- Issue:
- 2020
- Issue Sort Value:
- 2020-0034-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-09
- Subjects:
- Accent -- Language -- Forensic speaker recognition -- Deep learning -- Speech features
- Journal URLs:
- http://www.sciencedirect.com/
- DOI:
- 10.1016/j.fsidi.2020.300982
- Languages:
- English
- ISSNs:
- 2666-2817
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19409.xml