Language modelling for speaker diarization in telephonic interviews. (March 2023)
- Record Type:
- Journal Article
- Title:
- Language modelling for speaker diarization in telephonic interviews. (March 2023)
- Main Title:
- Language modelling for speaker diarization in telephonic interviews
- Authors:
- India, Miquel
Hernando, Javier
Fonollosa, José A.R.
- Abstract:
- The aim of this paper is to investigate the benefit of combining language and acoustic modelling for speaker diarization. Although conventional systems use only acoustic features, in some scenarios linguistic data contain highly discriminative speaker information, even more reliable than the acoustic information. In this study we analyze how an appropriate fusion of both kinds of features can obtain good results in these cases. The proposed system is based on an iterative algorithm in which an LSTM network is used as a speaker classifier. The network is fed with character-level word embeddings and a GMM-based acoustic score created with the output labels from previous iterations. The presented algorithm has been evaluated on a Call-Center database composed of telephone interview audio recordings. The combination of acoustic features and linguistic content shows an 84.29% improvement in terms of word-level DER compared to an HMM/VB baseline system. The results of this study confirm that linguistic content can be efficiently used for some speaker recognition tasks.
Highlights: Linguistic content can be efficiently combined with acoustic features for the speaker diarization task. Given a specific scenario, speaker diarization can be solved with only linguistic content. Acoustic features lead to better diarization performance than linguistic content in large speaker segments. Linguistic content is more discriminative than acoustic features for identifying speakers in short speech segments. …
- Is Part Of:
- Computer speech & language. Volume 78(2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 78(2023)
- Issue Display:
- Volume 78, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 78
- Issue:
- 2023
- Issue Sort Value:
- 2023-0078-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-03
- Subjects:
- Speaker diarization -- Language modelling -- Acoustic modelling -- LSTM neural networks
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101441
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24470.xml