A review of speaker diarization: Recent advances with deep learning. (March 2022)
- Record Type:
- Journal Article
- Title:
- A review of speaker diarization: Recent advances with deep learning. (March 2022)
- Main Title:
- A review of speaker diarization: Recent advances with deep learning
- Authors:
- Park, Tae Jin
Kanda, Naoyuki
Dimitriadis, Dimitrios
Han, Kyu J.
Watanabe, Shinji
Narayanan, Shrikanth
- Abstract:
- Abstract: Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for speech recognition on multispeaker audio recordings to enable speaker adaptive processing. These algorithms also gained their own value as a standalone application over time to provide speaker-specific metainformation for downstream tasks such as audio retrieval. More recently, with the emergence of deep learning technology, which has driven revolutionary changes in research and practices across speech application domains, rapid advancements have been made for speaker diarization. In this paper, we review not only the historical development of speaker diarization technology but also the recent advancements in neural speaker diarization approaches. Furthermore, we discuss how speaker diarization systems have been integrated with speech recognition applications and how the recent surge of deep learning is leading the way of jointly modeling these two components to be complementary to each other. By considering such exciting technical trends, we believe that this paper is a valuable contribution to the community to provide a survey work by consolidating the recent developments with neural methods and thus facilitating further progress toward a more efficient speaker diarization.
Highlights: The latest trends and approaches to speaker diarization as part of speech interaction applications. Overview of the development of speaker diarization in the era of deep learning. Review of diarization techniques belonging to the proposed taxonomy. Introduction of techniques used in the traditional, modular speaker diarization systems. Recent advancements in joint training approaches and fully end-to-end models. A perspective of how speaker diarization has been investigated in the context of ASR. Review of the challenges, the future of speaker diarization and the applications of speaker diarization.
- Is Part Of:
- Computer speech & language. Volume 72(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 72(2022)
- Issue Display:
- Volume 72, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 72
- Issue:
- 2022
- Issue Sort Value:
- 2022-0072-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-03
- Subjects:
- Speaker diarization -- Automatic speech recognition -- Deep learning
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101317
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20111.xml