Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules. (March 2022)
- Record Type:
- Journal Article
- Title:
- Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules. (March 2022)
- Main Title:
- Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules
- Authors:
- Kai, Hiroto
Takamichi, Shinnosuke
Shiota, Sayaka
Kiya, Hitoshi
- Abstract:
- In this paper, we propose a speech pseudonymization framework that utilizes cascaded and superposition-based voice modification modules. With increasing opportunities to use spoken dialogue systems, research on protecting the privacy of speaker information contained in speech data is attracting attention. Pseudonymization, one method for voice privacy protection, aims to preserve the intelligibility of speech while suppressing speaker-specific information. One motivation of our framework is to achieve reliable pseudonymization performance with light computation. To do this, we combine the advantages of machine learning-based and signal processing-based approaches: (1) signal processing-based methods parameterized with few hyperparameters and (2) machine learning-based optimization of all hyperparameters on the basis of black-box systems consisting of automatic speaker verification and automatic speech recognition. Our method of cascading signal processing modules, which are jointly optimized in a data-driven manner, can pseudonymize speech in a lightweight manner. Additionally, we discuss irreversible pseudonymization approaches and propose a superposition approach, another pseudonymization method that is more irreversible than the cascade method in that the parameters needed to recover the original signal are harder to estimate. Experimental results under the VoicePrivacy 2020 protocols demonstrate that (1) our cascade method succeeds in deteriorating the speaker recognition rate by over 24% while simultaneously improving the speech recognition rate by approximately 8% compared with a signal processing-based baseline system of VoicePrivacy 2020 and that (2) our superposition method performs comparably to our cascade method in terms of pseudonymization performance.
Highlights:
Lightweight and irreversible speech pseudonymization for protecting voice privacy.
Cascade or superposition of signal processing-based voice modification modules.
Optimization of hyperparameters in a machine learning-based manner.
Advantages of signal processing and machine learning lead to effective performance.
The importance of irreversibility along with the proposal of an irreversible method.
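The abstract describes cascaded signal processing modules with few hyperparameters, tuned jointly by black-box optimization against speaker verification and speech recognition systems. The Python sketch below illustrates only that general idea and is not the authors' implementation. The two modules (linear-interpolation resampling and a first-order spectral-tilt filter), their search ranges, the objective `WER - weight * EER`, plain random search (swapped in for whatever optimizer the paper actually uses), and the `toy_wer`/`toy_eer` scorers are all hypothetical assumptions; a real setup would call actual ASR and ASV systems.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]
Params = Dict[str, float]


def resample(x: Sequence[float], factor: float) -> Signal:
    """Linear-interpolation resampling. factor > 1 shortens the signal, which
    raises pitch and speeds it up when played back at the original rate."""
    n_out = max(1, int(len(x) / factor))
    last = len(x) - 1
    out = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        frac = pos - j
        a, b = x[min(j, last)], x[min(j + 1, last)]
        out.append(a + frac * (b - a))
    return out


def tilt(x: Sequence[float], beta: float) -> Signal:
    """First-order FIR filter y[n] = x[n] - beta * x[n-1]; beta > 0 boosts
    high frequencies, beta < 0 boosts low frequencies."""
    return [x[0]] + [x[n] - beta * x[n - 1] for n in range(1, len(x))]


# Hypothetical cascade: each module is controlled by exactly one hyperparameter.
MODULES: List[Tuple[str, Callable[[Sequence[float], float], Signal]]] = [
    ("resample_factor", resample),
    ("tilt_beta", tilt),
]
# Hypothetical search ranges for the hyperparameters.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "resample_factor": (0.8, 1.25),
    "tilt_beta": (-0.9, 0.9),
}


def cascade(x: Sequence[float], params: Params) -> Signal:
    """Apply every module in order, feeding each output into the next."""
    y = list(x)
    for name, module in MODULES:
        y = module(y, params[name])
    return y


def make_objective(asr_wer, asv_eer, weight):
    """Hypothetical objective, lower is better: a low word error rate keeps
    intelligibility, a high verification EER suppresses speaker identity."""
    return lambda processed, original: (
        asr_wer(processed, original) - weight * asv_eer(processed, original)
    )


def random_search(signals, objective, n_trials, seed=0):
    """Black-box optimization: only objective values are used, no gradients."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}
        score = sum(objective(cascade(s, params), s) for s in signals) / len(signals)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-ins for real ASR/ASV systems (hypothetical, for illustration only).
def toy_wer(processed, original):
    return abs(len(processed) - len(original)) / len(original)


def toy_eer(processed, original):
    n = min(len(processed), len(original))
    return sum(abs(p - o) for p, o in zip(processed, original)) / n


if __name__ == "__main__":
    signals = [[math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]]
    objective = make_objective(toy_wer, toy_eer, weight=0.5)
    params, score = random_search(signals, objective, n_trials=50)
    print(params, score)
```

Because the optimizer only sees objective values, any module or scorer can be replaced without changing `random_search`, which is the property that makes black-box tuning of a cascade practical.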
- Is Part Of:
- Computer speech & language. Volume 72(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 72(2022)
- Issue Display:
- Volume 72, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 72
- Issue:
- 2022
- Issue Sort Value:
- 2022-0072-2022-0000
- Publication Date:
- 2022-03
- Subjects:
- Speech pseudonymization -- Voice privacy protection -- Data-driven optimization -- Cascaded voice modification -- Irreversibility
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101315
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25348.xml