Reversible speaker de-identification using pre-trained transformation functions. (November 2017)
- Record Type:
- Journal Article
- Title:
- Reversible speaker de-identification using pre-trained transformation functions. (November 2017)
- Main Title:
- Reversible speaker de-identification using pre-trained transformation functions
- Authors:
- Magariños, Carmen
Lopez-Otero, Paula
Docio-Fernandez, Laura
Rodriguez-Banga, Eduardo
Erro, Daniel
Garcia-Mateo, Carmen
- Abstract:
- Highlights: A speaker de-identification method based on pre-trained transformations is proposed. We overcome the need for a parallel corpus between input and target speakers. Objective and subjective evaluations prove the validity of the proposed approach. This de-identification method achieves universality, naturalness and reversibility.
Abstract: Speaker de-identification approaches must accomplish three main goals: universality, naturalness and reversibility. The main drawback of the traditional approach to speaker de-identification using voice conversion techniques is its lack of universality, since a parallel corpus between the input and target speakers is necessary to train the conversion parameters. A synthetic target can be used to overcome this issue, but this harms the naturalness of the resulting de-identified speech. Hence, this paper proposes a technique in which a pool of pre-trained transformations between a set of speakers is used as follows: given a new user to de-identify, the most similar speaker in this set is chosen as the source speaker, and the speaker most dissimilar to that source speaker is chosen as the target speaker. Speaker similarity is measured using the i-vector paradigm, which is usually employed as an objective measure of speaker de-identification performance, leading to a system with high de-identification accuracy. The transformation method is based on frequency warping and amplitude scaling, in order to obtain natural-sounding speech while masking the identity of the speaker. In addition, compared to other voice conversion approaches, the proposed method is easily reversible. Experiments were conducted on the Albayzin database, and performance was evaluated in terms of objective and subjective measures. The results showed high success in de-identifying speech, as well as a high degree of naturalness in the transformed voices. Furthermore, when the transformation parameters are made available to a trusted holder, the de-identification procedure can be inverted, recovering the original speaker identity. The computational cost of the proposed approach is small, making it possible to produce de-identified speech in real time with a high level of naturalness.
- Is Part Of:
- Computer speech & language. Volume 46(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 46(2017)
- Issue Display:
- Volume 46, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 46
- Issue:
- 2017
- Issue Sort Value:
- 2017-0046-2017-0000
- Page Start:
- 36
- Page End:
- 52
- Publication Date:
- 2017-11
- Subjects:
- Speaker de-identification -- Voice transformation -- Speaker re-identification -- Frequency warping -- Amplitude scaling -- i-vector
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.05.001
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
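The source/target selection step described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each speaker is already represented by an i-vector (extraction is not shown) and assumes cosine similarity as the i-vector comparison, since this record does not state which scoring the paper uses. The function names, the `pool` dictionary layout and the toy three-dimensional vectors are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two i-vectors (assumed scoring method)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_source_and_target(
    user_ivector: np.ndarray, pool: Dict[str, np.ndarray]
) -> Tuple[str, str]:
    """Pick the pool speaker most similar to the new user as the source,
    then the pool speaker least similar to that source as the target.

    pool maps a (hypothetical) speaker id to that speaker's i-vector and
    must contain at least two speakers.
    """
    # Source: nearest pre-trained speaker to the new user.
    source = max(pool, key=lambda spk: cosine_similarity(user_ivector, pool[spk]))
    # Target: most dissimilar speaker to the chosen source (not to the user),
    # so only pre-trained source-to-target transformations are ever needed.
    candidates = [spk for spk in pool if spk != source]
    target = min(candidates, key=lambda spk: cosine_similarity(pool[source], pool[spk]))
    return source, target


if __name__ == "__main__":
    toy_pool = {
        "spk_a": np.array([1.0, 0.0, 0.0]),
        "spk_b": np.array([0.0, 1.0, 0.0]),
        "spk_c": np.array([-1.0, 0.1, 0.0]),
    }
    new_user = np.array([0.9, 0.1, 0.0])
    print(select_source_and_target(new_user, toy_pool))  # ('spk_a', 'spk_c')
```

Choosing the target relative to the selected source, rather than relative to the new user, is what lets a pre-trained transformation be reused for any new user, which is how the approach avoids a parallel corpus between the new user and the target.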
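The reversibility claim in the abstract can be illustrated with a frequency-warping-plus-amplitude-scaling transform applied to a log-magnitude spectral envelope. This record does not give the paper's warping function or amplitude-scaling parameterization, so the sketch below assumes a monotonic piecewise-linear warping defined by hypothetical anchor frequencies and an additive (log-domain) amplitude-scaling curve; all names, the assumed 16 kHz sampling rate and the toy envelope are illustrative only. Because a strictly increasing piecewise-linear map is inverted by swapping its axes, whoever holds the parameters can undo the transform, up to a small resampling error on a discrete frequency grid.

```python
import numpy as np


# Assumption: W is a strictly increasing piecewise-linear frequency warping
# defined by anchor points. Names and anchor values are hypothetical.
def make_warp(anchors_in, anchors_out):
    """Return (W, W^-1) as callables on frequency arrays."""
    xp = np.asarray(anchors_in, dtype=float)
    fp = np.asarray(anchors_out, dtype=float)
    if np.any(np.diff(xp) <= 0) or np.any(np.diff(fp) <= 0):
        raise ValueError("anchors must be strictly increasing for W to be invertible")

    def warp(f):
        return np.interp(f, xp, fp)

    def unwarp(f):
        # Swapping the anchor axes gives the exact inverse of W.
        return np.interp(f, fp, xp)

    return warp, unwarp


def transform(log_env, freqs, unwarp, amp_scale):
    """Forward: y(f) = x(W^-1(f)) + A(f), where x is a log-magnitude envelope
    and A is an additive amplitude-scaling curve in the log domain."""
    return np.interp(unwarp(freqs), freqs, log_env) + amp_scale


def inverse_transform(log_env_t, freqs, warp, amp_scale):
    """Inverse: x(g) = y(W(g)) - A(W(g)); exact up to interpolation error."""
    warped = warp(freqs)
    return np.interp(warped, freqs, log_env_t) - np.interp(warped, freqs, amp_scale)


if __name__ == "__main__":
    freqs = np.linspace(0.0, 8000.0, 513)                 # assumed 16 kHz audio, 1024-point FFT bins
    log_env = np.cos(2 * np.pi * freqs / 4000.0)           # toy log-magnitude envelope
    amp_scale = 0.5 * np.sin(2 * np.pi * freqs / 8000.0)   # toy amplitude-scaling curve
    warp, unwarp = make_warp([0, 1000, 3000, 8000], [0, 1150, 3300, 8000])

    y = transform(log_env, freqs, unwarp, amp_scale)
    x_rec = inverse_transform(y, freqs, warp, amp_scale)
    print("max reconstruction error:", np.max(np.abs(x_rec - log_env)))
```

Keeping the warping strictly monotonic is the design choice that guarantees invertibility; the amplitude term cancels exactly in the inverse because linear interpolation is linear in the sampled values.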