Synthetic speech detection using fundamental frequency variation and spectral features. (March 2018)
- Record Type:
- Journal Article
- Title:
- Synthetic speech detection using fundamental frequency variation and spectral features. (March 2018)
- Main Title:
- Synthetic speech detection using fundamental frequency variation and spectral features
- Authors:
- Pal, Monisankha
Paul, Dipjyoti
Saha, Goutam
- Abstract:
- Highlights: Proposed synthetic speech detection using score fusion of CQCC, APGDF and fundamental frequency variation (FFV) features. Best spoofing detection performance on the ASVspoof 2015 evaluation dataset, with an overall EER of 0.05%. State-of-the-art performance for ASV integrated with a countermeasure framework. Superior performance in generalization ability assessment.
Abstract: Recent work on the vulnerability of automatic speaker verification (ASV) systems confirms that malicious spoofing attacks using synthetic speech can cause a significant increase in the false acceptance rate. Reliable detection of synthetic speech is key to developing countermeasures against synthetic-speech-based spoofing attacks. In this paper, we tackled this by focusing on three major types of artifacts, related to magnitude, phase and pitch variation, that are introduced during the generation of synthetic speech. We proposed a new approach to detecting synthetic speech using score-level fusion of three front-end features, namely constant Q cepstral coefficients (CQCCs), the all-pole group delay function (APGDF) and fundamental frequency variation (FFV). CQCC and APGDF had each been used earlier for the spoofing detection task and yielded the best performance among magnitude- and phase-spectrum-related features, respectively. The novel FFV feature, introduced in this paper to extract pitch variation at the frame level, provides information complementary to CQCC and APGDF. Experimental results show that the proposed approach produces the best stand-alone spoofing detection performance using a Gaussian mixture model (GMM) based classifier on the ASVspoof 2015 evaluation dataset. The proposed method obtains an overall equal error rate (EER) of 0.05%, a relative improvement of 76.19% over the next best reported result. In addition to outperforming all existing baseline features for both known and unknown attacks, the proposed feature combination yields superior performance for the ASV system (GMM with universal background model/i-vector) integrated with a countermeasure framework. Further, the proposed method shows relatively better generalization ability when copy-synthesized data, limited spoofing data, or both are available a priori in the training pool.
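The abstract relies on two evaluation ideas, score-level fusion and the equal error rate, without giving fusion weights or normalisation details in this record. The sketch below illustrates both under stated assumptions: the fusion scheme (z-normalise each subsystem's scores, then take a weighted mean) is a common generic choice, not necessarily the authors' method; the helpers `fuse_scores` and `compute_eer` are hypothetical names; per-trial subsystem scores (for example GMM log-likelihood ratios from CQCC, APGDF and FFV back-ends) are assumed to be already computed; and the toy scores are made up.

```python
import numpy as np


def fuse_scores(subsystem_scores, weights=None):
    """Hypothetical score-level fusion (assumed scheme, not the paper's):
    z-normalise each subsystem's scores over the trial list, then take a
    weighted mean across subsystems.

    subsystem_scores: list of 1-D arrays, one per front-end, all scoring the
    same trials in the same order.
    """
    normalised = []
    for scores in subsystem_scores:
        s = np.asarray(scores, dtype=float)
        std = s.std()
        # Guard against a constant-score subsystem (std == 0).
        normalised.append((s - s.mean()) / std if std > 0 else s - s.mean())
    # np.average divides by the sum of the weights; weights=None means equal weights.
    return np.average(np.vstack(normalised), axis=0, weights=weights)


def compute_eer(scores, labels):
    """Equal error rate for a detector where a higher score means 'more likely genuine'.

    labels: 1 for genuine (human) speech, 0 for synthetic (spoofed) speech.
    Returns the error at the candidate threshold where the false acceptance
    rate (spoofed trials accepted) is closest to the false rejection rate
    (genuine trials rejected).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    genuine = scores[labels == 1]
    spoof = scores[labels == 0]
    thresholds = np.unique(scores)  # sorted candidate thresholds
    far = np.array([(spoof >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2.0)


if __name__ == "__main__":
    # Made-up toy scores for 4 genuine and 4 spoofed trials (illustration only).
    labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
    cqcc = np.array([2.0, 1.4, 0.9, -0.2, -1.5, -0.8, 0.3, -2.2])
    apgdf = np.array([1.1, 0.2, 1.6, 0.9, -0.7, 0.5, -1.2, -0.9])
    ffv = np.array([0.8, 1.2, 0.4, 0.6, -0.3, -1.1, -0.6, 0.1])
    for name, s in [("CQCC", cqcc), ("APGDF", apgdf), ("FFV", ffv)]:
        print(name, "EER:", compute_eer(s, labels))
    print("Fused EER:", compute_eer(fuse_scores([cqcc, apgdf, ffv]), labels))
```

Z-normalisation puts subsystems with very different score ranges on a common scale before averaging; in practice the weights would be tuned on development data rather than left equal. As a quick arithmetic check on the abstract's figures: if the 76.19% relative improvement is measured on overall EER, the next best result would have been about 0.21%, since (0.21 - 0.05) / 0.21 ≈ 0.7619.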
- Is Part Of:
- Computer speech & language. Volume 48(2018)
- Journal:
- Computer speech & language
- Issue:
- Volume 48(2018)
- Issue Display:
- Volume 48, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 48
- Issue:
- 2018
- Issue Sort Value:
- 2018-0048-2018-0000
- Page Start:
- 31
- Page End:
- 50
- Publication Date:
- 2018-03
- Subjects:
- All-pole group delay function (APGDF) -- Anti-spoofing -- Constant Q cepstral coefficient (CQCC) -- Fundamental frequency variation (FFV) -- Score-level fusion -- Spoofing attack
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.10.001
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5454.xml