Replay spoof detection using energy separation based instantaneous frequency estimation from quadrature and in-phase components. (January 2023)
- Record Type:
- Journal Article
- Title:
- Replay spoof detection using energy separation based instantaneous frequency estimation from quadrature and in-phase components. (January 2023)
- Main Title:
- Replay spoof detection using energy separation based instantaneous frequency estimation from quadrature and in-phase components
- Authors:
- Gupta, Priyanka
Chodingala, Piyushkumar K.
Patil, Hemant A.
- Abstract:
- Replay attacks in speech are becoming easier to mount with the advent of high-quality recording and playback devices. This makes replay attacks a major concern for the security of Automatic Speaker Verification (ASV) systems and voice assistants. In the past, auditory transform-based as well as Instantaneous Frequency (IF)-based features have been proposed for replay spoofed speech detection (SSD). In this context, IF has been estimated either by the derivative of the analytic phase via the Hilbert transform, or by using the high temporal resolution Teager Energy Operator (TEO)-based Energy Separation Algorithm (ESA). However, the excellent temporal resolution of ESA comes at the cost of not using relative phase information, and vice versa. To that effect, we propose novel Cochlear Filter Cepstral Coefficients-based Instantaneous Frequency using Quadrature Energy Separation Algorithm (CFCCIF-QESA) features, which offer excellent temporal resolution as well as relative phase information. CFCCIF-QESA is designed by exploiting relative phase shift to estimate IF, without estimating phase explicitly from the signal. To motivate and validate the effectiveness of the proposed QESA approach for IF estimation, we have employed information-theoretic measures, such as Mutual Information (MI), Kullback–Leibler (KL) divergence, and Jensen–Shannon (JS) divergence. The proposed CFCCIF-QESA feature set is extensively evaluated on the standard, statistically meaningful ASVSpoof 2017 version 2.0 dataset, where it achieves improved performance compared to the CFCCIF-ESA and CQCC feature sets on GMM, CNN, and LCNN classifiers. Furthermore, in cross-database evaluation using ASVSpoof 2017 v2.0 and VSDC, CFCCIF-QESA also performs relatively better than CFCCIF-ESA and CQCC on the GMM classifier. However, for self-classification on the ASVSpoof 2019 PA data, CFCCIF-QESA outperforms only CFCCIF-ESA, whereas on the BTAS 2016 dataset it performs relatively close to CFCCIF-ESA. Finally, results are presented for the case when the ASV system is not under attack.
Highlights:
Mutual Information (MI)-based analysis is shown to justify the choice of the quadrature phase component, without estimating phase explicitly.
The extended definition of TEO for complex-valued signals is exploited for the first time for IF estimation using QESA for the SSD task.
Cross-dataset validation is shown on two datasets, namely ASVSpoof 2017 v2.0 and the Voice Spoofing Detection Corpus (VSDC). VSDC is a less explored dataset that incorporates the additional acoustic scenario of 2-point replay (2PR), which is absent from the ASVSpoof datasets.
We employed information-theoretic model-level measures, namely Kullback–Leibler (KL) divergence and Jensen–Shannon (JS) divergence, to show the effectiveness of the proposed CFCCIF-QESA compared to CFCCIF-ESA across various numbers of mixtures in the GMM classifier.
To demonstrate generalizability, experiments are performed across various databases, namely ASVSpoof 2017 v2.0, ASVSpoof 2019, BTAS 2016, VSDC (1PR), and VSDC (2PR).
- Is Part Of:
- Computer speech & language. Volume 77(2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 77(2023)
- Issue Display:
- Volume 77, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 77
- Issue:
- 2023
- Issue Sort Value:
- 2023-0077-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-01
- Subjects:
- Automatic Speaker Verification (ASV) -- Replay attack -- Instantaneous Frequency (IF) -- Quadrature Energy Separation Algorithm (QESA)
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101423
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23405.xml