Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection. (March 2022)
- Record Type:
- Journal Article
- Title:
- Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection. (March 2022)
- Main Title:
- Improving the potential of Enhanced Teager Energy Cepstral Coefficients (ETECC) for replay attack detection
- Authors:
- Patil, Ankur T.
Acharya, Rajul
Patil, Hemant A.
Guido, Rodrigo Capobianco
- Abstract:
- In the scope of voice biometrics, the term replay attack (RA) refers to the dishonest attempt made by an impostor to spoof someone else's identity by replaying the subject's previously recorded speech close to the Automatic Speaker Verification (ASV) system under attack. State-of-the-art strategies for RA detection, such as the Enhanced Teager Energy Cepstral Coefficients (ETECC), have shown promising results due to their precision in measuring energy from high-frequency components of speech, as a function of two recently defined concepts: signal mass and the Enhanced Teager Energy Operator (ETEO). Nevertheless, since the replay mechanism prominently deteriorates the speech signal spectrum just in those spectral zones, we propose the association of ETEO with different strategies to further improve the previous results in obtaining effective countermeasures against RAs. Specifically, comprehensive evaluations which include a detailed mathematical analysis, a simulation on amplitude and frequency modulated (AM–FM) signals, and a spectrographic inspection involving different filterbank structures, along with their experimental results, are provided in this paper. In addition, ETEO-derived features are contrasted with existing feature sets by using Paraconsistent Feature Engineering (PFE) for feature ranking, expanding our previously published results. Lastly, experiments are performed with the ASVspoof-2017 version 2.0 dataset, the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC), the BTAS-2016 dataset, the ASVspoof-2019 challenge dataset, and the ASVspoof-2015 challenge dataset, considering Gaussian Mixture Models (GMMs), Convolutional Neural Networks (CNNs) and Light-CNN architectures as the classifiers. The standalone ETECC-GMM system showed the best performance by producing equal error rates (EERs) of 5.55% and 10.75% on the development and evaluation sets, respectively.
Highlights:
An improved handcrafted feature extraction technique based on the ETEO is presented, which provides a precise estimate of signal energy by means of the concept of signal mass, to leverage the performance of TEO.
We describe an innovative assessment, based on Paraconsistent Feature Engineering (PFE), which measures the efficacy of the proposed ETECC-based feature sets, along with the existing feature sets, for the intended classification task.
We comment on the results from the application of state-of-the-art filterbanks, namely Gammatone and Cochlear, in SSD tasks. The experiments were also performed with the Ricker wavelet-based filterbank, i.e., the negative normalized second derivative of the Gaussian, and the Gabor filterbank.
We performed experiments in an environment-dependent scenario on the ASVspoof-2017 version 2.0 dataset, the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC), BTAS-2016, the ASVspoof-2019 challenge dataset, and the ASVspoof-2015 challenge dataset.
We extended the experiments by considering deep learning architectures, namely Convolutional Neural Networks (CNNs) and Light-CNN architectures, to work as classifiers in conjunction with the extracted features, and reported the results on the ASVspoof-2017 version 2.0 dataset.
- Is Part Of:
- Computer speech & language. Volume 72(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 72(2022)
- Issue Display:
- Volume 72, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 72
- Issue:
- 2022
- Issue Sort Value:
- 2022-0072-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-03
- Subjects:
- Handcrafted features -- Automatic speaker verification (ASV) -- Replay attacks (RAs) -- Paraconsistent Feature Engineering (PFE) -- Enhanced Teager Energy Operator (ETEO) -- Enhanced Teager Energy Cepstral Coefficients (ETECCs)
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101281
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20111.xml