Detection of replay spoof speech using teager energy feature cues. (January 2021)
- Record Type:
- Journal Article
- Title:
- Detection of replay spoof speech using teager energy feature cues. (January 2021)
- Main Title:
- Detection of replay spoof speech using teager energy feature cues
- Authors:
- Kamble, Madhu R.
Patil, Hemant A.
- Abstract:
- Highlights: We analyse the replay speech signal in terms of the reverberation that occurs during recording of the replay speech signal. The reverberation introduces delay and change in amplitude, producing close copies of the natural signal that make the natural components inseparable from the replay components. We propose to exploit the capabilities of the Teager Energy Operator (TEO) to compute a running estimate of subband energies for replay vs. genuine speech signals. The Teager energy profiles obtained for genuine and replay (reverberant) signals differ, which helps to distinguish a replay signal from its genuine counterpart. Experiments on the ASVspoof 2017 Challenge version 2.0 database show improved performance compared to baseline features.
Abstract: The vulnerability of Automatic Speaker Verification (ASV) systems to spoofing or presentation attacks is still an open security issue. In this context, replay spoofing attacks pose a great threat to an ASV system since they can be easily performed (using a playback device, and without needing any technical skill). In this paper, we analyze replay speech signals in terms of reverberation that may occur during recording of the speech signal. Such reverberation introduces delay and changes in amplitude, producing close copies of speech signals, which significantly influences the replay components. To that effect, we propose to exploit the capabilities of the Teager Energy Operator (TEO) to compute a running estimate of subband energies for replay vs. genuine signals. We have used a linearly-spaced Gabor filterbank to obtain a narrowband filtered signal. The TEO has the ability to track the instantaneous changes of a signal. Experiments are performed on the ASVspoof 2017 Challenge version 2.0 database using a Gaussian Mixture Model (GMM) as the pattern classifier. Furthermore, we compared our results with state-of-the-art feature sets, namely, Constant Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and Mel Frequency Cepstral Coefficients (MFCC), and used their score-level fusion with the proposed feature set, i.e., Teager Energy Cepstral Coefficients (TECC), in order to obtain possible complementary information that further reduces the Equal Error Rate (EER). Score-level fusion of the CQCC, MFCC, LFCC, and TECC feature sets yields relatively low EERs of 6.68% and 10.45% on the development and evaluation sets, respectively.
Moreover, for the evaluation dataset, we also studied the performance of the TECC feature set on different Replay Configurations (RC), namely, acoustic environments, playback devices, and recording devices. For all levels of threat conditions (i.e., low, medium, and high) to an ASV system, the proposed feature set performed better than existing state-of-the-art feature sets. In addition to the ASVspoof 2017 Challenge database, we also performed experiments on other spoofing databases, namely, the ASVspoof 2015 Challenge, BTAS 2016, and ASVspoof 2019 Challenge databases. For all the spoofing databases used in this study, the proposed TECC feature set performs significantly better than the other feature sets.
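Illustrative note (not part of the catalogued record): the abstract relies on the TEO's ability to give a running energy estimate. A minimal Python sketch of the standard discrete-time operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], is given below. The function name `teager_energy` and the test tone are assumptions made for illustration only; this is not the authors' implementation, and it omits the Gabor filterbank and cepstral steps that the TECC feature set would also require.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator (hypothetical helper name).

    Returns psi[n] = x[n]**2 - x[n-1] * x[n+1] for the interior samples,
    so the output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Sanity check on a pure tone A*cos(omega*n): the operator output is
# constant and equal to (A*sin(omega))**2, which is why it can follow
# amplitude and frequency changes sample by sample.
A, omega = 2.0, 0.3          # assumed example values
n = np.arange(200)
tone = A * np.cos(omega * n)
psi = teager_energy(tone)
expected = (A * np.sin(omega)) ** 2
```

In a subband setting, the same function would presumably be applied to each narrowband filter output separately; that pipeline is only described at a high level in the abstract, so it is not sketched here.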
- Is Part Of:
- Computer speech & language. Volume 65(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 65(2021)
- Issue Display:
- Volume 65, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 65
- Issue:
- 2021
- Issue Sort Value:
- 2021-0065-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-01
- Subjects:
- Automatic speaker verification -- Spoof -- Replay -- Reverberation -- TEO profile
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101140
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16886.xml