On the use of blind channel response estimation and a residual neural network to detect physical access attacks to speaker verification systems. (March 2021)
- Record Type:
- Journal Article
- Title:
- On the use of blind channel response estimation and a residual neural network to detect physical access attacks to speaker verification systems. (March 2021)
- Main Title:
- On the use of blind channel response estimation and a residual neural network to detect physical access attacks to speaker verification systems
- Authors:
- Avila, Anderson R.
Alam, Jahangir
Prado, Fabiano O. Costa
O'Shaughnessy, Douglas
Falk, Tiago H.
- Abstract:
- Highlights: The use of blind channel response estimation as a new approach for replay attack detection. The proposed method outperformed the baseline systems in two spoofing datasets. Further improvement achieved after combining recent deep learning models. Front- and back-end based on single feature extraction and single neural network classifier.
Abstract: Spoofing attacks have been acknowledged as a serious threat to automatic speaker verification (ASV) systems. In this paper, we are specifically concerned with replay attack scenarios. As a countermeasure to the problem, we propose a front-end based on the blind estimation of the channel response magnitude and as a back-end a residual neural network. Our hypothesis is that the magnitude response of the channel, obtained by subtracting the log-magnitude spectrum of the observed signal from the prediction of the log-magnitude spectrum average of the observed signal's clean counterpart, will capture the nuances of room ambiences, recordings and playback devices. The performance of these features is investigated on a benchmark back-end, based on a Gaussian mixture model and on a deep neural network classifier. Our experiments are performed on the 2017 and 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof) datasets. The benchmark systems are the same as used in the challenges and are based on constant-Q cepstral coefficients (CQCC) and linear-frequency cepstral coefficients (LFCC) features. Experimental results on the 2017 dataset show that the proposed method outperforms the two benchmarks, providing equal-error rates (EER) as low as 7.57% and 11.64%, respectively, for the development and evaluation sets. On the ASVspoof 2019 dataset, in turn, the proposed method outperformed the benchmark using a residual neural network as back-end by yielding tandem detection cost function (t-DCF) and EER as low as 0.1086 and 4.26% on the evaluation set. Lastly, an instrumental (objective) quality assessment is performed on the two datasets and the impact of quality variability on spoofing detection accuracy is discussed.
- Is Part Of:
- Computer speech & language. Volume 66(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 66(2021)
- Issue Display:
- Volume 66, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 66
- Issue:
- 2021
- Issue Sort Value:
- 2021-0066-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-03
- Subjects:
- Automatic speaker recognition -- Spoofing attacks -- Replay attack -- Channel estimation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101163
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15413.xml
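
Note on the abstract: the front-end it describes relies on a difference between two log-magnitude spectra, which can be illustrated with a minimal sketch. The code below is not the authors' implementation. It assumes the usual convolutive model (observed = clean convolved with channel), so that the channel's log-magnitude response is taken as the time-averaged log-magnitude spectrum of the observed signal minus that of the clean signal; this sign convention is an assumption. In the paper the clean average is predicted blindly, whereas here it is simply computed from a synthetic clean signal. The function names, frame length, hop size and toy FIR channel are all hypothetical.

```python
import numpy as np

def avg_log_magnitude(signal, frame_len=512, hop=256):
    """Time-averaged log-magnitude spectrum over Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small floor avoids log(0) on silent bins.
    return np.mean(np.log(magnitude + 1e-10), axis=0)

def channel_log_magnitude(observed, clean_avg_log_mag, frame_len=512, hop=256):
    """Assumed convention: log|H| ~ avg log|Y| - avg log|S|."""
    return avg_log_magnitude(observed, frame_len, hop) - clean_avg_log_mag

# Synthetic check: white noise stands in for clean speech (assumption),
# and a short FIR filter stands in for the room/device channel.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000 * 4)
h = np.array([1.0, 0.5, 0.25])
observed = np.convolve(clean, h)[: len(clean)]

# Here the clean average is known; a blind system would have to predict it.
estimate = channel_log_magnitude(observed, avg_log_magnitude(clean))
true_log_mag = np.log(np.abs(np.fft.rfft(h, n=512)))
max_abs_error = np.max(np.abs(estimate - true_log_mag))
```

In this toy setting `estimate` should track `true_log_mag` closely because the filter is much shorter than the analysis frame; with real replayed speech the clean average is unknown, which is the part the blind estimation in the paper addresses.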