Exploring accidental triggers of smart speakers. (May 2022)
- Record Type:
- Journal Article
- Title:
- Exploring accidental triggers of smart speakers. (May 2022)
- Main Title:
- Exploring accidental triggers of smart speakers
- Authors:
- Schönherr, Lea
Golla, Maximilian
Eisenhofer, Thorsten
Wiele, Jan
Kolossa, Dorothea
Holz, Thorsten
- Abstract:
- Voice assistants like Amazon's Alexa, Google's Assistant, Tencent's Xiaowei, or Apple's Siri, have become the primary (voice) interface in smart speakers that can be found in millions of households. For privacy reasons, these speakers analyze every sound in their environment for their respective wake word like "Alexa," "Jiǔsì'èr líng," or "Hey Siri," before uploading the audio stream to the cloud for further processing. Previous work reported on examples of an inaccurate wake word detection, which can be tricked using similar words or sounds like "cocaine noodles" instead of "OK Google." In this paper, we perform a comprehensive analysis of such accidental triggers, i.e., sounds that should not have triggered the voice assistant, but did. More specifically, we automate the process of finding accidental triggers and measure their prevalence across 11 smart speakers from 8 different manufacturers using everyday media such as TV shows, news, and other kinds of audio datasets. To systematically detect accidental triggers, we describe a method to artificially craft such triggers using a pronouncing dictionary and a weighted, phone-based Levenshtein distance. In total, we have found hundreds of accidental triggers. Moreover, we explore potential gender and language biases and analyze the reproducibility. Finally, we discuss the resulting privacy implications of accidental triggers and explore countermeasures to reduce and limit their impact on users' privacy. To foster additional research on these sounds that mislead machine learning models, we publish a dataset of more than 350 verified triggers as a research artifact.
Highlights:
Measurement setup to study the prevalence of accidental triggers in smart speakers.
Analysis of a diverse set of audio sources, exploration of potential gender and language biases, and reproducibility.
A method to synthesize accidental triggers via a pronouncing dictionary and a weighted phone-based distance metric.
Analysis of how commercial companies deal with accidental triggers in practice.
Discussion of potential countermeasures that can help to reduce the impact of accidental triggers on users' privacy.
- Is Part Of:
- Computer speech & language. Volume 73(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 73(2022)
- Issue Display:
- Volume 73, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 73
- Issue:
- 2022
- Issue Sort Value:
- 2022-0073-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-05
- Subjects:
- Smart speaker -- Accidental trigger -- Privacy
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101328
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20398.xml