A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection. (January 2023)
- Record Type:
- Journal Article
- Title:
- A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection. (January 2023)
- Main Title:
- A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection
- Authors:
- Li, Yu
Parsan, Anisha
Wang, Bill
Dong, Penghao
Yao, Shanshan
Qin, Ruwen
- Abstract:
- Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and easily adapted when the group changes. This paper is motivated to build a multi-tasking deep learning model that possesses a Share–Split–Collaborate architecture. This architecture allows the two classification tasks to share the feature extractor and then split subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a 95.3% or higher mean accuracy in classifying the keywords of any authorized inspectors. Its mean accuracy in speaker classification is 99.2%. Due to the richer keyword representations that the model learns from the pooled training data, adapting the base model to a new inspector requires only a little training data from that inspector, like five utterances per keyword. Using the speaker classification scores for inspector verification can achieve a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger-size groups on a public dataset. This paper provides a solution to addressing challenges facing AI-assisted human–robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.
Highlights: The Share–Split–Collaborate multitask learning architecture is suitable for speaker-keyword classification. Subject-specific and phonetic-specific features intertwined in audio data can be disentangled. Rich keyword representations are learned from multi-subject spoken command data. Small data of new speakers are sufficient for adding new classes to the speaker classifier. Speaker classification scores are also effective for the speaker verification.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 117:Part A(2023)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 117:Part A(2023)
- Issue Display:
- Volume 117, Issue 1 (2023)
- Year:
- 2023
- Volume:
- 117
- Issue:
- 1
- Issue Sort Value:
- 2023-0117-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-01
- Subjects:
- Human-in-the-loop -- Human–robot interaction -- Infrastructure inspection -- Keyword classification -- Speaker recognition
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2022.105597
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24675.xml