A code-mixed task-oriented dialog dataset for medical domain. (March 2023)
- Record Type:
- Journal Article
- Title:
- A code-mixed task-oriented dialog dataset for medical domain. (March 2023)
- Main Title:
- A code-mixed task-oriented dialog dataset for medical domain
- Authors:
- Dowlagar, Suman
Mamidi, Radhika
- Abstract:
- In the healthcare domain, medical and patient interactions form a crucial part of the diagnosis. Initially, the AI models developed for healthcare centered only on monolingual data. However, such models do not cater to multilingual regions, where most conversations are Code-Mixed. We present the Code-Mixed Medical Task-Oriented Dialog Dataset to facilitate the research and development of Code-Mixed medical dialog systems. We analyzed the dataset using medical, conversational, and linguistic theories. The dataset contains 3005 Telugu–English Code-Mixed dialogs between patients and doctors with 29k utterances covering ten specializations with an average code-mixing index (CMI) of 33.3%. We manually annotated the conversational dataset with intents and slot labels. We also present baselines to establish benchmarks on the dataset using existing state-of-the-art Natural Language Understanding (NLU) models. We improved the existing baselines using contextual ground truth intent labels and processing the slots as chunks. The data is made publicly available.
Highlights: It is the first corpus containing Code-Mixed medical conversations in the Telugu–English language. The dialogs are taken from real-life conversations. To understand the user's utterance, we annotated the corpus with intents and slots. We ran the baseline NLU models to establish a benchmark on our task-oriented corpus. We also proposed an improvement over the baseline NLU models.
- Is Part Of:
- Computer speech & language. Volume 78(2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 78(2023)
- Issue Display:
- Volume 78, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 78
- Issue:
- 2023
- Issue Sort Value:
- 2023-0078-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-03
- Subjects:
- Code-mixed -- Dialog dataset -- Medical domain -- Task oriented
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101449
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24470.xml