Handling OOV Words in Mandarin Spoken Term Detection with an Hierarchical n‐Gram Language Model. Issue 6 (1st November 2017)
- Record Type:
- Journal Article
- Title:
- Handling OOV Words in Mandarin Spoken Term Detection with an Hierarchical n‐Gram Language Model. Issue 6 (1st November 2017)
- Main Title:
- Handling OOV Words in Mandarin Spoken Term Detection with an Hierarchical n‐Gram Language Model
- Authors:
- Wang, Xuyang
Zhang, Pengyuan
Na, Xingyu
Pan, Jielin
Yan, Yonghong
- Abstract:
- In this paper, a hierarchical n‐gram language model (LM) combining words and characters is explored to improve the detection of out‐of‐vocabulary (OOV) words in Mandarin spoken term detection (STD). The hierarchical LM is built on a word‐level LM, with a character‐level LM estimating the probabilities of OOV words in a class‐based way. The region of the sentence being decoded that contains OOV words is detected with the help of the word‐level LM, and the probabilities of the OOV words are derived from the character‐level LM. The proposed approach is implemented on a dynamic decoder and evaluated in terms of actual term weighted value (ATWV) on two Mandarin data sets. Experimental results show a relative improvement of more than 10% in OOV word detection on both sets, while the detection of in‐vocabulary (IV) words is barely affected.
- Is Part Of:
- Chinese journal of electronics. Volume 26:Issue 6(2017)
- Journal:
- Chinese journal of electronics
- Issue:
- Volume 26:Issue 6(2017)
- Issue Display:
- Volume 26, Issue 6 (2017)
- Year:
- 2017
- Volume:
- 26
- Issue:
- 6
- Issue Sort Value:
- 2017-0026-0006-0000
- Page Start:
- 1239
- Page End:
- 1244
- Publication Date:
- 2017-11-01
- Subjects:
- Spoken term detection (STD) -- Language model (LM) -- Out‐of‐vocabulary (OOV) words.
natural language processing -- probability -- speech recognition -- text analysis -- word processing
OOV word handling -- out‐of‐vocabulary words -- Mandarin spoken term detection -- hierarchical n‐gram language model -- word detection -- character detection -- word‐level LM -- character‐level LM -- dynamic decoder -- actual term weighted value -- ATWV -- Mandarin data sets -- spoken documents -- large vocabulary continuous speech recognition -- LVCSR engine
Electronics -- Periodicals
Electronics -- China -- Periodicals
Electronics
China
Periodicals
621.38105
- Journal URLs:
- https://ietresearch.onlinelibrary.wiley.com/journal/20755597
http://ieeexplore.ieee.org/servlet/opac?punumber=7479413
http://ieeexplore.ieee.org/Xplore/home.jsp
- DOI:
- 10.1049/cje.2017.07.004
- Languages:
- English
- ISSNs:
- 1022-4653
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3180.317180
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16443.xml
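
The class‐based scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes unigram models in place of the paper's n‐gram models, and the names (WORD_LM, CHAR_LM, the "<oov>" class token, the "</w>" end marker) and all probabilities are hypothetical toy values. An IV word is scored by the word‐level LM directly; an OOV word is scored as the probability of the OOV class in the word‐level LM multiplied by the probability of its character sequence under the character‐level LM.

```python
import math

# Toy word-level unigram LM (assumed values).
# "<oov>" is a class token standing in for any out-of-vocabulary word.
WORD_LM = {"我": 0.4, "喜欢": 0.3, "<oov>": 0.3}

# Toy character-level unigram LM (assumed values); "</w>" marks the end of a word.
CHAR_LM = {"北": 0.25, "京": 0.25, "人": 0.25, "</w>": 0.25}


def char_logprob(word):
    """Log-probability of spelling `word` character by character, plus the end marker."""
    return sum(math.log(CHAR_LM[c]) for c in list(word) + ["</w>"])


def word_logprob(word):
    """IV words use the word-level LM; OOV words use P(<oov>) * P_char(spelling)."""
    if word in WORD_LM and word != "<oov>":
        return math.log(WORD_LM[word])
    return math.log(WORD_LM["<oov>"]) + char_logprob(word)


def sentence_logprob(words):
    """Sum of per-word log-probabilities under the hierarchical model."""
    return sum(word_logprob(w) for w in words)
```

In this sketch a character missing from CHAR_LM raises a KeyError; a real character‐level model would need smoothing over the full character inventory.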