Elementary discourse unit segmentation for Vietnamese texts. (12th April 2022)
- Record Type:
- Journal Article
- Title:
- Elementary discourse unit segmentation for Vietnamese texts. (12th April 2022)
- Main Title:
- Elementary discourse unit segmentation for Vietnamese texts
- Authors:
- Nguyen, Chinh Trong
- Nguyen, Dang Tuan
- Abstract:
- Elementary discourse unit (EDU) segmentation is an important problem in the discourse analysis of text. In Vietnam, no tool or model for this problem has yet been officially published, so we propose a solution. Our approach applies a sequential labelling method to identify the beginning of each EDU in a sentence. The labelling model is a deep neural network that uses BERT to generate word feature vectors (a transfer learning approach) and a feed-forward neural network to predict the tag of each word. To build the model, we automatically constructed an EDU segmentation dataset from the Vietnamese constituent treebank NIIVTB and used this dataset to fine-tune the pretrained PhoBERT model. Our EDU segmentation model achieves a span-based F1 score of 0.8, which we consider sufficient for use in practical tasks.
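The abstract frames EDU segmentation as sequential labelling, where each word is tagged according to whether it begins an EDU, and evaluates the model with a span-based F1 score. The Python sketch below illustrates both ideas under stated assumptions: the two-tag scheme ("B" for an EDU-initial word, "O" otherwise) and the function names `tags_to_spans` and `span_f1` are hypothetical and not taken from the paper, and the score shown is a corpus-level (micro-averaged) exact-match F1, which may differ in detail from the authors' evaluation.

```python
from typing import List, Sequence, Tuple

# A span is (start, end) over word indices, with end exclusive.
Span = Tuple[int, int]


def tags_to_spans(tags: Sequence[str]) -> List[Span]:
    """Convert the per-word tags of one sentence into EDU spans.

    Assumes a hypothetical two-tag scheme: "B" marks the first word of an
    EDU and any other tag marks a continuation word. The first word of a
    sentence always opens an EDU, whatever its tag.
    """
    if not tags:
        return []
    starts = [0] + [i for i in range(1, len(tags)) if tags[i] == "B"]
    ends = starts[1:] + [len(tags)]
    return list(zip(starts, ends))


def span_f1(gold_corpus: Sequence[Sequence[str]],
            pred_corpus: Sequence[Sequence[str]]) -> float:
    """Micro-averaged span-based F1 over a corpus of tagged sentences.

    A predicted EDU counts as correct only if both of its boundaries
    match a gold EDU in the same sentence.
    """
    if len(gold_corpus) != len(pred_corpus):
        raise ValueError("gold and predicted corpora differ in length")
    correct = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold_corpus, pred_corpus):
        gold_spans = set(tags_to_spans(gold_tags))
        pred_spans = set(tags_to_spans(pred_tags))
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / n_pred
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One seven-word sentence: the gold standard has three EDUs,
    # and the prediction misses the boundary at word 3.
    gold = [["B", "O", "O", "B", "O", "B", "O"]]
    pred = [["B", "O", "O", "O", "O", "B", "O"]]
    print(tags_to_spans(gold[0]))  # [(0, 3), (3, 5), (5, 7)]
    print(round(span_f1(gold, pred), 2))  # 0.4
```

Because a predicted EDU counts only when both of its boundaries are right, the single missed boundary in this example leaves two gold spans unmatched and produces one wrong predicted span, which is why span-based F1 is stricter than scoring individual boundary tags.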
- Is Part Of:
- International journal of intelligent information and database systems. Volume 15:Number 3(2022)
- Journal:
- International journal of intelligent information and database systems
- Issue:
- Volume 15:Number 3(2022)
- Issue Display:
- Volume 15, Issue 3 (2022)
- Year:
- 2022
- Volume:
- 15
- Issue:
- 3
- Issue Sort Value:
- 2022-0015-0003-0000
- Page Start:
- 249
- Page End:
- 266
- Publication Date:
- 2022-04-12
- Subjects:
- EDU segmentation -- sequential labelling -- BERT -- transfer learning
- Database management -- Computer programs -- Periodicals
- Information retrieval -- Computer programs -- Periodicals
- Information storage and retrieval systems -- Computer programs -- Periodicals
- Artificial intelligence -- Periodicals
- Expert systems (Computer science) -- Periodicals
- Intelligent agents (Computer software) -- Periodicals
- 006.33
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijiids
- http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1751-5858
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
- British Library STI - ELD Digital store
- Ingest File:
- 21492.xml