A hybrid approach to classifying Wikipedia article quality flaws with feature fusion framework. (1st November 2021)
- Record Type:
- Journal Article
- Title:
- A hybrid approach to classifying Wikipedia article quality flaws with feature fusion framework. (1st November 2021)
- Main Title:
- A hybrid approach to classifying Wikipedia article quality flaws with feature fusion framework
- Authors:
- Wang, Ping
Li, Muyan
Li, Xiaodan
Zhou, Heshen
Hou, Jingrui
- Abstract:
- Highlights: Combine pretrained models with deep learning for quality flaw classification. Manually designed features and automatically extracted features are combined. Proposed method achieves notably improved precision, recall, and accuracy. Provide a best practice for feature selection of quality flaw classification.
Abstract: Article quality has always been a major concern for Wikipedia. To improve article quality, it is critical to first identify defects. Thus, flaw classification has attracted considerable attention. To achieve this, several machine-learning-based approaches are available, including deep learning models based on either manually constructed or autoextracted features. However, adopting only features of either single type may not ensure a comprehensive description of articles. To improve flaw classification, we propose a feature fusion framework combining both handcrafted and autoextracted features. In this research, we first use a rule-based method from a previously proposed framework to extract handcrafted features. Additionally, we obtain autoextracted features using Bidirectional Encoder Representations from Transformers (BERT) and various deep learning models, including bidirectional long short-term memory (Bi LSTM), bidirectional gated recurrent unit (Bi GRU), bidirectional recurrent neural network (Bi RNN), and multihead self-attention models. Finally, the handcrafted features are standardized and concatenated with the autoextracted features. Then, the concatenated features are fed into a feedforward neural network for classification. A detailed comparison of different classifiers is conducted. We compare 12 different classifiers in terms of training performance, classification performance, and model training time. The experiments show that the proposed feature fusion framework can notably improve the effectiveness of quality flaw classification for Wikipedia articles. In particular, a Bi GRU model based on the proposed framework achieves excellent classification accuracy. …
- Is Part Of:
- Expert systems with applications. Volume 181(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 181(2021)
- Issue Display:
- Volume 181, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 181
- Issue:
- 2021
- Issue Sort Value:
- 2021-0181-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11-01
- Subjects:
- Quality flaw -- Deep learning -- Fusion framework -- Text classification
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2021.115089
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18252.xml
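
The fusion step summarized in the abstract (standardize the handcrafted features, concatenate them with the autoextracted features, feed the result to a feedforward network) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature values, layer sizes, random weights, and the function names `standardize`, `fuse`, and `feedforward` are all assumptions made for illustration, and random vectors stand in for the BERT and Bi GRU outputs, which the record does not specify.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column across all samples (rows)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Guard against constant columns, whose standard deviation is zero.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def fuse(handcrafted, autoextracted):
    """Concatenate standardized handcrafted features with autoextracted ones."""
    return [h + a for h, a in zip(standardize(handcrafted), autoextracted)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feedforward(x, w_hidden, w_out):
    """One ReLU hidden layer, then a softmax output (biases omitted for brevity)."""
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, unit_w))) for unit_w in w_hidden]
    logits = [sum(hi * wi for hi, wi in zip(hidden, unit_w)) for unit_w in w_out]
    return softmax(logits)


rng = random.Random(0)

# Hypothetical handcrafted features per article, e.g. word count and reference count.
handcrafted = [[120.0, 3.0], [450.0, 10.0], [80.0, 1.0]]
# Random stand-ins for the autoextracted (encoder) feature vectors.
autoextracted = [[rng.uniform(-1, 1) for _ in range(4)] for _ in handcrafted]

fused = fuse(handcrafted, autoextracted)

# Untrained, randomly initialized weights; sizes are arbitrary assumptions.
n_in, n_hidden, n_classes = len(fused[0]), 5, 3
w_hidden = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
w_out = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]

# One probability distribution over flaw classes per article.
probs = [feedforward(x, w_hidden, w_out) for x in fused]
```

Standardizing only the handcrafted features puts raw counts on a scale comparable to the learned representations before concatenation, which is the motivation the abstract gives for that step. The weights here are untrained, so `probs` only demonstrates the data flow, not a meaningful classification.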