A co‐training‐based approach for the hierarchical multi‐label classification of research papers. Issue 4 (24th August 2020)
- Record Type:
- Journal Article
- Title:
- A co‐training‐based approach for the hierarchical multi‐label classification of research papers. Issue 4 (24th August 2020)
- Main Title:
- A co‐training‐based approach for the hierarchical multi‐label classification of research papers
- Authors:
- Masmoudi, Abir
Bellaaj, Hatem
Drira, Khalil
Jmaiel, Mohamed
- Other Names:
- Chakraborty, Tanmoy (guest editor)
Bhatia, Sumit (guest editor)
Caragea, Cornelia (guest editor)
Moreira, Fernando (guest editor)
Rocha, Álvaro (guest editor)
Dubey, Ashwani Kumar (guest editor)
- Abstract:
- This paper focuses on the problem of the hierarchical multi‐label classification of research papers, which is the task of assigning the set of relevant labels for a paper from a hierarchy, using limited amounts of labelled training data. Specifically, we study how to leverage unlabelled data, which are usually plentiful and easy to collect, alongside the few available labelled samples in a semi‐supervised learning framework to achieve better performance. To this end, we propose a semi‐supervised approach for the hierarchical multi‐label classification of research papers based on the well‐known Co‐training algorithm, which exploits content and bibliographic coupling information as two distinct views of each paper. In our approach, two hierarchical multi‐label classifiers are learnt on different views of the labelled data and iteratively select their most confident unlabelled samples, which are then added to the labelled set. The success of the proposed Co‐training‐based approach rests on two main components. The first is the use of two selection criteria (Maximum Agreement and Labels Cardinality Consistency) that enforce the selection of confident unlabelled samples. The second is the application of an oversampling method that rebalances the label distribution of the initial labelled set, which limits the reinforcement of label imbalance during Co‐training. The proposed approach is evaluated on a collection of scientific papers extracted from the ACM Digital Library. The experiments show the effectiveness of our approach compared with several baseline methods.
- Is Part Of:
- Expert systems. Volume 38:Issue 4(2021)
- Journal:
- Expert systems
- Issue:
- Volume 38:Issue 4(2021)
- Issue Display:
- Volume 38, Issue 4 (2021)
- Year:
- 2021
- Volume:
- 38
- Issue:
- 4
- Issue Sort Value:
- 2021-0038-0004-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2020-08-24
- Subjects:
- co‐training -- hierarchical multi‐label classification -- imbalanced data -- research papers classification -- semi‐supervised learning
Expert systems (Computer science)
006.33
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1468-0394
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/exsy.12613
- Languages:
- English
- ISSNs:
- 0266-4720
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 18235.xml
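
Illustrative note on the abstract: the Co‐training loop it describes (two classifiers trained on two views, confident unlabelled samples moved into the labelled set) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. `CentroidClassifier`, `co_train`, the 0.5 threshold and the "both views predict the same non-empty label set" rule are all hypothetical stand-ins; the record does not define the paper's Maximum Agreement or Labels Cardinality Consistency criteria, and the label hierarchy and the oversampling step are omitted here.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Vector = Sequence[float]
LabelSet = FrozenSet[str]
Labelled = Tuple[Vector, Vector, LabelSet]   # (view A, view B, labels)
Unlabelled = Tuple[Vector, Vector]           # (view A, view B)


class CentroidClassifier:
    """Toy flat multi-label classifier: one centroid per label (assumption)."""

    def fit(self, X: List[Vector], Y: List[LabelSet]) -> None:
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for x, labels in zip(X, Y):
            for lab in labels:
                acc = sums.setdefault(lab, [0.0] * len(x))
                for i, v in enumerate(x):
                    acc[i] += v
                counts[lab] = counts.get(lab, 0) + 1
        self.centroids = {
            lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()
        }

    def scores(self, x: Vector) -> Dict[str, float]:
        # Score in (0, 1]: 1 / (1 + squared distance to the label centroid).
        return {
            lab: 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(x, c)))
            for lab, c in self.centroids.items()
        }

    def predict(self, x: Vector, threshold: float = 0.5) -> LabelSet:
        return frozenset(l for l, s in self.scores(x).items() if s >= threshold)


def co_train(labelled: List[Labelled], unlabelled: List[Unlabelled],
             rounds: int = 5, per_round: int = 1):
    """Grow the labelled set with samples both views label identically."""
    labelled, pool = list(labelled), list(unlabelled)
    clf_a, clf_b = CentroidClassifier(), CentroidClassifier()

    def refit() -> None:
        ys = [y for _, _, y in labelled]
        clf_a.fit([a for a, _, _ in labelled], ys)
        clf_b.fit([b for _, b, _ in labelled], ys)

    for _ in range(rounds):
        refit()
        candidates = []
        for idx, (a, b) in enumerate(pool):
            pa, pb = clf_a.predict(a), clf_b.predict(b)
            if pa and pa == pb:  # hypothetical agreement rule
                sa, sb = clf_a.scores(a), clf_b.scores(b)
                conf = min(min(sa[l], sb[l]) for l in pa)
                candidates.append((conf, idx, pa))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        chosen = candidates[:per_round]
        for _, idx, labels in chosen:
            a, b = pool[idx]
            labelled.append((a, b, labels))
        taken = {idx for _, idx, _ in chosen}
        pool = [s for i, s in enumerate(pool) if i not in taken]
    refit()  # final models trained on the enlarged labelled set
    return clf_a, clf_b, labelled


seed = [((0.0,), (0.0,), frozenset({"A"})), ((10.0,), (10.0,), frozenset({"B"}))]
pool = [((0.5,), (0.2,)), ((9.5,), (9.8,)), ((5.0,), (5.0,))]
clf_a, clf_b, enlarged = co_train(seed, pool, rounds=3, per_round=2)
```

In this toy run the two unlabelled samples near the seed points are pseudo-labelled and added, while the ambiguous sample at 5.0 gets no confident label from either view and stays out of the labelled set.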