Multi-label classification and interactive NLP-based visualization of electric vehicle patent data. (September 2019)
- Record Type:
- Journal Article
- Title:
- Multi-label classification and interactive NLP-based visualization of electric vehicle patent data. (September 2019)
- Main Title:
- Multi-label classification and interactive NLP-based visualization of electric vehicle patent data
- Authors:
- De Clercq, Djavan
Diop, Ndeye-Fatou
Jain, Devina
Tan, Benjamin
Wen, Zongguo
- Abstract:
- The objectives of this study are to (1) interactively visualize information embedded in patent texts, and (2) train a high-accuracy multi-label classification algorithm capable of assigning patents to multiple Cooperative Patent Classification (CPC) classes. The case study involved the metadata and text of 17,500 electric vehicle patents. The following methodology was applied. First, features were engineered by extracting topics from patent texts using latent Dirichlet allocation (LDA) together with the perplexity metric. Second, multi-label implementations of the random forest, decision tree, and k-nearest neighbors (KNN) algorithms were trained to predict the set of class labels corresponding to a given electric vehicle patent. The results were promising: the best scores reported across accuracy, precision, recall, F-score, and Hamming loss were 0.91, 0.92, 0.74, and 0.02. The implications of our results are twofold. First, we demonstrate the effectiveness of open-source tools for building customized patent analysis pipelines, including interactive data visualization and machine learning. Second, our results provide a strong basis for automated multi-label patent classification into CPC classes. Highlights: Natural language processing tools applied to patent information extraction and dynamic visualization. Latent Dirichlet allocation with perplexity applied to NLP-informed feature engineering. Multi-label classification with random forest, decision trees, and KNN machine learning algorithms.
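The abstract describes a two-stage pipeline: LDA topic extraction scored with perplexity, then multi-label random forest, decision tree, and KNN classifiers evaluated with metrics such as Hamming loss. The following is a minimal sketch of that kind of pipeline, assuming scikit-learn. The toy corpus, the hand-assigned CPC-style labels, the candidate topic counts, the held-out perplexity selection rule, and all hyperparameters are illustrative assumptions, not the authors' data or settings.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score, hamming_loss
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus standing in for patent texts; the CPC-style
# labels are hand-assigned for illustration only.
docs = [
    "lithium battery cell electrode electrolyte capacity",
    "battery pack cooling thermal management cell module",
    "charging station connector plug grid power supply",
    "wireless charging coil inductive power transfer vehicle",
    "electric motor inverter torque control drive",
    "motor controller regenerative braking torque vehicle",
    "battery charging control state of charge grid",
    "inverter motor drive battery voltage converter",
    "electrode cathode anode lithium ion battery cell",
    "charging cable plug connector station vehicle",
    "drive motor rotor stator torque electric vehicle",
    "battery management state of charge cell voltage charging",
]
labels = [
    {"H01M"}, {"H01M", "B60L"}, {"H02J"}, {"H02J", "B60L"},
    {"B60L"}, {"B60L"}, {"H02J", "H01M"}, {"B60L", "H01M"},
    {"H01M"}, {"H02J", "B60L"}, {"B60L"}, {"H01M", "H02J"},
]
train, test = slice(0, 8), slice(8, 12)

# Bag-of-words counts, the input LDA expects.
X_counts = CountVectorizer().fit_transform(docs)

# Assumed selection rule: pick the topic count with the lowest
# perplexity on held-out documents (lower is better).
perplexities = {}
for k in (2, 3, 4):
    candidate = LatentDirichletAllocation(n_components=k, random_state=0)
    candidate.fit(X_counts[train])
    perplexities[k] = candidate.perplexity(X_counts[test])
best_k = min(perplexities, key=perplexities.get)

# Refit with the chosen k; document-topic proportions become the features.
lda = LatentDirichletAllocation(n_components=best_k, random_state=0)
features = lda.fit_transform(X_counts[train])
features_test = lda.transform(X_counts[test])

# Binary indicator matrix with one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
Y_train, Y_test = Y[train], Y[test]

# These scikit-learn estimators accept a 2-D indicator target directly.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=3),
}
scores = {}
for name, model in models.items():
    model.fit(features, Y_train)
    Y_pred = model.predict(features_test)
    scores[name] = {
        "hamming_loss": hamming_loss(Y_test, Y_pred),
        "micro_f1": f1_score(Y_test, Y_pred, average="micro", zero_division=0),
    }

for name, s in scores.items():
    print(name, s)
```

Hamming loss is the fraction of individual label slots predicted incorrectly, so lower is better, unlike accuracy, precision, recall, and F-score. Using topic proportions as features keeps each document to a `best_k`-dimensional vector regardless of vocabulary size.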
- Is Part Of:
- World patent information. Volume 58(2019)
- Journal:
- World patent information
- Issue:
- Volume 58(2019)
- Issue Display:
- Volume 58, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 58
- Issue:
- 2019
- Issue Sort Value:
- 2019-0058-2019-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-09
- Subjects:
- Patent literature -- Periodicals
Information storage and retrieval systems -- Patent documentation -- Periodicals
608.05
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01722190
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.wpi.2019.101903
- Languages:
- English
- ISSNs:
- 0172-2190
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 9356.973000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 11623.xml