TEM virus images: Benchmark dataset and deep learning classification. (September 2021)
- Record Type:
- Journal Article
- Title:
- TEM virus images: Benchmark dataset and deep learning classification. (September 2021)
- Main Title:
- TEM virus images: Benchmark dataset and deep learning classification
- Authors:
- Matuszewski, Damian J.
Sintorn, Ida-Maria
- Abstract:
- Highlights: We publish a new challenging dataset with 1245 TEM images of 22 virus classes. We propose a new baseline classification for this challenging dataset. Our best model, fine-tuned DenseNet201, achieved 93.1% accuracy on the test set. Our custom CNN achieved 90.1% accuracy despite being 10x smaller than DenseNet201. We show the importance of application knowledge in dataset design and interpretation.
Abstract: Background and Objective: To achieve the full potential of deep learning (DL) models, such as understanding the interplay between model (size), training strategy, and amount of training data, researchers and developers need access to new dedicated image datasets; i.e., annotated collections of images representing real-world problems with all their variations, complexity, limitations, and noise. Here, we present, describe and make freely available an annotated transmission electron microscopy (TEM) image dataset. It constitutes an interesting challenge for many practical applications in virology and epidemiology; e.g., virus detection, segmentation, classification, and novelty detection. We also present benchmarking results for virus detection and recognition using some of the top-performing (large and small) networks as well as a handcrafted very small network. We compare and evaluate transfer learning and training from scratch hypothesizing that with a limited dataset, transfer learning is crucial for good performance of a large network whereas our handcrafted small network performs relatively well when training from scratch. This is one step towards understanding how much training data is needed for a given task.
Methods: The benchmark dataset contains 1245 images of 22 virus classes. We propose a representative data split into training, validation, and test sets for this dataset. Moreover, we compare different established DL networks and present a baseline DL solution for classifying a subset of the 14 most-represented virus classes in the dataset.
Results: Our best model, DenseNet201 pre-trained on ImageNet and fine-tuned on the training set, achieved a 0.921 F1-score and 93.1% accuracy on the proposed representative test set.
Conclusions: Public and real biomedical datasets are an important contribution and a necessity to increase the understanding of shortcomings, requirements, and potential improvements for deep learning solutions on biomedical problems or deploying solutions in clinical settings. We compared transfer learning to learning from scratch on this dataset and hypothesize that for limited-sized datasets transfer learning is crucial for achieving good performance for large models. Last but not least, we demonstrate the importance of application knowledge in creating datasets for training DL models and analyzing their results.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 209(2021)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 209(2021)
- Issue Display:
- Volume 209, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 209
- Issue:
- 2021
- Issue Sort Value:
- 2021-0209-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-09
- Subjects:
- CNN -- Convolutional neural networks -- Transmission electron microscopy -- Virus recognition -- Transfer learning -- Dataset curation
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2021.106318
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18641.xml
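
Note on the reported metrics: the abstract gives both an accuracy and an F1-score for the test set. The sketch below shows one common way such figures are computed for a multi-class classifier. It is an illustration only: this record does not state how the authors averaged F1 across classes, so macro-averaging is an assumption here, and the function names and the class labels "a", "b", "c" are hypothetical placeholders rather than anything taken from the dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro-averaging is assumed here)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[t] += 1   # t was the true class but was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with placeholder labels (not real data from the paper)
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Because macro-averaging weights every class equally, the two numbers can diverge when classes are unevenly represented, which may be one reason both are reported.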