A simplified cluster model and a tool adapted for collaborative labeling of lung cancer CT scans. (July 2021)
- Record Type:
- Journal Article
- Title:
- A simplified cluster model and a tool adapted for collaborative labeling of lung cancer CT scans. (July 2021)
- Main Title:
- A simplified cluster model and a tool adapted for collaborative labeling of lung cancer CT scans
- Authors:
- Morozov, S.P.
Gombolevskiy, V.A.
Elizarov, A.B.
Gusev, M.A.
Novik, V.P.
Prokudaylo, S.B.
Bardin, A.S.
Popov, E.V.
Ledikhova, N.V.
Chernina, V.Y.
Blokhin, I.A.
Nikolaev, A.E.
Reshetnikov, R.V.
Vladzymyrskyy, A.V.
Kulberg, N.S.
- Abstract:
- Highlights: We have developed a simplified cluster model and an open-source tool for lung cancer CT scans annotation. The model accelerates the nodule markup process and enhances its efficiency. Using the model, we have created the publicly available CTLungCa-500 dataset of lung CT images. The dataset contains crowd-sourced markup of healthy and tumorous tissues. Abstract: Background and objective: Lung cancer is the most common type of cancer with a high mortality rate.
Early detection using medical imaging is critically important for the long-term survival of the patients. Computer-aided diagnosis (CAD) tools can potentially reduce the number of incorrect interpretations of medical image data by radiologists. Datasets with adequate sample size, annotation, and truth are the dominant factors in developing and training effective CAD algorithms. The objective of this study was to produce a practical approach and a tool for the creation of medical image datasets. Methods: The proposed model uses the modified maximum transverse diameter approach to mark a putative lung nodule. The modification involves the possibility to use a set of overlapping spheres of appropriate size to approximate the shape of the nodule. The algorithm embedded in the model also groups the marks made by different readers for the same lesion. We used the data of 536 randomly selected patients of Moscow outpatient clinics to create a dataset of standard-dose chest computed tomography (CT) scans utilizing the double-reading approach with arbitration. Six volunteer radiologists independently produced a report for each scan using the proposed model with the main focus on the detection of lesions with sizes ranging from 3 to 30 mm. After this, an arbitrator reviewed their marks and annotations. Results: The maximum transverse diameter approach outperformed the alternative methods (3D box, ellipsoid, and complete outline construction) in a study of 10,000 computer-generated tumor models of different shapes in terms of accuracy and speed of nodule shape approximation. The markup and annotation of the CTLungCa-500 dataset revealed 72 studies containing no lung nodules. The remaining 464 CT scans contained 3151 lesions marked by at least one radiologist: 56%, 14%, and 29% of the lesions were malignant, benign, and non-nodular, respectively. 2887 lesions have the target size of 3–30 mm. Only 70 nodules were uniformly identified by all six readers.
An increase in the number of independent readers providing CT scan interpretations led to an accuracy increase associated with a decrease in agreement. The dataset markup process took three working weeks. Conclusions: The developed cluster model simplifies the collaborative and crowdsourced creation of image repositories and makes it time-efficient. Our proof-of-concept dataset provides a valuable source of annotated medical imaging data for training CAD algorithms aimed at early detection of lung nodules. The tool and the dataset are publicly available at https://github.com/Center-of-Diagnostics-and-Telemedicine/FAnTom.git and https://mosmed.ai/en/datasets/ct_lungcancer_500/, respectively.
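The grouping step mentioned in the Methods (marks made by different readers for the same lesion are combined) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Mark` record, the `overlaps` criterion (two spherical marks belong together when their spheres intersect), and the `group_marks` function are all assumptions introduced here for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Mark:
    """Hypothetical record for one spherical mark placed by a reader."""
    reader: str
    center: tuple[float, float, float]  # position in mm
    diameter: float                     # maximum transverse diameter in mm


def overlaps(a: Mark, b: Mark) -> bool:
    # Assumed criterion: spheres intersect when the distance between
    # centers is smaller than the sum of their radii.
    return dist(a.center, b.center) < (a.diameter + b.diameter) / 2


def group_marks(marks: list[Mark]) -> list[list[Mark]]:
    """Group marks into clusters of transitively overlapping spheres."""
    parent = list(range(len(marks)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(marks)):
        for j in range(i + 1, len(marks)):
            if overlaps(marks[i], marks[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[Mark]] = {}
    for i, mark in enumerate(marks):
        clusters.setdefault(find(i), []).append(mark)
    return list(clusters.values())
```

Because the grouping is transitive, several overlapping spheres that one reader uses to approximate an irregular nodule would fall into the same cluster as other readers' marks on that nodule; the number of distinct readers in a cluster could then serve as a simple agreement count.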
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 206(2021)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 206(2021)
- Issue Display:
- Volume 206, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 206
- Issue:
- 2021
- Issue Sort Value:
- 2021-0206-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07
- Subjects:
- Lung nodule -- Computed tomography (CT) -- Computer-aided diagnosis (CAD) -- Machine learning (ML) -- Training dataset -- Crowdsourcing -- Weak labeling
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2021.106111
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17207.xml