A method to measure data complexity of a complicated medical data set. Issue 6 (27th May 2022)
- Record Type:
- Journal Article
- Title:
- A method to measure data complexity of a complicated medical data set. Issue 6 (27th May 2022)
- Main Title:
- A method to measure data complexity of a complicated medical data set
- Authors:
- Juhola, Martti
Joutsijoki, Henry
Penttinen, Kirsi
Shah, Disheet
Aalto‐Setälä, Katriina
- Abstract:
- In this article, we consider data complexity in the context of calcium transient signal data collected from induced pluripotent stem cell‐derived cardiomyocytes. We present a novel way to measure data complexity based on the nearest neighbour searching method. Data complexity here is seen as overlapping and mixed data classes in addition to a relatively great number of data cases. Complexity affects classification results, which were run with nearest neighbour searching, feedforward artificial neural networks and random forests for seven genetic cardiological disease classes and healthy controls. The data are obtained from individuals carrying mutations for genetic cardiac diseases with induced pluripotent stem cell (iPSC) technology and the diseases include hypertrophic cardiomyopathy with two different founder gene mutations, dilated cardiomyopathy, long QT syndrome type 1 and 2, Brugada syndrome, a severe genetic ventricular arrhythmia (CPVT) and healthy controls. The data are from calcium transients from spontaneously beating iPSC‐derived cardiomyocytes cultured in a biotechnology laboratory. When the genotype of the iPSC‐derived cardiomyocytes is the same as the donor of the tissue sample and based on the characteristics of the calcium transients, it was possible to classify the seven diseases and healthy controls with machine learning. Peak data first detected before actual pre‐processing from calcium transient signals corresponded to beats (repeating excitation–contraction coupling) of induced stem cell‐derived cardiomyocytes and formed the basis of classification. During pre‐processing of the calcium transient signals, we found that such techniques among others as even strong outlier cleaning or class size balancing by generating artificial cases improved only slightly or not at all classification accuracies. Therefore, the current data set was sufficiently complicated for our data complexity study. Random forests produced the best classification accuracies, 68% for all eight classes. …
- Is Part Of:
- International journal of imaging systems and technology. Volume 32:Issue 6(2022)
- Journal:
- International journal of imaging systems and technology
- Issue:
- Volume 32:Issue 6(2022)
- Issue Display:
- Volume 32, Issue 6 (2022)
- Year:
- 2022
- Volume:
- 32
- Issue:
- 6
- Issue Sort Value:
- 2022-0032-0006-0000
- Page Start:
- 1822
- Page End:
- 1831
- Publication Date:
- 2022-05-27
- Subjects:
- calcium transient signals -- classification -- data complexity -- genetic cardiac cardiomyocytes -- induced pluripotent stem cell‐derived cardiomyocytes -- machine learning
Imaging systems -- Periodicals
Image processing -- Periodicals
621.367
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-1098
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/ima.22760
- Languages:
- English
- ISSNs:
- 0899-9457
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.299000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24731.xml