P-271 Challenges with comparing different commercially available Artificial Intelligence (AI) systems on the same data set of time-lapse selected euploid blastocysts. (30th June 2022)
- Record Type:
- Journal Article
- Title:
- P-271 Challenges with comparing different commercially available Artificial Intelligence (AI) systems on the same data set of time-lapse selected euploid blastocysts. (30th June 2022)
- Main Title:
- P-271 Challenges with comparing different commercially available Artificial Intelligence (AI) systems on the same data set of time-lapse selected euploid blastocysts
- Authors:
- Nikitos, E
Triantafillou, T
Dimitropoulos, K.D
Kallergi, V
Psathas, P
Erlich, I
Ben-Meir, A
Bergelson, N - Abstract:
- Abstract: Study question: To identify challenges in choosing a robust AI following comparative validation with data already pre-selected with established embryos selection tools: blastulation, morphology, time-lapse, PGTA. Summary answer: Challenges included: bias; assessment against outcomes AI models were not trained on; performance metrics prioritisation; statistical methodology; continuous data cutoffs for binary clinical decision making. What is known already: AI is commercially available to be incorporated into routine practice to support embryo selection decision-making. Different clinical practices and demographics are used to train AI models, potentially impacting the prediction efficacy of the same model when used in different clinics. Fertility professionals require robust methods of validation to responsibly implement AI-based tools. Unbiased and robust frameworks for comparing AI systems in the same dataset are needed. Validating AI in a dataset of time-lapse selected euploid blastocysts using all the current methods of embryo selection currently available is the toughest assessment possible and has not previously been performed. Study design, size, duration: This study uses a retrospectively timelapse dataset collected from 2018-2021 at a single private fertility clinic. The dataset included 915 blastocysts which underwent PGTA (913 results: 381 euploids, 528 aneuploids, 4 mosaics) and 46 euploids transferred with known bhcg and ongoing clinical outcome (ofAbstract: Study question: To identify challenges in choosing a robust AI following comparative validation with data already pre-selected with established embryos selection tools: blastulation, morphology, time-lapse, PGTA. Summary answer: Challenges included: bias; assessment against outcomes AI models were not trained on; performance metrics prioritisation; statistical methodology; continuous data cutoffs for binary clinical decision making. What is known already: AI is commercially available to be incorporated into routine practice to support embryo selection decision-making. Different clinical practices and demographics are used to train AI models, potentially impacting the prediction efficacy of the same model when used in different clinics. Fertility professionals require robust methods of validation to responsibly implement AI-based tools. Unbiased and robust frameworks for comparing AI systems in the same dataset are needed. Validating AI in a dataset of time-lapse selected euploid blastocysts using all the current methods of embryo selection currently available is the toughest assessment possible and has not previously been performed. Study design, size, duration: This study uses a retrospectively timelapse dataset collected from 2018-2021 at a single private fertility clinic. The dataset included 915 blastocysts which underwent PGTA (913 results: 381 euploids, 528 aneuploids, 4 mosaics) and 46 euploids transferred with known bhcg and ongoing clinical outcome (of which 40 resulted to live birth). Following a prospective, comparative, observational, cohort study design, blastocysts were blindly scored using the CHLOE(FAIRTILITY) and another commercially available AI system, referred to as 'AI-2'. Participants/materials, setting, methods: Patients aged 24-47years (average 35.4). Blastocysts selected for biopsy and transfer based on morphology and KIDScore(Vitrolife). Both AI systems were tested in the data set blindly, without any training. Correlation Regression analysis assessed correlation with KIDSCORE and relative to each AI system. Efficacy of prediction (using metrics AUC, Accuracy, Sensitivity, Specificity and Informedness) of outcomes (ploidy, biochemical and clinical pregnancy) were assessed for both AI models (CHLOEvsAI-2) by two independent statisticians to establish significance. Main results and the role of chance: Regression analysis demonstrated no correlation between KIDSCORE and AI-2(r2=0.3%, p=0.5) or between CHLOE(FAIRTILITY) and AI-2(r2=0.03%, p=0.9). CHLOE(FAIRTILITY) correlated with KIDSCORE(r2=29%, p<0.001). AI-2 was not predictive of ploidy (Euploids vs Aneuploids+mosaic: AUC=0.5, p=0.6). CHLOE(Fairtility) was predictive of ploidy(AUC=0.66, p<0.001). Neither AI-2 or CHLOE(Fairtility) predicted which embryo the human embryologist prioritised for transfer (AI-2 vs CHLOE:accuracy:0.31vs0.49, p<0.00001). Neither AI-2 nor CHLOE(Fairtility) predicted which embryo the human embryologist prioritised for transfer (AI-2 vs CHLOE:accuracy: 0.31vs0.49, p<0.00001). There was no difference detected in efficacy of prediction of biochemical (accuracy:0.52vs0.67, NS) and ongoing clinical pregnancy (accuracy:0.53 vs 0.78, NS) by AI-2 or CHLOE. This is partly due to the low number of euploid transfers assessed (n = 46), and partly due to the fact that neither of these algorithms are trained specifically on predicting outcome of euploid transfers. CHLOE(Fairtility) was more specific than AI-2 for predicting selection for transfer(0.44/0.80vs0.17/0.93, p<0.05/NS) and ploidy(0.54/0.77vs0.23/0.87, p<0.05/NS), and they were equally as sensitive. CHLOE(Fairtility) was more sensitive, and less specific than AI-2 for predicting biochemical pregnancy(0.36/0.81vs0.86/0.38, p<0.05) and more sensitive but equally as specific for predicting clinical pregnancy(0.33/0.88vs0.83/0.46, NS/p<0.05). Informedness was positive for both CHLOE(Fairtility) and AI-2 in predicting all outcomes assessed. Informedness was greater for AI-2 for predicting morphology(AI-2vsCHLOE:0.16vs0.31, p<0.05), transfer(0.11vs0.24, p<0.05), ploidy(0.10vs0.31, p<0.05) and equivalent for predicting biochemical (0.23vs0.17, NS) and clinical pregnancy(0.29vs0.22, NS). Limitations, reasons for caution: In this single clinic study, both algorithms were assessed against outcomes (live birth following transfer of time-lapse cultured euploid blastocysts) for which they were not trained on: AI-2(designed for ploidy prediction) and CHLOE(FAIRTILITY, implantation prediction of non-PGTA embryos) and no clinic data was used for training. Wider implications of the findings: The only way to decide which AI model is more useful is by a direct comparison of two or more models on the same dataset with same outcomes and metrics, as recommended by TRIPOD. To date, this is the first publication comparing multiple commercial AI models on the same dataset. Trial registration number: NA … (more)
- Is Part Of:
- Human reproduction. Volume 37(2022)Supplement 1
- Journal:
- Human reproduction
- Issue:
- Volume 37(2022)Supplement 1
- Issue Display:
- Volume 37, Issue 1 (2022)
- Year:
- 2022
- Volume:
- 37
- Issue:
- 1
- Issue Sort Value:
- 2022-0037-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06-30
- Subjects:
- Human reproduction -- Periodicals
618 - Journal URLs:
- http://humrep.oxfordjournals.org/ ↗
http://ukcatalogue.oup.com/ ↗ - DOI:
- 10.1093/humrep/deac107.260 ↗
- Languages:
- English
- ISSNs:
- 0268-1161
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4336.431000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 22965.xml