Deriving and evaluating a fault model for testing data science applications. Issue 5 (16th March 2022)
- Record Type:
- Journal Article
- Title:
- Deriving and evaluating a fault model for testing data science applications. Issue 5 (16th March 2022)
- Main Title:
- Deriving and evaluating a fault model for testing data science applications
- Authors:
- Aftab Jilani, Atif
Sherin, Salman
Ijaz, Sidra
Zohaib Iqbal, Muhammad
Uzair Khan, Muhammad
- Abstract:
- Data science (DS) applications not only suffer from traditional software faults but may also suffer from data‐specific and model‐related faults. Fault models play an important role in evaluating and designing tests for DS applications. The existing fault models do not consider DS‐specific faults. In this study, we built a fault model for DS applications. We investigate the faults using two approaches: (i) a multi‐vocal literature survey and (ii) semi‐structured interviews of industry experts. The multi‐vocal study allows us to synthesize the existing knowledge from researchers and practitioners. Qualitative data from the semi‐structured interviews provide insights into the nature of faults encountered by practitioners. We combine the results of (i) and (ii) to derive a detailed fault model. The fault model is further validated through a quantitative survey of industry practitioners, in which respondents were asked to identify the faults from our proposed fault model that they have experienced and to classify those faults by perceived severity and frequency. The results show that practitioners consider prediction bias and model decay the most severe faults, while data sampling and splitting faults, along with feature engineering faults, are the most frequent.
This study proposes a fault model for testing data science (DS) applications, which was derived from a multi‐vocal literature survey and semi‐structured interviews of industry experts. Industry practitioners further validated the fault model. They identified the faults from the fault model that they had experienced and classified them based on their severity and frequency. The results show that prediction bias and model decay are the most severe faults, while data sampling and splitting faults and feature engineering faults are the most frequent.
- Is Part Of:
- Journal of software. Volume 34:Issue 5(2022)
- Journal:
- Journal of software
- Issue:
- Volume 34:Issue 5(2022)
- Issue Display:
- Volume 34, Issue 5 (2022)
- Year:
- 2022
- Volume:
- 34
- Issue:
- 5
- Issue Sort Value:
- 2022-0034-0005-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2022-03-16
- Subjects:
- data science -- fault model -- software testing -- testing machine learning applications
Software engineering -- Periodicals
Computer software -- Development -- Periodicals
Software maintenance -- Periodicals
005.1
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2047-7481
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/smr.2449
- Languages:
- English
- ISSNs:
- 2047-7473
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21324.xml