A novel methodology to classify test cases using natural language processing and imbalanced learning. (October 2020)
- Record Type:
- Journal Article
- Title:
- A novel methodology to classify test cases using natural language processing and imbalanced learning. (October 2020)
- Main Title:
- A novel methodology to classify test cases using natural language processing and imbalanced learning
- Authors:
- Tahvili, Sahar
Hatvani, Leo
Ramentol, Enislay
Pimentel, Rita
Afzal, Wasif
Herrera, Francisco
- Abstract:
- Abstract: Detecting the dependency between integration test cases plays a vital role in the area of software test optimization. Classifying test cases into two main classes – dependent and independent – can be employed for several test optimization purposes such as parallel test execution, test automation, test case selection and prioritization, and test suite reduction. This task can be seen as an imbalanced classification problem due to the test cases' distribution. Often the number of dependent and independent test cases is uneven, which is related to the testing level, testing environment and complexity of the system under test. In this study, we propose a novel methodology that consists of two main steps. Firstly, by using natural language processing we analyze the test cases' specifications and turn them into a numeric vector. Secondly, by using the obtained data vectors, we classify each test case into a dependent or an independent class. We carry out a supervised learning approach using different methods for handling imbalanced datasets. The feasibility and possible generalization of the proposed methodology is evaluated in two industrial projects at Bombardier Transportation, Sweden, which indicates promising results.
Highlights: In a manual testing procedure, all testing artifacts are written in a natural text, employing natural language processing techniques might provide highly useful information for test optimization purposes. The ratio of dependent and independent test cases might suffer from an imbalanced distribution due to the testing level and complexity of the system under test. Doc2Vec proves to be a good tool when transforming the manual test cases into feature vectors. IFROWANN performs well when splitting dependent and independent test cases as an imbalance learning algorithm. …
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 95(2020)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 95(2020)
- Issue Display:
- Volume 95, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 95
- Issue:
- 2020
- Issue Sort Value:
- 2020-0095-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-10
- Subjects:
- Software testing -- Artificial intelligence -- Imbalanced classification -- Natural language processing -- Optimization -- IFROWANN -- Doc2Vec
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2020.103878
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14012.xml