Code smell detection and identification in imbalanced environments. (15th March 2021)
- Record Type:
- Journal Article
- Title:
- Code smell detection and identification in imbalanced environments. (15th March 2021)
- Main Title:
- Code smell detection and identification in imbalanced environments
- Authors:
- Boutaib, Sofien
Bechikh, Slim
Palomba, Fabio
Elarbi, Maha
Makhlouf, Mohamed
Said, Lamjed Ben
- Abstract:
- Context: Code smells are sub-optimal design choices that could lower software maintainability.
Objective: Previous literature did not consider an important characteristic of the smell detection problem, namely data imbalance. When considering a high number of code smell types, the number of smelly classes is likely to largely exceed the number of non-smelly ones, and vice versa. Moreover, most studies did not address the smell identification problem, which is more likely to present a higher imbalance, as the number of smelly classes is much smaller than the number of non-smelly ones. Furthermore, an additional research gap in the literature is that the number of smell type identification methods is very small compared to the detection ones.
Research gap: The main challenges in smell detection and identification in an imbalanced environment are: (1) the structuring of the smell detector, which should be able to deal with complex splitting boundaries and small disjuncts, (2) the design of the detector quality evaluation function, which should take data imbalance into account, and (3) the efficient search for effective software metrics' thresholds that characterize the different smells well.
Method: We propose ADIODE, an effective search-based engine that is able to deal with all the above-described challenges, not only for the smell detection case but also for the identification one. ADIODE is an EA (Evolutionary Algorithm) that evolves a population of detectors encoded as ODTs (Oblique Decision Trees) using the F-measure as a fitness function. This allows ADIODE to efficiently approximate globally-optimal detectors with effective oblique splitting hyper-planes and metrics' thresholds. We note that to build the BE, each software class is parsed using a particular tool in order to extract its metrics' values, based on which the considered class is labeled by means of a set of existing advisors; this can be seen as a two-step construction process.
Results: A comparative experimental study on six open-source software systems demonstrates the merits of our approach, which outperforms four of the most representative and prominent baseline techniques available in the literature. The detection results show that the F-measure of ADIODE ranges between 91.23% and 95.24%, and its AUC lies between 0.9273 and 0.9573. Similarly, the identification results indicate that the F-measure of ADIODE varies between 86.26% and 94.5%, and its AUC is between 0.8653 and 0.9531.
Highlights: Code smell detection could be an imbalanced data classification problem. The existing works have encountered problems in dealing with data imbalance. A novel approach called ADIODE is proposed to detect and/or identify code smells. An experimental study is performed using the F-measure and the AUC metrics.
- Is Part Of:
- Expert systems with applications. Volume 166(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 166(2021)
- Issue Display:
- Volume 166, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 166
- Issue:
- 2021
- Issue Sort Value:
- 2021-0166-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-03-15
- Subjects:
- Code smells detection -- Smell type identification -- Imbalanced data classification -- Oblique decision tree -- Evolutionary algorithm
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2020.114076
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15183.xml
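
Illustrative note on the abstract: the record states that ADIODE scores detectors with the F-measure and encodes them as oblique decision trees. The Python sketch below is not the authors' implementation. It only illustrates, under stated assumptions, why accuracy can mislead on imbalanced data while the F-measure does not, and what a single oblique split (a threshold on a weighted sum of metrics) looks like. The function names, the 95/5 class split, the two metric values, and the weights are all hypothetical.

```python
def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (smelly) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no smelly class found: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of classes labeled correctly, regardless of class balance."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def oblique_split(metrics, weights, threshold):
    """One oblique test: a weighted sum of several metrics against a threshold.

    An axis-parallel split would test a single metric; an oblique split
    combines several, which gives a tilted separating hyper-plane.
    """
    return sum(w * m for w, m in zip(weights, metrics)) > threshold


# Hypothetical imbalanced sample: 95 non-smelly classes (0), 5 smelly (1).
y_true = [0] * 95 + [1] * 5

# A trivial detector that labels every class as non-smelly.
always_clean = [0] * 100

# A detector that finds 4 of the 5 smelly classes with 1 false alarm.
better = [0] * 94 + [1] + [1] * 4 + [0]

print(accuracy(y_true, always_clean), f_measure(y_true, always_clean))
print(accuracy(y_true, better), f_measure(y_true, better))

# Hypothetical metric vector (e.g. size and coupling) tested by one split.
print(oblique_split([10, 3], weights=[0.5, 1.0], threshold=7))
```

In this sketch the trivial detector reaches high accuracy but an F-measure of zero, which is the kind of behaviour a fitness function for imbalanced data needs to penalise.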