TIDIER: an identifier splitting approach using speech recognition techniques. Issue 6 (30th June 2011)
- Record Type:
- Journal Article
- Title:
- TIDIER: an identifier splitting approach using speech recognition techniques. Issue 6 (30th June 2011)
- Main Title:
- TIDIER: an identifier splitting approach using speech recognition techniques
- Authors:
- Guerrouj, Latifa
Di Penta, Massimiliano
Antoniol, Giuliano
Guéhéneuc, Yann‐Gaël
Capilla, Rafael
Dueñas, Juan C.
Ferenc, Rudolf
- Abstract:
- The software engineering literature reports empirical evidence on the relation between various characteristics of a software system and its quality. Among other factors, recent studies have shown that a proper choice of identifiers influences understandability and maintainability. Indeed, identifiers are developers' main source of information and guide their cognitive processes during program comprehension when high‐level documentation is scarce or outdated and when source code is not sufficiently commented. This paper proposes a novel approach to recognize words composing source code identifiers. The approach is based on an adaptation of Dynamic Time Warping used to recognize words in continuous speech. The approach overcomes the limitations of existing identifier‐splitting approaches when naming conventions (e.g., Camel Case) are not used or when identifiers contain abbreviations. We apply the approach on a sample of more than 1000 identifiers extracted from 340 C programs and compare its results with a simple Camel Case splitter and with an implementation of an alternative identifier splitting approach, Samurai. Results indicate the capability of the novel approach: (i) to outperform the alternative ones, when using a dictionary augmented with domain knowledge or a contextual dictionary and (ii) to expand 48% of a set of selected abbreviations into dictionary words. Copyright © 2011 John Wiley & Sons, Ltd.
- Is Part Of:
- Journal of software. Volume 25:Issue 6(2013)
- Journal:
- Journal of software
- Issue:
- Volume 25:Issue 6(2013)
- Issue Display:
- Volume 25, Issue 6 (2013)
- Year:
- 2013
- Volume:
- 25
- Issue:
- 6
- Issue Sort Value:
- 2013-0025-0006-0000
- Page Start:
- 575
- Page End:
- 599
- Publication Date:
- 2011-06-30
- Subjects:
- Software engineering -- Periodicals
Computer software -- Development -- Periodicals
Software maintenance -- Periodicals
005.1
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2047-7481
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/smr.539
- Languages:
- English
- ISSNs:
- 2047-7473
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 3431.xml