The impact of vocabulary normalization. Issue 4 (April 2015)
- Record Type:
- Journal Article
- Title:
- The impact of vocabulary normalization. Issue 4 (April 2015)
- Main Title:
- The impact of vocabulary normalization
- Authors:
- Binkley, Dave
Lawrie, Dawn
- Abstract:
- Software development, evolution, and maintenance depend on ever increasing tool support. Recent tools have incorporated increasing analysis of the natural language found in source code, predominately in the identifiers and comments. However, when coders combine abbreviations and acronyms to form multi‐word identifiers, they, in essence, invent new vocabulary making the source code's vocabulary differ from that of other software artifacts. This vocabulary mismatch is a potential problem for many techniques imported from information retrieval and natural language processing, which implicitly assume the use of a single common vocabulary. Vocabulary normalization aims to bring the vocabulary of the source in line with that of other artifacts.
A prior small‐scale experiment demonstrated the value of vocabulary normalization for C code. A more comprehensive experiment using Java code is presented where normalization fails to bring benefit. To investigate the potential underlying causes, over 20,000 non‐dictionary words extracted from the program JabRef were normalized by hand (often requiring significant external information). The experiment, repeated using the hand‐normalized identifiers, again found that normalization brought no improvement. In response to this unexpected result, the vocabulary differences between Java and C codes are considered and used to help frame directions for future work. Copyright © 2015 John Wiley & Sons, Ltd.
- Is Part Of:
- Journal of software. Volume 27:Issue 4(2015:Apr.)
- Journal:
- Journal of software
- Issue:
- Volume 27:Issue 4(2015:Apr.)
- Issue Display:
- Volume 27, Issue 4 (2015)
- Year:
- 2015
- Volume:
- 27
- Issue:
- 4
- Issue Sort Value:
- 2015-0027-0004-0000
- Page Start:
- 255
- Page End:
- 273
- Publication Date:
- 2015-04
- Subjects:
- Software engineering -- Periodicals
Computer software -- Development -- Periodicals
Software maintenance -- Periodicals
005.1
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2047-7481
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/smr.1710
- Languages:
- English
- ISSNs:
- 2047-7473
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 3460.xml