The possibility of improving automated calculation of measures of lexical richness for EFL writing: A comparison of the LCA, NLTK and SpaCy tools. (June 2022)
- Record Type:
- Journal Article
- Title:
- The possibility of improving automated calculation of measures of lexical richness for EFL writing: A comparison of the LCA, NLTK and SpaCy tools. (June 2022)
- Main Title:
- The possibility of improving automated calculation of measures of lexical richness for EFL writing: A comparison of the LCA, NLTK and SpaCy tools
- Authors:
- Spring, Ryan
Johnson, Matthew
- Abstract:
- Abstract: Automatically calculating measures of lexical richness is important for L2 learning because they can be used for assessment of productive abilities and general linguistic ability. One popular tool for doing so is the Lexical Complexity Analyzer (LCA), but more advanced tools for parsing have become available since its creation. This paper compares a modified version of the LCA code run with NLTK and SpaCy, two popular natural language processing toolkits, and the online version of the LCA to calculate 26 measures of lexical richness. We show how similarly they calculate the measures and how well each of the three tools' calculations correlate with EFL writers' human-rated scores and TOEFL® ITP scores. We found that six of the measures suggested to be associated with higher oral proficiency by Lu (2012) were also highly correlated with higher human-rated scores and TOEFL® ITP scores in our data set. However, the modifications to our code that utilize a different list to determine word sophistication and allow be and have verbs to be treated as lexical verbs caused four measures which Lu (2012) found to be unassociated with proficiency to be correlated with both human-rated scores and TOEFL® ITP scores, particularly when run with SpaCy.
Highlights: SpaCy- and NLTK-based tools were compared with the LCA. The three tools performed similarly, but SpaCy provided measures most correlated with human-rated and TOEFL® ITP scores. Code created for SpaCy- and NLTK-based tools modified definitions of word sophistication and lexical verbs. Measures that showed high correlation with human-rated and TOEFL® ITP scores include: NDW, NDW-ER50, CVS1, CTTR, SVV1, and LS1.
- Is Part Of:
- System. Volume 106(2022)
- Journal:
- System
- Issue:
- Volume 106(2022)
- Issue Display:
- Volume 106, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 106
- Issue:
- 2022
- Issue Sort Value:
- 2022-0106-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- Lexical richness -- EFL writing -- Automated assessment -- Computer assisted evaluation
Language and languages -- Study and teaching -- Periodicals
Langage et langues -- Étude et enseignement -- Périodiques
Electronic journals
407
- Journal URLs:
- http://www.sciencedirect.com/science/journal/0346251X
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.system.2022.102770
- Languages:
- English
- ISSNs:
- 0346-251X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8589.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21410.xml