Alphabet usage pattern, word lengths, and sparsity in seven Indo-European languages. (26th October 2019)
- Record Type:
- Journal Article
- Title:
- Alphabet usage pattern, word lengths, and sparsity in seven Indo-European languages. (26th October 2019)
- Main Title:
- Alphabet usage pattern, word lengths, and sparsity in seven Indo-European languages
- Authors:
- Rajput, Nikhil Kumar
Ahuja, Bhavya
Riyal, Manoj Kumar
- Abstract:
- An empirical study on about 1.7 million dictionary words from seven languages viz. English, French, Dutch, Spanish, Italian, Hindi, and German has been conducted. Three intriguing characteristic features have been analyzed. First, the alphabet usage pattern in a language was determined which can be used to give an idea on how alphabets have been employed. For instance, the alphabet 'e' is highly used in English, while 'q' is least used. Second, the average and range of word lengths in the languages were computed and seen to vary from 1 to 37. Average word lengths were computed in the range (6.665–11.14). For comparison, word lengths have been fitted using Gaussian distribution. Third, a new measure was derived; which we termed 'Language Sparsity'; computed as one minus ratio of number of words of a particular length already existing to the total number of possible words that can be formed. Sparsity hence gives a measure of the scope of fruition in languages. Two such measures have been defined: a weighted and a nonweighted sparsity. Nonweighted sparsity was found to be minimum (0.877) for English and maximum (0.982) for Dutch. The results obtained can play a significant role in propagating the synergy of language evolution.
- Is Part Of:
- Digital scholarship in the humanities. Volume 35:Number 4(2020)
- Journal:
- Digital scholarship in the humanities
- Issue:
- Volume 35:Number 4(2020)
- Issue Display:
- Volume 35, Issue 4 (2020)
- Year:
- 2020
- Volume:
- 35
- Issue:
- 4
- Issue Sort Value:
- 2020-0035-0004-0000
- Page Start:
- 727
- Page End:
- 736
- Publication Date:
- 2019-10-26
- Subjects:
- Philology -- Data processing -- Periodicals
Computational linguistics -- Periodicals
410.285
- Journal URLs:
- http://www.oxfordjournals.org/
http://dsh.oxfordjournals.org/
- DOI:
- 10.1093/llc/fqz076
- Languages:
- English
- ISSNs:
- 2055-768X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15130.xml
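
The three measures named in the abstract (alphabet usage, word length, and sparsity for a given word length) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy word list are hypothetical, and the sparsity function only follows the abstract's wording (one minus the ratio of existing words of a given length to all strings of that length over the alphabet). The abstract does not say how the weighted and nonweighted sparsity values are aggregated, so neither is reproduced here.

```python
from collections import Counter


def letter_frequencies(words):
    """Relative frequency of each character across all words."""
    counts = Counter(ch for word in words for ch in word)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def mean_word_length(words):
    """Arithmetic mean of the word lengths."""
    return sum(len(word) for word in words) / len(words)


def sparsity_by_length(words, alphabet_size):
    """Per-length sparsity: 1 - (existing words of length L) / (alphabet_size ** L).

    Assumption: every string of length L over the alphabet counts as a
    'possible word'; the paper may restrict this set differently.
    """
    by_length = Counter(len(word) for word in set(words))
    return {
        length: 1 - n / alphabet_size ** length
        for length, n in sorted(by_length.items())
    }


# Hypothetical toy data, not drawn from the study's dictionaries.
toy_words = ["a", "i", "an", "at", "in", "it", "on", "cat", "dog"]

frequencies = letter_frequencies(toy_words)
average_length = mean_word_length(toy_words)
sparsity = sparsity_by_length(toy_words, alphabet_size=26)
```

With a real dictionary, `toy_words` would be replaced by the full word list and `alphabet_size` by the number of letters in the language's alphabet.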