The predictive capabilities of mathematical models for the type-token relationship in English language corpora. (November 2021)
- Record Type:
- Journal Article
- Title:
- The predictive capabilities of mathematical models for the type-token relationship in English language corpora. (November 2021)
- Main Title:
- The predictive capabilities of mathematical models for the type-token relationship in English language corpora
- Authors:
- Tunnicliffe, Martin
Hunter, Gordon
- Abstract:
- Highlights:
Heaps' law optimizes closer than the Good-Toulmin model to vocabulary growth data.
Trained Heaps over-predicts future vocabulary growth, while Good-Toulmin under-predicts.
Bernoulli model with Zipf-Mandelbrot selection predicts better than Zipf alone.
Average of three models provides plausible unbiased prediction.
Zipf-Mandelbrot parameters agree better than Zipf with independently measured values.
Abstract: We investigate the predictive capability of mathematical models of the type-token relationship applied to the vocabulary growth profiles of selected English language documents. We compare the existing Good-Toulmin and Heaps formulae with an alternative approach based on Bernoulli trial word selection from a fixed finite vocabulary using the Zipf and Zipf-Mandelbrot probability distributions. We make two major observations: firstly, while the Zipf-Mandelbrot model makes better predictions of vocabulary growth than the Zipf model, the optimized parameters of the latter correlate better than those of the former with statistics gleaned independently from the data. Secondly, the mean of the Zipf-Mandelbrot, Good-Toulmin and Heaps models provides a more consistent and unbiased prediction of vocabulary than any individual model alone.
- Is Part Of:
- Computer speech & language. Volume 70(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 70(2021)
- Issue Display:
- Volume 70, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 70
- Issue:
- 2021
- Issue Sort Value:
- 2021-0070-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- Types/token systems -- Vocabulary size -- Zipf's law -- Heaps' law
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101227
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17252.xml