Zipfian regularities in "non-point" word representations. Issue 3 (May 2021)
- Record Type:
- Journal Article
- Title:
- Zipfian regularities in "non-point" word representations. Issue 3 (May 2021)
- Main Title:
- Zipfian regularities in "non-point" word representations
- Authors:
- Şahinuç, Furkan
Koç, Aykut
- Abstract:
- Being one of the most common empirical regularities, Zipf's law for word frequencies is a power law relation between word frequencies and frequency ranks of words. We quantitatively study semantic uncertainty of words through non-point distribution-based word embeddings and reveal the Zipfian regularities. Uncertainty of a word can increase due to polysemy, the word having "broad" meaning (such as the relation between broader emotion and narrower exasperation) or a combination of both. Variances of Gaussian embeddings are utilized to quantify the extent to which a word can be used in different senses or contexts. By using the variance information embedded in the non-point Gaussian embeddings, we quantitatively show that semantic breadth of words also exhibits Zipfian patterns when polysemy is controlled. This outcome is complementary to Zipf's law of meaning distribution and the related meaning-frequency law by indicating the existence of Zipfian patterns: more frequent words tend to be generic while less frequent ones tend to be specific. Results for two languages, English and Turkish, which belong to different language families, are also provided. Such regularities provide valuable information to extract and understand relationships between semantic properties of words and word frequencies. In various applications, performance improvements can be obtained by employing these regularities. We also propose a method that leverages the Zipfian regularity to improve the performance of baseline textual entailment detection algorithms. To the best of our knowledge, our approach is the first quantitative study that uses Gaussian embeddings to examine the relationships between word frequencies and semantic breadth.
Highlights: Variances of Gaussian embeddings can be used to quantify semantic uncertainty. There exist Zipfian regularities between word frequencies and semantic breadth/uncertainty. Zipfian patterns: more frequent words tend to be generic while less frequent ones tend to be specific. Zipfian patterns can be leveraged to increase entailment detection performance.
- Is Part Of:
- Information processing & management. Volume 58:Issue 3(2021)
- Journal:
- Information processing & management
- Issue:
- Volume 58:Issue 3(2021)
- Issue Display:
- Volume 58, Issue 3 (2021)
- Year:
- 2021
- Volume:
- 58
- Issue:
- 3
- Issue Sort Value:
- 2021-0058-0003-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-05
- Subjects:
- Word variances -- Word frequencies -- Zipf's law -- Meaning-frequency relation -- Zipfian regularities -- Word entailment -- Semantic breadth
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2021.102493
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22877.xml