Modeling the scholars: Detecting intertextuality through enhanced word-level n-gram matching. (15th May 2014)
- Record Type:
- Journal Article
- Title:
- Modeling the scholars: Detecting intertextuality through enhanced word-level n-gram matching. (15th May 2014)
- Main Title:
- Modeling the scholars: Detecting intertextuality through enhanced word-level n-gram matching
- Authors:
- Forstall, Christopher
Coffee, Neil
Buck, Thomas
Roache, Katherine
Jacobson, Sarah
- Abstract:
- The study of intertextuality, or how authors make artistic use of other texts in their works, has a long tradition, and has in recent years benefited from a variety of applications of digital methods. This article describes an approach for detecting the sorts of intertexts that literary scholars have found most meaningful, as embodied in the free Tesserae website http://tesserae.caset.buffalo.edu/. Tests of Tesserae Versions 1 and 2 showed that word-level n-gram matching could recall a majority of parallels identified by scholarly commentators in a benchmark set. But these versions lacked precision, so that the meaningful parallels could be found only among long lists of those that were not meaningful. The Version 3 search described here adds a second-stage scoring system that sorts the found parallels by a formula accounting for word frequency and phrase density. Testing against a benchmark set of intertexts in Latin epic poetry shows that the scoring system overall succeeds in ranking parallels of greater significance more highly, allowing site users to find meaningful parallels more quickly. Users can also choose to adjust both recall and precision by focusing only on results above given score levels. As a theoretical matter, these tests establish that lemma identity, word frequency, and phrase density are important constituents of what makes a phrase parallel a meaningful intertext.
- Is Part Of:
- Digital scholarship in the humanities. Volume 30:Number 4(2015)
- Journal:
- Digital scholarship in the humanities
- Issue:
- Volume 30:Number 4(2015)
- Issue Display:
- Volume 30, Issue 4 (2015)
- Year:
- 2015
- Volume:
- 30
- Issue:
- 4
- Issue Sort Value:
- 2015-0030-0004-0000
- Page Start:
- 503
- Page End:
- 515
- Publication Date:
- 2014-05-15
- Subjects:
- Philology -- Data processing -- Periodicals
Computational linguistics -- Periodicals
410.285
- Journal URLs:
- http://www.oxfordjournals.org/
http://dsh.oxfordjournals.org/
- DOI:
- 10.1093/llc/fqu014
- Languages:
- English
- ISSNs:
- 2055-768X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26681.xml
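
The two-stage search summarized in the abstract (word-level matching followed by scoring on word frequency and phrase density) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the Tesserae implementation: the function names are hypothetical, exact word forms stand in for lemmas, and the score used here (the logarithm of summed inverse word frequencies divided by the combined span of the matched words) is only one plausible way to combine the two factors, since this record does not give the article's formula.

```python
import math
from collections import Counter


def relative_frequencies(phrases):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(word for phrase in phrases for word in phrase)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def span(phrase, shared):
    """Count tokens from the first to the last shared word, inclusive."""
    positions = [i for i, word in enumerate(phrase) if word in shared]
    return positions[-1] - positions[0] + 1


def find_parallels(target, source, min_shared=2):
    """Return (score, target_phrase, source_phrase, shared_words) tuples,
    best first. Each text is a list of phrases; each phrase is a list of
    tokens. Assumption: exact word forms stand in for lemmas."""
    freq_t = relative_frequencies(target)
    freq_s = relative_frequencies(source)
    results = []
    for t in target:
        for s in source:
            # Stage 1: keep phrase pairs that share enough words.
            shared = set(t) & set(s)
            if len(shared) < min_shared:
                continue
            # Stage 2 (assumed formula): rare words raise the score,
            # widely spaced matches lower it.
            rarity = sum(1 / freq_t[w] + 1 / freq_s[w] for w in shared)
            distance = span(t, shared) + span(s, shared)
            results.append((math.log(rarity / distance), t, s, shared))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Toy data, invented for illustration only.
target_text = [["arma", "virumque", "cano"],
               ["et", "magna", "in", "urbe", "et", "in"]]
source_text = [["cano", "arma"],
               ["in", "silva", "et", "in", "agro", "et"]]

for score, t, s, shared in find_parallels(target_text, source_text):
    print(round(score, 3), sorted(shared), " ".join(t), "|", " ".join(s))
```

In this toy data the pair sharing the rarer, more tightly grouped words ranks above the pair sharing common words spread across longer phrases. Keeping only results above a chosen score is the kind of cutoff the abstract describes for trading recall against precision.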