Multiple regression techniques for modelling dates of first performances of Shakespeare-era plays. (15th August 2022)
- Record Type:
- Journal Article
- Title:
- Multiple regression techniques for modelling dates of first performances of Shakespeare-era plays. (15th August 2022)
- Main Title:
- Multiple regression techniques for modelling dates of first performances of Shakespeare-era plays
- Authors:
- Moscato, Pablo
Craig, Hugh
Egan, Gabriel
Haque, Mohammad Nazmul
Huang, Kevin
Sloan, Julia
Corrales de Oliveira, Jonathon
- Abstract:
- The creation of new computational methods to provide fresh insights on literary styles is a hot topic of research. There are particular challenges when the number of samples is small in comparison with the number of variables. One problem of interest to literary historians is the date of the first performance of a play of Shakespeare's time. Currently this must usually be guessed with reference to multiple indirect external sources, or to some aspect of the content or style of the play. This paper highlights a dating technique with a wider potential, using this particular problem as a case study. In this contribution, we introduce a novel dataset of Shakespeare-era plays (181 plays from the period 1585–1610), annotated by the best-guess dates for them from a standard reference work as metadata. We introduce a memetic algorithm-based Continued Fraction Regression (CFR) which delivered models using a small number of variables, leading to an interpretable model and reduced dimensionality, applied for the first time here in a problem of computational stylistics. Our independent variables are the probabilities of occurrences of individual words in each one of the plays. We studied the performance of 11 widely used regression methods to predict the dates of the plays at an 80/20 training/test split. An in-depth analysis of the most commonly occurring 20 words in the CFR models in 100 independent runs helps explain the trends in linguistic and stylistic terms. The use of the CFR has helped us to reveal an interesting mathematical model that links the variation in the use of the words through time, which helps to provide estimates of the dates of plays of the Shakespeare-era. We check for genre effects as a possible confounding variable.
Highlights: Prediction of dates by iterative Continued Fraction Regression was state of the art. Iter-CFR used < 20 words per model so was readily interpretable. The most accurate model was at depth = 0: change is evidently monotonic. Some of the words in the models were unexpected as markers of change over time. Iter-CFR performed at state of the art in predicting dates for Out-of-Domain plays.
- Is Part Of:
- Expert systems with applications. Volume 200(2022)
- Journal:
- Expert systems with applications
- Issue:
- Volume 200(2022)
- Issue Display:
- Volume 200, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 200
- Issue:
- 2022
- Issue Sort Value:
- 2022-0200-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-08-15
- Subjects:
- Shakespeare-era plays -- Continued fraction regression -- Dating of plays -- Play's genre -- Memetic algorithm
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2022.116903
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21406.xml