Evaluating Human Scoring Using Generalizability Theory. Issue 3 (2nd July 2020)
- Record Type:
- Journal Article
- Title:
- Evaluating Human Scoring Using Generalizability Theory. Issue 3 (2nd July 2020)
- Main Title:
- Evaluating Human Scoring Using Generalizability Theory
- Authors:
- Bimpeh, Yaw
Pointer, William
Smith, Ben Alexander
Harrison, Liz
- Abstract:
- Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we apply generalizability theory (G theory) to data from routine monitoring of ratings to derive an estimate for inter-rater reliability. UK examinations use a combination of double or multiple rating for routine monitoring, creating a more complex design that consists of cross-pairing of raters and overlapping of raters for different groups of candidates or items. This sampling design is neither fully crossed nor is it nested. Each double- or multiple-scored item takes a different set of candidates, and the number of sampled candidates per item varies. Therefore, the standard G theory method, and its various forms for estimating inter-rater reliability, cannot be directly applied to the operational data. We propose a method that takes double or multiple rating data as given and analyzes the datasets at the item level in order to obtain more accurate and stable variance component estimates. We adapt the variance component in observed scores for an unbalanced one-facet crossed design with some missing observations. These estimates can be used to make inferences about the reliability of the entire scoring process. We illustrate the proposed method by applying it to real scoring data.
- Is Part Of:
- Applied measurement in education. Volume 33:Issue 3(2020)
- Journal:
- Applied measurement in education
- Issue:
- Volume 33:Issue 3(2020)
- Issue Display:
- Volume 33, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 33
- Issue:
- 3
- Issue Sort Value:
- 2020-0033-0003-0000
- Page Start:
- 198
- Page End:
- 209
- Publication Date:
- 2020-07-02
- Subjects:
- Educational tests and measurements -- United States -- Periodicals
Educational tests and measurements -- Periodicals
Education -- Standards -- Periodicals
371.26
- Journal URLs:
- http://www.tandfonline.com/toc/hame20/current
http://www.informaworld.com/smpp/title~db=all~content=t775653631~tab=issueslist
http://www.tandfonline.com/
http://firstsearch.oclc.org
http://www.erlbaum.com
- DOI:
- 10.1080/08957347.2020.1750403
- Languages:
- English
- ISSNs:
- 0895-7347
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1574.300000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22651.xml