Exploring the composition of the searchable web: a corpus-based taxonomy of web registers. Issue 1 (April 2015)

Record Type:: Journal Article
Title:: Exploring the composition of the searchable web: a corpus-based taxonomy of web registers. Issue 1 (April 2015)
Main Title:: Exploring the composition of the searchable web: a corpus-based taxonomy of web registers
Authors:: Biber, Douglas
Egbert, Jesse
Davies, Mark
Abstract:: One major challenge for Web-As-Corpus research is that a typical Web search provides little information about the register of the documents that are searched. Previous research has attempted to address this problem (e.g., through the Automatic Genre Identification initiative), but with only limited success. As a result, we currently know surprisingly little about the distribution of registers on the web. In this study, we tackle this problem through a bottom-up user-based investigation of a large, representative corpus of web documents. We base our investigation on a much larger corpus than those used in previous research (48,571 web documents), obtained through random sampling from across the full range of documents that are publicly available on the searchable web. Instead of relying on individual expert coders, we recruit typical end-users of the Web for register coding, with each document in the corpus coded by four different raters. End-users identify basic situational characteristics of each web document, coded in a hierarchical manner. Those situational characteristics lead to general register categories, which eventually lead to lists of specific sub-registers. By working through a hierarchical decision tree, users are able to identify the register category of most Internet texts with a high degree of reliability. After summarising our methodological approach, this paper documents the register composition of the searchable web. Narrative registers …
Is Part Of:: Corpora. Volume 10:Issue 1(2015)
Journal:: Corpora
Issue:: Volume 10:Issue 1(2015)
Issue Display:: Volume 10, Issue 1 (2015)
Year:: 2015
Volume:: 10
Issue:: 1
Issue Sort Value:: 2015-0010-0001-0000
Page Start:: 11
Page End:: 45
Publication Date:: 2015-04
Subjects:: hybrid registers -- informational registers -- Internet language -- Mechanical Turk -- narrative -- opinion -- Web-As-Corpus -- web registers
Corpora (Linguistics) -- Periodicals
410.188
Journal URLs:: http://www.euppublishing.com/journal/cor
http://www.euppublishing.com/journals
DOI:: 10.3366/cor.2015.0065
Languages:: English
ISSNs:: 1749-5032
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 5036.xml