A Reproducible IT-Blog Corpus. (22nd July 2021)
- Record Type:
- Journal Article
- Title:
- A Reproducible IT-Blog Corpus. (22nd July 2021)
- Main Title:
- A Reproducible IT-Blog Corpus
- Authors:
- Barbaresi, Adrien
Pohlmann, Jens
- Abstract:
- The dataset comprises text and metadata extracted from several hundred IT-blogs and websites, along with a method to duplicate the data by updating its contents and downloading it to the user's local machine. The targets have been hand-picked with the intention to represent the discourse on blogs and websites dedicated to questions at the intersection of technology and society from Germany and the United States of America. The texts have been retrieved by web crawling techniques. The resulting corpus is accessible through a search platform and also reproducible with freely accessible descriptors and software.
- Is Part Of:
- Journal of open humanities data. Volume 7(2021)
- Journal:
- Journal of open humanities data
- Issue:
- Volume 7(2021)
- Issue Display:
- Volume 7, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 7
- Issue:
- 2021
- Issue Sort Value:
- 2021-0007-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07-22
- Subjects:
- web blogs -- corpus linguistics -- internet policy -- discourse analysis -- public discussion -- freedom of expression
Humanities -- Periodicals
001.3
- Journal URLs:
- http://openhumanitiesdata.metajnl.com/
- DOI:
- 10.5334/johd.35
- Languages:
- English
- ISSNs:
- 2059-481X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 16577.xml