Reliable access to massive restricted texts: Experience‐based evaluation. (5th April 2019)
- Record Type:
- Journal Article
- Title:
- Reliable access to massive restricted texts: Experience‐based evaluation. (5th April 2019)
- Main Title:
- Reliable access to massive restricted texts: Experience‐based evaluation
- Authors:
- Peng, Zong
Plale, Beth
- Other Names:
- Wu Yulei guestEditor.
Yan Zheng guestEditor.
Zhao Zhiwei guestEditor.
Al‐Dubai Ahmed guestEditor.
Li Tonglin guestEditor.
Xie Bing guestEditor.
Zhang Boyu guestEditor.
- Abstract:
- Summary: Libraries are seeing growing numbers of digitized textual corpora that frequently come with restrictions on their content. Large corpora for computational analysis, while of interest to scholars, can be cumbersome because of the combination of size, granularity of access, and access restrictions. Efficient management of such a collection for general access, especially under failures, depends on the primary storage system. In this paper, we identify the requirements of managing a massive text corpus for computational analysis and use them as a basis to evaluate candidate storage solutions. The study is based on the 5.9 billion page collection of the HathiTrust digital library. Our findings led to the choice of Cassandra 3.x for the primary back end store, which is currently in deployment in the HathiTrust Research Center.
- Is Part Of:
- Concurrency and computation. Volume 32:Number 16(2020)
- Journal:
- Concurrency and computation
- Issue:
- Volume 32:Number 16(2020)
- Issue Display:
- Volume 32, Issue 16 (2020)
- Year:
- 2020
- Volume:
- 32
- Issue:
- 16
- Issue Sort Value:
- 2020-0032-0016-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2019-04-05
- Subjects:
- big textual data -- data storage -- polyglot -- restricted data
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.5255
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13563.xml