Using Lucene to index and search the digitized 1940 US Census. (6th March 2014)
- Record Type:
- Journal Article
- Title:
- Using Lucene to index and search the digitized 1940 US Census. (6th March 2014)
- Main Title:
- Using Lucene to index and search the digitized 1940 US Census
- Authors:
- Diesendruck, Liana
Kooper, Rob
Marini, Luigi
McHenry, Kenton
Wilkins‐Diehr, Nancy
Majumdar, Amit
- Abstract:
- SUMMARY: An improved approach toward enabling search capabilities over large digitized document archives is described, in which Lucene indices were incorporated in a framework developed to provide automatic searchable access to the 1940 US Census, a collection composed of digitized handwritten forms. As an alternative to trying to recognize the handwritten text in the images, Word Spotting feature vectors are used to describe each cell's content. Instead of querying the system using regular ASCII text, any query is rendered as an image, and a ranked list of matching results is presented to the user. Among other preprocessing steps required by the framework, an index must be compiled to provide fast access to the feature vectors. The advantages and drawbacks of using Lucene to index these vectors instead of other indexing methods are discussed in light of the challenges confronted when dealing with digitized document collections of considerable size. Copyright © 2014 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 26:Number 13(2014:Sep.)
- Journal:
- Concurrency and computation
- Issue:
- Volume 26:Number 13(2014:Sep.)
- Issue Display:
- Volume 26, Issue 13 (2014)
- Year:
- 2014
- Volume:
- 26
- Issue:
- 13
- Issue Sort Value:
- 2014-0026-0013-0000
- Page Start:
- 2167
- Page End:
- 2177
- Publication Date:
- 2014-03-06
- Subjects:
- Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3250
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 3072.xml