Optimising unicode regular expression evaluation with previews. (15th September 2016)
- Record Type:
- Journal Article
- Title:
- Optimising unicode regular expression evaluation with previews. (15th September 2016)
- Main Title:
- Optimising unicode regular expression evaluation with previews
- Authors:
- Chivers, Howard
- Abstract:
- Summary: The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user‐selectable character encodings. An established design approach was used: a simulated non‐deterministic automaton (NFA) implemented as a virtual machine, avoiding exponential cost functions in either space or time. A deterministic automaton (DFA) was chosen as a general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimization strategies. The result was the development of a regular expression Preview which provides a summary of all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and can be used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview and describes and evaluates several optimizations using this construct. They provide significant speed improvements accrued from fast scanning of anchor positions, avoiding retesting of repeated strings in unanchored searches and efficient searching of multiple alternate expressions which in the case of keyword searching has a time complexity which is logarithmic in the number of words to be searched. Copyright © 2016 John Wiley & Sons, Ltd.
- Is Part Of:
- Software, practice & experience. Volume 47:Number 5(2017)
- Journal:
- Software, practice & experience
- Issue:
- Volume 47:Number 5(2017)
- Issue Display:
- Volume 47, Issue 5 (2017)
- Year:
- 2017
- Volume:
- 47
- Issue:
- 5
- Issue Sort Value:
- 2017-0047-0005-0000
- Page Start:
- 669
- Page End:
- 688
- Publication Date:
- 2016-09-15
- Subjects:
- regular expression -- finite automaton -- non‐deterministic -- optimization -- searching -- forensics
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3 - Journal URLs:
- http://onlinelibrary.wiley.com/ ↗
- DOI:
- 10.1002/spe.2436 ↗
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 2340.xml