Efficient POSIX submatch extraction on nondeterministic finite automata. (18th October 2020)
- Record Type:
- Journal Article
- Title:
- Efficient POSIX submatch extraction on nondeterministic finite automata. (18th October 2020)
- Main Title:
- Efficient POSIX submatch extraction on nondeterministic finite automata
- Authors:
- Borsotti, Angelo
Trofimovich, Ulya - Abstract:
- Summary: In this paper we study the performance of POSIX submatch extraction algorithms based on nondeterministic finite automata (NFA). We propose an algorithm that combines Laurikari tagged NFA and extended Okui‐Suzuki disambiguation. The algorithm works in worst‐case O ( n m 2 t ) time and O ( m 2 ) space (including preprocessing), where n is the length of input, m is the size of the regular expression with bounded repetition expanded and t is the number of capturing groups and subexpressions that contain them. On real‐world benchmarks our algorithm performs close to the O ( n m t ) complexity of leftmost‐greedy matching, although on artificial benchmarks it can be significantly slower. We propose a lazy version of the algorithm that runs much faster, but requires O ( n m 2 ) space. We show that the Kuklewicz algorithm is slower in practice, and the backward matching algorithm proposed by Cox is incorrect.
- Is Part Of:
- Software, practice & experience. Volume 51:Number 2(2021)
- Journal:
- Software, practice & experience
- Issue:
- Volume 51:Number 2(2021)
- Issue Display:
- Volume 51, Issue 2 (2021)
- Year:
- 2021
- Volume:
- 51
- Issue:
- 2
- Issue Sort Value:
- 2021-0051-0002-0000
- Page Start:
- 159
- Page End:
- 192
- Publication Date:
- 2020-10-18
- Subjects:
- finite‐state automata -- parsing -- POSIX -- regular expressions -- submatch extraction
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3 - Journal URLs:
- http://onlinelibrary.wiley.com/ ↗
- DOI:
- 10.1002/spe.2881 ↗
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 15391.xml