Validating UTF‐8 in less than one instruction per byte. (29th October 2020)
- Record Type:
- Journal Article
- Title:
- Validating UTF‐8 in less than one instruction per byte. (29th October 2020)
- Main Title:
- Validating UTF‐8 in less than one instruction per byte
- Authors:
- Keiser, John
Lemire, Daniel
- Abstract:
- The majority of text is stored in UTF‐8, which must be validated on ingestion. We present the lookup algorithm, which outperforms UTF‐8 validation routines used in many libraries and languages by more than 10 times using commonly available single‐instruction‐multiple‐data instructions. To ensure reproducibility, our work is freely available as open source software.
- Is Part Of:
- Software, practice & experience. Volume 51:Number 5(2021)
- Journal:
- Software, practice & experience
- Issue:
- Volume 51:Number 5(2021)
- Issue Display:
- Volume 51, Issue 5 (2021)
- Year:
- 2021
- Volume:
- 51
- Issue:
- 5
- Issue Sort Value:
- 2021-0051-0005-0000
- Page Start:
- 950
- Page End:
- 964
- Publication Date:
- 2020-10-29
- Subjects:
- character encoding -- text processing -- Unicode -- vectorization
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/spe.2920
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 16194.xml