Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding. (29th June 2020)
- Record Type:
- Journal Article
- Title:
- Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding. (29th June 2020)
- Main Title:
- Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding
- Authors:
- Niemi, Arto
Teuhola, Jukka - Abstract:
- Summary: Lossless compression methods based on the Burrows‐Wheeler transform (BWT) are regarded as an excellent compromise between speed and compression efficiency: they provide compression rates close to the PPM algorithms, with the speed of dictionary‐based methods. Instead of the laborious statistics‐gathering process used in PPM, the BWT reversibly sorts the input symbols, using as the sort key as many following characters as necessary to make the sort unique. Characters occurring in similar contexts are sorted close together, resulting in a clustered symbol sequence. Run‐length encoding and Move‐to‐Front (MTF) recoding, combined with a statistical Huffman or arithmetic coder, is then typically used to exploit the clustering. A drawback of the MTF recoding is that knowledge of the character that produced the MTF number is lost. In this paper, we present a new, competitive Burrows‐Wheeler posttransform stage that takes advantage of interpolative coding—a fast binary encoding method for integer sequences, being able to exploit clusters without requiring explicit statistics. We introduce a fast and simple way to retain knowledge of the run characters during the MTF recoding and use this to improve the clustering of MTF numbers and run‐lengths by applying reversible, stable sorting, with the run characters as sort keys, achieving significant improvement in the compression rate, as shown here by experiments on common text corpora.
- Is Part Of:
- Software, practice & experience. Volume 50:Number 9(2020)
- Journal:
- Software, practice & experience
- Issue:
- Volume 50:Number 9(2020)
- Issue Display:
- Volume 50, Issue 9 (2020)
- Year:
- 2020
- Volume:
- 50
- Issue:
- 9
- Issue Sort Value:
- 2020-0050-0009-0000
- Page Start:
- 1858
- Page End:
- 1874
- Publication Date:
- 2020-06-29
- Subjects:
- lossless compression -- Burrows‐Wheeler transform -- move‐to‐front -- interpolative coding
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3 - Journal URLs:
- http://onlinelibrary.wiley.com/ ↗
- DOI:
- 10.1002/spe.2873 ↗
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 24578.xml