7bgzf: Replacing samtools bgzip deflation for archiving and real-time compression. (April 2020)
- Record Type:
- Journal Article
- Title:
- 7bgzf: Replacing samtools bgzip deflation for archiving and real-time compression. (April 2020)
- Main Title:
- 7bgzf: Replacing samtools bgzip deflation for archiving and real-time compression
- Authors:
- Yamada, Taiju
- Abstract:
- Highlights: A suite of DEFLATE algorithms called 7bgzf was developed. A wide range of options is available to compress sequence data while preserving interoperability. Direct htslib integration is viable by replacing bgzf_compress via an LD_PRELOAD scheme. Conversion of VCF into BCF with bcftools varied in compression ratio and speed.
Background: Genomic sequence data are not only massive but also increasing rapidly every day; therefore, it is essential to compress such data for sharing. Although some specialized compressors exist, they lack interoperability. In this study, a SAMtools bgzip variant named 7bgzf has been developed, incorporating several compression and deflation algorithms other than the widely used zlib algorithm. An extensive benchmarking study has been carried out with available data compression software.
Results: On both x64 and ARM machines, igzip performed very rapidly. For high compression, libdeflate on the x64 platform achieved a high compression ratio with tolerable speed loss.
Conclusions: Based on appropriate algorithm selection, the proposed compression method performed better than the original bgzip method while maintaining interoperability with existing software. Therefore, this software is useful for both distribution of genomic sequence archives and real-time compression in mobile computing.
- Is Part Of:
- Computational biology and chemistry. Volume 85(2020)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 85(2020)
- Issue Display:
- Volume 85, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 85
- Issue:
- 2020
- Issue Sort Value:
- 2020-0085-2020-0000
- Publication Date:
- 2020-04
- Subjects:
- BAM (Binary sequence Alignment Map) -- BGZF (Blocked GNU Zip Format) -- VCF (Variant Call Format) -- BCF (Binary variant Call Format)
Next generation sequencer -- Deflation -- Samtools
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2020.107207
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13551.xml