Secure large-scale genome data storage and query. (October 2018)
- Record Type:
- Journal Article
- Title:
- Secure large-scale genome data storage and query. (October 2018)
- Main Title:
- Secure large-scale genome data storage and query
- Authors:
- Chen, Luyao
Aziz, Md Momin
Mohammed, Noman
Jiang, Xiaoqian
- Abstract:
- Highlights: We propose a method that uses a graph database system to store, and allow computations on, a real-world genome dataset in a privacy-preserving manner. A novel indexing scheme is proposed for this database to make secure query operations more efficient. We test the proposed approach, along with the corresponding indexing scheme, on a large-scale genome dataset containing 735,317 human SNPs (∼200 GB of data). Experimental results show that a query takes less than a minute, compared with around 7 min for the best-known previous attempt.
Abstract: Background and Objective: Cloud computing plays a vital role in big data science with its scalable and cost-efficient architecture. Large-scale genome data storage and computation would benefit from using the latest cloud computing infrastructures to save cost and speed up discoveries. However, due to privacy and security concerns, data owners are often disinclined to put sensitive data in a public cloud environment without enforcing some protective measures. An ideal solution is to develop a secure genome database that supports encrypted data deposition and query.
Methods: Nevertheless, it is a challenging task to make such a system fast and scalable enough to handle real-world demands while also providing data security. In this paper, we propose a novel, secure mechanism to support secure count queries on an open-source graph database (Neo4j) and evaluate its performance on a real-world dataset of around 735,317 Single Nucleotide Polymorphisms (SNPs). In particular, we propose a new tree indexing method that offers constant time complexity (proportional to the tree depth), addressing what was the bottleneck of existing approaches.
Results: The proposed method significantly improves the runtime of query execution compared to the existing techniques. It takes less than one minute to execute an arbitrary count query on a dataset of 212 GB, while the best-known algorithm takes around 7 min.
Conclusions: The outlined framework and experimental results show the applicability of using a graph database for securely storing large-scale genome data in an untrusted environment. Furthermore, the underlying cryptosystem and security assumptions are well suited to such use cases and can be generalized in future work.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 165(2018)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 165(2018)
- Issue Display:
- Volume 165, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 165
- Issue:
- 2018
- Issue Sort Value:
- 2018-0165-2018-0000
- Page Start:
- 129
- Page End:
- 137
- Publication Date:
- 2018-10
- Subjects:
- Secure genome data storage -- Graph database -- Secure computation on genome data -- Homomorphic encryption -- Genome data storage Neo4j
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2018.08.007
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 7980.xml