Optimizing R with SparkR on a commodity cluster for biomedical research. (December 2016)

Record Type:: Journal Article
Title:: Optimizing R with SparkR on a commodity cluster for biomedical research. (December 2016)
Main Title:: Optimizing R with SparkR on a commodity cluster for biomedical research
Authors:: Sedlmayr, Martin
Würfl, Tobias
Maier, Christian
Häberle, Lothar
Fasching, Peter
Prokosch, Hans-Ulrich
Christoph, Jan
Abstract:: Highlights: R is a popular environment for clinical data analysis. It does not directly support big data workloads. Both the Message Passing Interface (MPI) and SparkR allow computationally demanding workloads to be parallelized on clusters. SparkR offers elastic resources even on non-dedicated hardware and tight integration with Hadoop distributed services. SparkR requires minimal changes to original code in R in order to utilize parallel execution. Computation in SparkR scales better than with the Message Passing Interface (MPI) due to optimized data communication. Abstract: Background and Objectives: Medical researchers are challenged today by the enormous amount of data collected in healthcare. Analysis methods such as genome-wide association studies (GWAS) are often computationally intensive and thus require enormous resources to be performed in a reasonable amount of time. While dedicated clusters and public clouds may deliver the desired performance, their use requires upfront financial efforts or anonymous data, which is often not possible for preliminary or occasional tasks. We explored the possibility of building a private, flexible cluster for processing scripts in R based on commodity, non-dedicated hardware of our department. Methods: For this, a GWAS calculation in R on a single desktop computer, a Message Passing Interface (MPI) cluster, and a SparkR cluster were compared with regard to performance, scalability, quality, and simplicity. Results: The original …
Is Part Of:: Computer methods and programs in biomedicine. Volume 137(2016)
Journal:: Computer methods and programs in biomedicine
Issue:: Volume 137(2016)
Issue Display:: Volume 137, Issue 2016 (2016)
Year:: 2016
Volume:: 137
Issue:: 2016
Issue Sort Value:: 2016-0137-2016-0000
Page Start:: 321
Page End:: 328
Publication Date:: 2016-12
Subjects:: Computing methodologies -- Genome-wide association study -- Big data -- Cluster computing -- SparkR
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
Journal URLs:: http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
DOI:: 10.1016/j.cmpb.2016.10.006
Languages:: English
ISSNs:: 0169-2607
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 21087.xml