A flexible I/O arbitration framework for netCDF‐based big data processing workflows on high‐end supercomputers. (15th May 2017)
- Record Type:
- Journal Article
- Title:
- A flexible I/O arbitration framework for netCDF‐based big data processing workflows on high‐end supercomputers. (15th May 2017)
- Main Title:
- A flexible I/O arbitration framework for netCDF‐based big data processing workflows on high‐end supercomputers
- Authors:
- Liao, Jianwei
Gerofi, Balazs
Lien, Guo‐Yuan
Miyoshi, Takemasa
Nishizawa, Seiya
Tomita, Hirofumi
Liao, Wei‐Keng
Choudhary, Alok
Ishikawa, Yutaka
- Other Names:
- Lengauer Christian guestEditor.
Bougé Luc guestEditor.
Trystram Denis guestEditor.
Balaji Pavan guestEditor.
Leung Kai‐Cheung guestEditor.
- Abstract:
- Summary: On the verge of the convergence between high‐performance computing and Big Data processing, it has become increasingly prevalent to deploy large‐scale data analytics workloads on high‐end supercomputers. Such applications often come in the form of complex workflows with various different components, assimilating data from scientific simulations as well as from measurements streamed from sensor networks, such as radars and satellites. For example, as part of the Flagship 2020 (post‐K) supercomputer project of Japan, RIKEN is investigating the feasibility of a highly accurate weather forecasting system that would provide a real‐time outlook for severe guerrilla rainstorms. One of the main performance bottlenecks of this application is the lack of efficient communication among workflow components, which currently takes place over the parallel file system. In this paper, we present an initial study of a direct communication framework designed for complex workflows that eliminates unnecessary file I/O among components. Specifically, we propose an I/O arbitration layer that provides direct parallel data transfer (both synchronous and asynchronous) among job components that rely on the netCDF interface for performing I/O operations. Our solution requires only minimal modifications to application code. Moreover, we propose a configuration file–based approach that allows users to specify the desired data transfer pattern among workflow components, offering a general solution for different application contexts. We present a preliminary evaluation of the proposed framework on the K Computer (running on up to 4800 compute nodes) using RIKEN's experimental weather forecasting workflow as a case study. …
- Is Part Of:
- Concurrency and computation. Volume 29:Number 15(2017)
- Journal:
- Concurrency and computation
- Issue:
- Volume 29:Number 15(2017)
- Issue Display:
- Volume 29, Issue 15 (2017)
- Year:
- 2017
- Volume:
- 29
- Issue:
- 15
- Issue Sort Value:
- 2017-0029-0015-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2017-05-15
- Subjects:
- asynchronous transfer -- big data processing -- customizability -- netCDF -- parallel direct data transfer -- real time
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.4161
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 2890.xml