Performance analysis and optimization for scalable deployment of deep learning models for country‐scale settlement mapping on Titan supercomputer. (8th May 2019)
- Record Type:
- Journal Article
- Title:
- Performance analysis and optimization for scalable deployment of deep learning models for country‐scale settlement mapping on Titan supercomputer. (8th May 2019)
- Main Title:
- Performance analysis and optimization for scalable deployment of deep learning models for country‐scale settlement mapping on Titan supercomputer
- Authors:
- Kurte, Kuldeep
Sanyal, Jibonananda
Berres, Anne
Lunga, Dalton
Coletti, Mark
Yang, Hsiuhan Lexie
Graves, Daniel
Liebersohn, Benjamin
Rose, Amy
- Abstract:
- Summary: This paper presents a scalable object detection workflow for detecting objects, such as settlements, from remotely sensed (RS) imagery. We have successfully deployed this workflow on the Titan supercomputer and used it to map human settlements at a country scale. The performance of the various stages in the workflow was analyzed before making it operational. The workflow implements several strategies to address issues such as suboptimal resource utilization and long‐tail effects due to unbalanced image workload, data loss due to runtime failures, and the maximum wall‐time constraints imposed by Titan's job scheduling policy. A mean shift clustering-based static load balancing strategy was implemented, which partitions the image load so that each partition contains similar‐sized images. Furthermore, a checkpoint‐restart strategy was added to the workflow as a fault‐tolerance mechanism to prevent data loss due to unforeseen runtime failures. The performance of these strategies was observed in various scenarios, such as node failure, exceeding wall time, and successful completion. Using this workflow, we processed an RS data set that has a spatial resolution of 0.31 m and covers 685 675 km² of the Republic of Zambia in under six hours using 5426 nodes of the Titan supercomputer.
- Is Part Of:
- Concurrency and computation. Volume 31:Number 20(2019)
- Journal:
- Concurrency and computation
- Issue:
- Volume 31:Number 20(2019)
- Issue Display:
- Volume 31, Issue 20 (2019)
- Year:
- 2019
- Volume:
- 31
- Issue:
- 20
- Issue Sort Value:
- 2019-0031-0020-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2019-05-08
- Subjects:
- convolutional neural network -- deep learning -- fault tolerance -- HPC -- human settlement mapping -- load balancing
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.5305
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 11974.xml