AI4IO: A suite of AI-based tools for IO-aware scheduling. (May 2022)
- Record Type:
- Journal Article
- Title:
- AI4IO: A suite of AI-based tools for IO-aware scheduling. (May 2022)
- Main Title:
- AI4IO: A suite of AI-based tools for IO-aware scheduling
- Authors:
- Wyatt, Michael R
Herbein, Stephen
Gamblin, Todd
Taufer, Michela
- Abstract:
- Traditional workload managers do not have the capacity to consider how IO contention can increase job runtime and even cause entire resource allocations to be wasted. Whether from bursts of IO demand or parallel file system (PFS) performance degradation, IO contention must be identified and addressed to ensure maximum performance. In this paper, we present AI4IO (AI for IO), a suite of tools using AI methods to prevent and mitigate performance losses due to IO contention. AI4IO enables existing workload managers to become IO-aware. Currently, AI4IO consists of two tools: PRIONN and CanarIO. PRIONN predicts IO contention and empowers schedulers to prevent it. CanarIO mitigates the impact of IO contention when it does occur. We measure the effectiveness of AI4IO when integrated into Flux, a next-generation scheduler, for both small- and large-scale IO-intensive job workloads. Our results show that integrating AI4IO into Flux improves the workload makespan by up to 6.4%, which can account for more than 18,000 node-h of saved resources per week on a production cluster in our large-scale workload.
- Is Part Of:
- International journal of high performance computing applications. Volume 36:Number 3(2022)
- Journal:
- International journal of high performance computing applications
- Issue:
- Volume 36:Number 3(2022)
- Issue Display:
- Volume 36, Issue 3 (2022)
- Year:
- 2022
- Volume:
- 36
- Issue:
- 3
- Issue Sort Value:
- 2022-0036-0003-0000
- Page Start:
- 370
- Page End:
- 387
- Publication Date:
- 2022-05
- Subjects:
- High performance computing -- workload managers -- parallel file systems -- job scheduler -- IO contention -- IO prediction -- IO-intensive workloads
High performance computing -- Periodicals
Supercomputers -- Periodicals
004.1105
- Journal URLs:
- http://hpc.sagepub.com
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
- DOI:
- 10.1177/10943420221079765
- Languages:
- English
- ISSNs:
- 1094-3420
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20614.xml