Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16. (11th February 2022)
- Record Type:
- Journal Article
- Title:
- Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16. (11th February 2022)
- Main Title:
- Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16
- Authors:
- Klöwer, Milan
Hatfield, Sam
Croci, Matteo
Düben, Peter D.
Palmer, Tim N.
- Abstract:
- Abstract: Most Earth‐system simulations run on conventional central processing units in 64‐bit double precision floating‐point numbers Float64, although the need for high‐precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16‐bit low‐precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16‐bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision‐critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise but faster than mixed precision with 16 and 32‐bit floats. As subnormals are inefficiently supported on A64FX the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis‐number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth‐system models, it shares essential algorithms and therefore shows that 16‐bit calculations are indeed a competitive way to accelerate Earth‐system simulations on available hardware.
Plain Language Summary: Computational performance is a major limitation to improved weather and climate forecasts. Most Earth‐system simulations run on conventional computers with every calculation being performed with 64 bits at very high precision. However, the need for high‐precision calculations in the presence of large uncertainties of the climate system has been questioned. We present results with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16‐bit precision, essentially making every calculation only to four digits accurate. Furthermore, only numbers between 6 × 10⁻⁵ and 65,504 are representable and we systematically rescale all calculations to not exceed this range, making use of 97% of all representable numbers within. Simulations with ShallowWaters.jl performed on modern hardware are almost 4x faster than the conventional high‐precision calculations. Although ShallowWaters.jl is simplified compared to large Earth‐system models, it shares essential algorithms and therefore shows that 16‐bit calculations are indeed a competitive way to accelerate Earth‐system simulations on available hardware.
Key Points:
The first fluid circulation model entirely based on 16‐bit instead of conventional 64‐bit calculations approaches 4x speedups on hardware
Systematic rescaling squeezes all calculations into the very limited range of Float16, making use of 97% of the available numbers
Compensated summation in the precision‐critical time integration minimizes rounding errors from Float16 and is faster than mixed precision
- Is Part Of:
- Journal of advances in modeling earth systems. Volume 14:Number 2(2022)
- Journal:
- Journal of advances in modeling earth systems
- Issue:
- Volume 14:Number 2(2022)
- Issue Display:
- Volume 14, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 14
- Issue:
- 2
- Issue Sort Value:
- 2022-0014-0002-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2022-02-11
- Subjects:
- low precision -- floating‐point numbers -- climate models -- rounding errors -- hardware acceleration
Geological modeling -- Periodicals
Climatology -- Periodicals
Geochemical modeling -- Periodicals
551.5011
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1942-2466
http://onlinelibrary.wiley.com/
http://adv-model-earth-syst.org/
- DOI:
- 10.1029/2021MS002684
- Languages:
- English
- ISSNs:
- 1942-2466
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20745.xml
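
Illustrative note on the abstract: the abstract states that compensated summation in the time integration minimizes Float16 rounding errors. The sketch below shows the general idea with classic Kahan summation on a toy accumulation. It is not the ShallowWaters.jl implementation and is written in Python rather than Julia; the helper names (f16, integrate_naive, integrate_compensated) and the test values are assumptions made for illustration. Rounding to 16 bits is emulated with the standard-library struct format "e" (IEEE 754 binary16).

```python
import struct

def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (Float16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def integrate_naive(x0: float, dx: float, nsteps: int) -> float:
    """Add a small increment repeatedly, rounding every result to Float16."""
    x, dx = f16(x0), f16(dx)
    for _ in range(nsteps):
        x = f16(x + dx)
    return x

def integrate_compensated(x0: float, dx: float, nsteps: int) -> float:
    """Same accumulation with Kahan compensation; every operation is rounded to Float16."""
    x, dx = f16(x0), f16(dx)
    c = 0.0  # running estimate of the rounding error of the previous addition
    for _ in range(nsteps):
        y = f16(dx - c)          # increment corrected by the stored error
        t = f16(x + y)           # rounded update
        c = f16(f16(t - x) - y)  # what the rounded update added beyond y
        x = t
    return x

if __name__ == "__main__":
    # Near 1.0 the Float16 spacing is 2**-10 (about 9.8e-4), so an increment of
    # 1e-4 is rounded away entirely by the naive update.
    print("naive:      ", integrate_naive(1.0, 1e-4, 1000))
    print("compensated:", integrate_compensated(1.0, 1e-4, 1000))
    # Largest finite Float16 and smallest normal Float16 (about 6.1e-5).
    print("range:      ", 2.0 ** -14, "to", f16(65504.0))
```

With these toy values the naive sum never leaves 1.0, while the compensated sum ends close to the exact result of about 1.1, which is the effect the abstract attributes to compensated time integration.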