Scalable critical-path analysis and optimization guidance for hybrid MPI-CUDA applications. (November 2017)