Performance analysis of parallel gravitational N-body codes on large GPU clusters. (January 2016)