A CUDA fast multipole method with highly efficient M2L far field evaluation. (January 2021)