Using score distributions to compare statistical significance tests for information retrieval evaluation. (5th April 2019)