Sample size calculation for Mann-Whitney U test with five methods

Xiaoping Zhu


Background: Precise sample size estimation plays a vital role in study planning, particularly when the medical treatments involved are expensive or the study carries high risk.

Methods: Among the many sample size calculation methods available for the nonparametric Mann-Whitney U test, five are selected for evaluation in this article. Their performance is evaluated against results obtained from high-precision Monte Carlo simulations.
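A simulation benchmark of this kind can be sketched as follows. This is only an illustrative outline, not the article's actual simulation design: the normal shift alternative, the equal group sizes, the large-sample normal approximation to the U statistic, and the function names `simulated_power` and `smallest_n` are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def mann_whitney_u(x, y):
    """U statistic: count of pairs with x_i < y_j, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def rejects(x, y, alpha=0.05):
    """Two-sided test via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (mann_whitney_u(x, y) - mean_u) / sd_u
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)


def simulated_power(n1, n2, shift, reps=2000, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the test rejects.

    Assumed scenario: X ~ N(0, 1) and Y ~ N(shift, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        y = [rng.gauss(shift, 1.0) for _ in range(n2)]
        hits += rejects(x, y, alpha)
    return hits / reps


def smallest_n(shift, target_power=0.8, n_max=200, **kw):
    """Smallest per-group n whose simulated power reaches the target."""
    for n in range(2, n_max + 1):
        if simulated_power(n, n, shift, **kw) >= target_power:
            return n
    return None
```

Increasing `reps` reduces the Monte Carlo error of each power estimate at the cost of run time; the per-group size returned by `smallest_n` plays the role of the simulated reference value against which a formula-based sample size would be compared.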

Results: The deviations of the calculated sample sizes from the simulated ones serve as performance indicators. The sum of squared deviations over all scenarios is the criterion for ranking the five methods. For power comparisons, percentage errors relative to the simulated powers are used. Both the effect size and the target power have a large impact on the minimum required sample size.
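The ranking criterion and the power comparison described above can be written down in a few lines. The sketch below assumes each method yields one sample size per scenario, in the same scenario order as the simulated values; the function names are hypothetical and no numbers from the article are used.

```python
def sum_squared_deviation(estimated, simulated):
    """Sum over scenarios of (estimated n - simulated n)^2."""
    return sum((e - s) ** 2 for e, s in zip(estimated, simulated))


def percentage_error(estimated_power, simulated_power):
    """Signed error of a calculated power relative to the simulated power."""
    return 100.0 * (estimated_power - simulated_power) / simulated_power


def rank_methods(estimates_by_method, simulated):
    """Return (method, score) pairs sorted from best (smallest) to worst."""
    scores = {
        name: sum_squared_deviation(est, simulated)
        for name, est in estimates_by_method.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

A positive deviation or percentage error indicates overestimation relative to the simulation, a negative one underestimation; squaring the deviations penalizes both directions equally in the ranking.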

Conclusions: Based on the ranking criterion, Shieh's method performs best. Noether's method always overestimates the minimum required sample size, but not severely.
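For orientation, Noether's approximation is commonly quoted as N = (z_alpha + z_beta)^2 / (12 c (1 - c) (p - 1/2)^2), where N is the total sample size, c = n1/N is the allocation fraction, and p = P(X < Y) is the effect size. The sketch below implements that commonly quoted form; the abstract itself does not state the formula, so the parameterization, defaults, and function name here are assumptions.

```python
import math
from statistics import NormalDist


def noether_total_n(p, alpha=0.05, power=0.8, c=0.5, two_sided=True):
    """Total sample size N from the commonly quoted Noether approximation.

    p     : P(X < Y), must differ from 0.5
    c     : fraction of N allocated to the first group (0 < c < 1)
    """
    std_normal = NormalDist()
    tail = alpha / 2.0 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1.0 - tail)
    z_beta = std_normal.inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 / (12.0 * c * (1.0 - c) * (p - 0.5) ** 2)
    return math.ceil(n)


# Example: p = 0.7, two-sided alpha = 0.05, power = 0.8, equal allocation
print(noether_total_n(0.7))  # 66
```

The formula makes the dependence on effect size and target power explicit: N grows as p approaches 1/2 and as the target power increases.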


Keywords: Mann-Whitney U test, Nonparametric, Sample size, Power, Monte Carlo simulation




Lehmann EL. Testing statistical hypotheses. New York: Chapman and Hall; 1959:10-369.

Siegel S, Castellan NJ. Nonparametric statistics for the behavioural sciences. New York: McGraw-Hill Inc; 1988:45-85.

Lehmann EL. Nonparametrics: statistical methods based on ranks. New Jersey: Prentice Hall; 1975:87-98.

Noether GE. Sample size determination for some common nonparametric tests. J Am Stat Assoc. 1987;82(398):645-7.

Wang H, Chen B, Chow S-C. Sample size determination based on rank tests in clinical trials. J Biopharm Stat. 2003;13(4):735-51.

Shieh G, Jan S, Randles RH. On power and sample size determinations for the Wilcoxon–Mann–Whitney test. J Nonparametric Stat. 2006;18(1):33-3.

Doll M, Klein I. Sample size analysis for two-sample linear rank tests. FAU Discussion Papers in Economics. Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Institute for Economics. Nürnberg. 2006;218-35.

Dette H, Brien O, Timothy E. Efficient experimental design for the Behrens-Fisher problem with application to bioassay. Am Stat. 2003;58(2):138-43.

Happ M, Bathke AC, Brunner E. Optimal sample size planning for the Wilcoxon-Mann-Whitney test. Stat Med. 2019;38(3):363-75.

Bürkner PC, Doebler P, Holling H. Optimal design of the Wilcoxon-Mann-Whitney-test. Biom J. 2017;59(1):25-40.

Ruymgaart FH. A unified approach to the asymptotic distribution theory of certain midrank statistics. In: Statistique non Parametrique Asymptotique. USA: Springer; 1980:1-18.

Akritas MG, Arnold SF, Brunner E. Nonparametric hypotheses and rank statistics for unbalanced factorial designs. J Am Stat Assoc. 1997;92(437):258-65.

Zhao YD, Rahardja D, Qu Y. Sample size calculation for the Wilcoxon-Mann-Whitney test adjusting for ties. Stat Med. 2008;27(3):462-8.

R Core Team. R: A language and environment for statistical computing. Available at: https://www.R-project.org/. Accessed on 20 August 2020.

Van De WMA. Exact non-null distributions of rank statistics. Commun Stat Simul Comput. 2001;30:1011-29.

Mollan KR, Trumble IM, Reifeis SA, Ferrer O, Bay CP, Baldoni PL, et al. Exact power of the rank-sum test for a continuous variable. ArXiv. 2019;1901-7.

Al-Sunduqchi MS. Determining the appropriate sample size for inferences based on the Wilcoxon statistics. Available at: 1805.12249.pdf. Accessed on 20 August 2020.