Testing for baseline differences in clinical trials

Henian Chen, Yuanyuan Lu, Nicole Slye


Reporting statistical tests of baseline measures in clinical trials does not make sense, because statistical significance depends on sample size: a large trial can find significance in the same difference that a small trial found not to be statistically significant. We use 3 published trials that report the same baseline measures to illustrate the relationship between trial sample size and p value. In trial 1, the sequential organ failure assessment (SOFA) score was 10.4±3.4 vs. 9.6±3.2 (difference=0.8, p=0.01), and vasopressor use was 83.0% vs. 72.6% (p=0.007). In trial 2, the SOFA score was 11±3 vs. 12±3 (difference=1, p=0.42). In trial 3, vasopressor use was 73% vs. 83% (p=0.21). Taking the trial 2 values, the supine group has a mean SOFA score of 12 with an SD of 3, and the prone group has a mean of 11 with an SD of 3. Holding these values fixed, the p values are 0.29850, 0.09877, 0.01940, 0.00094, 0.00005, and <0.00001 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. Taking the trial 3 values, vasopressor use is 73.0% in the supine group vs. 83.0% in the prone group. Holding these percentages fixed, the p values are 0.4452, 0.2274, 0.0878, 0.0158, 0.0031, and 0.0006 when n (per arm) is 20, 50, 100, 200, 300, and 400, respectively. For the same baseline difference, small trials yield larger p values than large trials. Imbalance in baseline measures therefore cannot be defined on the basis of these p values alone. There is no statistical basis for advocating tests of baseline differences.
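The dependence of the p value on n can be checked with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes the SOFA comparison is a pooled-variance two-sample t test and the vasopressor comparison is a pooled two-proportion z test without continuity correction, neither of which is stated in the abstract. The function names (t_pdf, t_two_sided_p, p_means, p_proportions) are hypothetical, and the t distribution tail is obtained by simple numerical integration so that only the Python standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=4000):
    """Two-sided p value, integrating the density on [0, |t|] with Simpson's rule.

    steps must be even.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def p_means(mean1, mean2, sd, n):
    """ASSUMPTION: pooled two-sample t test, common SD, n patients per arm."""
    se = sd * math.sqrt(2.0 / n)
    return t_two_sided_p((mean1 - mean2) / se, 2 * n - 2)


def p_proportions(p1, p2, n):
    """ASSUMPTION: pooled two-proportion z test, n patients per arm."""
    pooled = (p1 + p2) / 2  # equal arm sizes, so the pooled rate is the average
    se = math.sqrt(pooled * (1 - pooled) * 2.0 / n)
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail probability


if __name__ == "__main__":
    # Baseline values taken from the abstract (trial 2 SOFA, trial 3 vasopressors).
    for n in (20, 50, 100, 200, 300, 400):
        print(n, f"{p_means(12, 11, 3, n):.5f}", f"{p_proportions(0.73, 0.83, n):.4f}")
```

Under these assumptions the printed p values should come out close to those quoted above, shrinking steadily as n grows while the baseline difference itself stays the same.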


Baseline difference, Statistical significance testing, Randomization, Trial size



