To Remove or not to Remove: the Impact of Outlier Handling on Significance Testing in Testosterone Data

Pollet, Thomas and van der Meij, Leander (2017) To Remove or not to Remove: the Impact of Outlier Handling on Significance Testing in Testosterone Data. Adaptive Human Behavior and Physiology, 3 (1). pp. 43-60. ISSN 2198-7335

10.1007%2Fs40750-016-0050-z.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Official URL: https://doi.org/10.1007/s40750-016-0050-z

Abstract

Outlier removal is common in hormonal research. Here we investigated to what extent removing outliers in hormonal data leads to divergent statistical conclusions. We first show that the most common outlier detection rule is based on a number of standard deviations (SD) from the mean. Next, we used simulations to examine how often statistical conclusions diverge, that is, how often a test with outlier exclusion yields a statistically significant result whereas the test with outlier inclusion does not, or vice versa (at p = .05). Simulations were run in duplicate for independent samples t-tests and repeated measures ANOVA designs, and were based on real testosterone (T) data and a theoretical gamma distribution of T data. We ran simulations for different sample sizes (30 to 100) and outlier removal rules (2.5 SD and 3 SD). For significant t-tests, we found that in 14 % to 55 % of the significant cases a test with outlier exclusion yielded a statistically significant result whereas the test with outlier inclusion did not, or vice versa (median p difference: .03–.06). For significant repeated measures ANOVAs, we found that in 7 % to 28 % of the significant cases a test with outlier exclusion yielded a statistically significant result whereas the test with outlier inclusion did not, or vice versa (median p difference: .01–.03). When reporting any test that would lead to a statistically significant result (the test with inclusion of outliers, the test with exclusion, or both), between 5.15 % and 6.89 % of the independent samples t-tests were statistically significant, and for the repeated measures ANOVA design this was between 6.32 % and 7.62 % of the tests. Our results suggest that outlier handling can have a substantial impact on significance testing. We suggest several potential solutions for handling outliers and we argue for a careful assessment of outlier handling in hormonal data.
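
The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code, and several choices are assumptions made only for illustration: the gamma shape and scale (2.0 and 1.0) are placeholder values rather than parameters fitted to testosterone data, outliers are removed within each group separately, and the p-value uses Welch's statistic with a normal approximation so that the example needs only the standard library. In practice an exact test such as scipy.stats.ttest_ind would be the more natural choice. The function names remove_sd_outliers, approx_p_value and simulate are hypothetical.

```python
import random
import statistics
from math import sqrt


def remove_sd_outliers(values, k=3.0):
    """Keep only values within k sample SDs of the sample mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= k * sd]


def approx_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Assumption: Welch's statistic with a normal approximation,
    NOT the exact t-test used in the paper.
    """
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, n=30, k=2.5, alpha=0.05, seed=1):
    """Count significant results with and without outlier exclusion.

    Both groups are drawn from the same distribution, so every
    significant result is a false positive.
    """
    rng = random.Random(seed)
    counts = {"incl": 0, "excl": 0, "either": 0, "disagree": 0}
    for _ in range(n_sims):
        # Placeholder gamma parameters (shape 2.0, scale 1.0), not fitted to real data.
        a = [rng.gammavariate(2.0, 1.0) for _ in range(n)]
        b = [rng.gammavariate(2.0, 1.0) for _ in range(n)]
        sig_incl = approx_p_value(a, b) < alpha
        sig_excl = approx_p_value(
            remove_sd_outliers(a, k), remove_sd_outliers(b, k)
        ) < alpha
        counts["incl"] += sig_incl
        counts["excl"] += sig_excl
        counts["either"] += sig_incl or sig_excl
        counts["disagree"] += sig_incl != sig_excl
    return counts


if __name__ == "__main__":
    print(simulate())
```

Here "either" counts the runs in which at least one of the two analyses is significant, which corresponds to reporting whichever test reaches significance, and "disagree" counts the runs in which the two analyses lead to different conclusions.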

Item Type:	Article
Uncontrolled Keywords:	Sex hormones, Statistical design, p value, Outlier handling, Statistical simulation
Subjects:	C800 Psychology; G300 Statistics
Department:	Faculties > Health and Life Sciences > Psychology
Depositing User:	Becky Skoyles
Date Deposited:	21 Sep 2017 11:14
Last Modified:	01 Aug 2021 07:30
URI:	http://nrl.northumbria.ac.uk/id/eprint/31924
