This vignette accompanies our recent manuscript “Truncating the
Likelihood Allows Outlier Exclusion Without Overestimating the Evidence
in the Bayes Factor t-Test” (godmann2024TruncLikelihood?)
and shows how to use the RoBTT R package to estimate a
truncated Bayesian model-averaged independent samples \(t\)-test (TrBTT). TrBTT adapts the \(t\)-test
to the researcher’s outlier handling and thus mitigates the unwanted side
effects of outlier exclusion on inference. For a general
introduction to the RoBTT package, see the Introduction
to RoBTT vignette.
Outliers can bias analysis results. However, the widely used approach of simply excluding extreme observations without changing the analysis is not appropriate either, as it often inflates the evidence. This vignette introduces a truncated version of the Bayesian model-averaged independent samples \(t\)-test and demonstrates an alternative way of handling outliers in a Bayesian hypothesis testing framework. TrBTT combines Bayesian model averaging with a truncated likelihood. As such, it offers a robust way of conducting independent samples \(t\)-tests that is less susceptible to the influence of outliers.
TrBTT truncates the likelihood in the same way the data were truncated. This corrects the variance estimates that would otherwise be biased by outlier exclusion. It simultaneously model-averages across \(4\) models, which cross the absence versus presence of an effect (delta) with equal versus unequal variances (rho).
For all models, the likelihood is adjusted according to the specified truncation values. Inferences are based on all models, weighted by their predictive performance.
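To make the role of the truncated likelihood concrete, the following base R sketch (not the RoBTT implementation) simulates normal data, discards observations outside an assumed range of -1.5 to 1.5, and compares the naive standard deviation with a maximum likelihood estimate based on a truncated normal likelihood. The bounds, the sample size, and the helper `trunc_nll()` are illustrative assumptions.

```r
set.seed(123)
x <- rnorm(5000, mean = 0, sd = 1)   # true standard deviation is 1
bounds <- c(-1.5, 1.5)               # assumed exclusion cut-offs
x_kept <- x[x >= bounds[1] & x <= bounds[2]]

# Negative log-likelihood of a normal distribution truncated to [lo, hi]
trunc_nll <- function(par, y, lo, hi) {
  mu <- par[1]
  sigma <- exp(par[2])  # optimise log(sigma) so that sigma stays positive
  # log of the probability mass that remains inside the truncation bounds
  log_mass <- log(pnorm(hi, mu, sigma) - pnorm(lo, mu, sigma))
  -sum(dnorm(y, mu, sigma, log = TRUE) - log_mass)
}

fit <- optim(
  par = c(mean(x_kept), log(sd(x_kept))),
  fn = trunc_nll,
  y = x_kept, lo = bounds[1], hi = bounds[2]
)

sd_naive <- sd(x_kept)       # ignores that the data were truncated
sd_trunc <- exp(fit$par[2])  # accounts for the truncation
c(naive = sd_naive, truncated = sd_trunc)
```

In this setup the naive estimate falls clearly below the true value of 1, whereas the truncated-likelihood estimate should land close to it.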
First, we ensure that the RoBTT package is installed and loaded into the R session:
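The corresponding code is not shown in this section. A minimal sketch, assuming the package is installed from CRAN under the name RoBTT, could look like this:

```r
# Load RoBTT if it is available; otherwise print a hint on how to install it
robtt_available <- requireNamespace("RoBTT", quietly = TRUE)
if (robtt_available) {
  library(RoBTT)
} else {
  message("RoBTT is not installed; run install.packages(\"RoBTT\") first.")
}
```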
We generate some example data to demonstrate the functionality of the test:
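The data-generating code is not shown in this section. As a stand-in, the sketch below draws two groups from a standard normal distribution. The seed, the group size of 100, and the distribution are assumptions, so results obtained with these data need not reproduce the printed output in this vignette.

```r
set.seed(42)                         # assumed seed
x1 <- rnorm(100, mean = 0, sd = 1)   # group 1 (assumed size and distribution)
x2 <- rnorm(100, mean = 0, sd = 1)   # group 2 (assumed size and distribution)
```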
First, we demonstrate how to manually exclude outliers using specific cut-offs and then truncate the likelihood accordingly. Cut-offs can be specified for each group separately, as is the case, for instance, with the box plot method for identifying outliers. Alternatively, a single cut-off range can be applied to both groups, for instance when all response times faster than \(200\) ms and slower than \(1000\) ms are excluded in both groups.
First, we apply the box plot method for excluding outliers and specify the cut-off range for each group:
# Identify outliers using boxplot statistics for each group
stats1 <- boxplot.stats(x1)
lower_whisker1 <- stats1$stats[1]
upper_whisker1 <- stats1$stats[5]
stats2 <- boxplot.stats(x2)
lower_whisker2 <- stats2$stats[1]
upper_whisker2 <- stats2$stats[5]
# Exclude outliers based on identified whiskers
x1_filtered <- x1[x1 >= lower_whisker1 & x1 <= upper_whisker1]
x2_filtered <- x2[x2 >= lower_whisker2 & x2 <= upper_whisker2]
# Define whiskers for truncated likelihood application
whisker1 <- c(lower_whisker1, upper_whisker1)
whisker2 <- c(lower_whisker2, upper_whisker2)
We can then fit the truncated RoBTT:
# Fit the RoBTT model with truncation using the filtered data
fit1_trunc <- RoBTT(
  x1 = x1_filtered, x2 = x2_filtered,
  truncation = list(x1 = whisker1, x2 = whisker2),
  seed = 1, parallel = FALSE)
We can summarize the fitted model using the summary() function.
summary(fit1_trunc, group_estimates = TRUE)
#> Call:
#> RoBTT(x1 = x1_filtered, x2 = x2_filtered, truncation = list(x1 = whisker1, 
#>     x2 = whisker2), parallel = FALSE, seed = 1)
#> 
#> Robust Bayesian t-test
#> Components summary:
#>               Models Prior prob. Post. prob. Inclusion BF
#> Effect           2/4       0.500       0.319        0.468
#> Heterogeneity    2/4       0.500       0.171        0.207
#> 
#> Model-averaged estimates:
#>         Mean Median  0.025 0.975
#> delta -0.070  0.000 -0.442 0.008
#> rho    0.498  0.500  0.406 0.574
#> 
#> Model-averaged group parameter estimates:
#>            Mean Median  0.025 0.975
#> mu[1]     0.041  0.034 -0.151 0.278
#> mu[2]    -0.031 -0.022 -0.290 0.169
#> sigma[1]  1.055  1.047  0.906 1.258
#> sigma[2]  1.052  1.043  0.887 1.270
The printed output is structured into three sections. First, the
Components summary table contains the inclusion Bayes
factors for the presence of an effect and of heterogeneity, computed using
all specified models. Second, the Model-averaged estimates
table contains the model-averaged posterior mean, median,
and 95% central credible interval for the effect (Cohen’s d) and the
variance allocation rho. Third, the
Model-averaged group parameter estimates table (generated
by setting the group_estimates = TRUE argument)
summarizes the model-averaged mean and standard deviation estimates of
each group.
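As a worked check on the Components summary, the inclusion Bayes factor is the ratio of posterior to prior inclusion odds. The sketch below recomputes it from the rounded probabilities printed above; the helper `odds()` is introduced here for illustration only.

```r
# Prior and posterior inclusion probabilities, copied from the Components summary
prior_prob <- c(effect = 0.500, heterogeneity = 0.500)
post_prob  <- c(effect = 0.319, heterogeneity = 0.171)

# Convert a probability into odds
odds <- function(p) p / (1 - p)

# Inclusion BF = posterior inclusion odds / prior inclusion odds
inclusion_bf <- odds(post_prob) / odds(prior_prob)
round(inclusion_bf, 3)
```

Up to rounding of the printed probabilities, this recovers the values in the Inclusion BF column. Values below 1 indicate that the data favor the models without the respective component.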
We can also summarize information about the specified models by
setting the type = "models" argument in the summary()
function.
summary(fit1_trunc, group_estimates = TRUE, type = "models")
#> Call:
#> RoBTT(x1 = x1_filtered, x2 = x2_filtered, truncation = list(x1 = whisker1, 
#>     x2 = whisker2), parallel = FALSE, seed = 1)
#> 
#> Robust Bayesian t-test
#> Models overview:
#>  Model     Distribution   Prior delta    Prior rho Prior prob. log(marglik)
#>      1 truncated normal        Spike(0) Spike(0.5)       0.250      -261.28
#>      2 truncated normal        Spike(0) Beta(1, 1)       0.250      -262.86
#>      3 truncated normal Cauchy(0, 0.71) Spike(0.5)       0.250      -262.04
#>      4 truncated normal Cauchy(0, 0.71) Beta(1, 1)       0.250      -263.62
#>  Post. prob. Inclusion BF
#>        0.564        3.884
#>        0.117        0.397
#>        0.264        1.078
#>        0.055        0.173
This output contains a table summarizing the specifics of each model: the type of likelihood distribution, the prior distribution on the effect parameter, the prior distribution on the rho parameter, the prior model probability, the log marginal likelihood, the posterior model probability, and the inclusion Bayes factor.
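The relation between the columns of the models overview can be verified by hand. The sketch below, using only the rounded values printed above, turns the log marginal likelihoods and prior model probabilities into posterior model probabilities and then sums them into the inclusion probabilities of the two components.

```r
# Values copied from the Models overview table
log_marglik <- c(-261.28, -262.86, -262.04, -263.62)
prior_prob  <- rep(0.25, 4)

# Subtract the maximum before exponentiating for numerical stability
w <- prior_prob * exp(log_marglik - max(log_marglik))
post_prob <- w / sum(w)
round(post_prob, 3)

# Models 3 and 4 include an effect; models 2 and 4 allow unequal variances
p_effect        <- sum(post_prob[c(3, 4)])
p_heterogeneity <- sum(post_prob[c(2, 4)])
round(c(effect = p_effect, heterogeneity = p_heterogeneity), 3)
```

Up to rounding, these sums correspond to the posterior inclusion probabilities reported in the Components summary.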
Second, we can specify a single cut-off range that is applied to both groups, using the x element of the truncation argument.
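The `cut_off` vector used in the call that follows is not defined in this section. A minimal sketch of how a common range might be defined and applied before fitting is given here; the range of -2 to 2, the stand-in data, and the names `x1_raw`, `x2_raw`, `x1_cut`, and `x2_cut` are assumptions made for illustration.

```r
# Stand-in data; the vignette's own x1 and x2 are generated elsewhere
set.seed(1)
x1_raw <- rnorm(100, mean = 0, sd = 1)
x2_raw <- rnorm(100, mean = 0, sd = 1)

# Hypothetical common cut-off range (assumed value)
cut_off <- c(-2, 2)

# Keep only observations inside the range, using the same rule in both groups
x1_cut <- x1_raw[x1_raw >= cut_off[1] & x1_raw <= cut_off[2]]
x2_cut <- x2_raw[x2_raw >= cut_off[1] & x2_raw <= cut_off[2]]
```

The data passed to the model should already lie within `cut_off`, so that the truncation of the likelihood matches the exclusion applied to the data.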
# fit RoBTT with truncated likelihood
fit2_trunc  <- RoBTT(
  x1 = x1, x2 = x2, 
  truncation = list(x = cut_off),
  seed = 1, parallel = FALSE)
The results can again be obtained using the summary() function (see above).
The RoBTT package also allows specifying truncation
directly based on standard deviations, simplifying the process of
outlier handling. The function proceeds by excluding extreme
observations and truncating the likelihood accordingly. Note that the
analyst should not exclude outliers manually and then specify
sigma truncation, as the data would be truncated twice.
Again, the same number of standard deviations sigma can be applied to both groups, or a different value can be specified for each group.
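To illustrate what a standard-deviation-based rule amounts to, the base R sketch below excludes observations that lie more than `k` sample standard deviations from the sample mean. It assumes the bounds are computed from the observed data of a single group; the rule used inside RoBTT may differ in detail, and the data and names here are illustrative.

```r
set.seed(1)
y <- c(rnorm(98), 6, -7)   # stand-in data with two extreme values

k <- 2.5                   # number of standard deviations, as in sigma = 2.5
bounds <- mean(y) + c(-1, 1) * k * sd(y)

# Observations outside the bounds are treated as outliers and dropped
y_kept <- y[y >= bounds[1] & y <= bounds[2]]
c(lower = bounds[1], upper = bounds[2], n_excluded = length(y) - length(y_kept))
```

With manual exclusion, bounds such as these would also be passed to the truncation argument; with the sigma option, both steps are handled by the function.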
First, the same cut-off sigma for both groups:
# Fit the model with direct truncation based on standard deviations
fit1_trunc <- RoBTT(
  x1 = x1, x2 = x2,
  truncation = list(sigma = 2.5),
  seed = 1, parallel = FALSE)
Second, a different cut-off sigma for each group:
# Fit the model with group-specific truncation based on standard deviations
fit1_trunc <- RoBTT(
  x1 = x1, x2 = x2,
  truncation = list(sigma1 = 2, sigma2 = 2.5),
  seed = 1, parallel = FALSE)
Just like before, the results can be obtained using the summary() function.
This vignette demonstrated outlier handling with the truncated Bayesian
model-averaged \(t\)-test implemented in the RoBTT R package.
For methodological background, see (godmann2024TruncLikelihood?).