R package bmabart is used for Bayesian Mediation Analysis based on the Bayesian Additive Regression Trees (2025, 2022). The mediation effect (ME) refers to the effect conveyed by intervening variables to an observed relationship between an exposure (predictor) and a response variable (outcome). In this package, the exposure is also called the predictor, the intervening variables (IV) are called mediators or confounders depending on the conceptual models. The ME include total effect, direct effect, and indirect effects from different IVs. The effects are defined in Yu et al. (2014). The algorithms on how to make inferences on MEs and how to explain the effects are described in Yu et al. (2025).
To use the R package bmabart, we first install the package in R
(install.packages("bmabart")
) and load it.
The data_org function is used for read in exposure, IVs, outcomes,
and covariates, and transform them into formats needed for BART fitting.
The transformation including binarize the categorical variables, and
identify the type (e.g. binary, catecorical or continuous) of the input
variables, exposure, IVs, outcomes, and covariates for IVs and outcomes
separately. Users don’t have to run
data_org'' separately. The function
bma.bart’’ will call the
function
data_org'' first for the mediation analysis. The output of the organized data is saved in the
bma.bart’’
object as ``data0’’.
##Bayesian Mediation Analysis and Interpretation
We use bma.bart function to make inferences on the ME. And then the summary function helps summarize the estimation results from the bma.bart object. In the following code, the exposure is sex, weight_behavior[,3], the outcome is overweigh (weight_behavior[,15], yes or no), and potential IVs are weight_behavior[,c(2,4:14)]. Since the outcome is binary, we define the reference outcome as 0 (not over-weighted, refy=0). The reference group for the exposure is female (predref=``F’’). The number of burn-ins in the MCMC process is 10 (nskip=10) and the number of posterior samples is set at 100 (ndpost=100). In practice, we need to increase the nskip and ndpost of the final model to make sure the convergence of the MCMC process.
data(weight_behavior)
try0= bma.bart(pred=weight_behavior[,3], m=weight_behavior[,c(2,4:14)],
y=weight_behavior[,15], refy = 0, predref = "F",nskip=10,ndpost=100)
summary(try0)
#>
#> For Predictor1
#> Estimated Relative Effects for Method 3:
summary(try0)
#>
#> For Predictor1
#> Estimated Relative Effects for Method 3:
#> DE age race numpeople car gotosch snack tvhours
#> mean 0.4148 0.6237 0.0085 0.1240 -0.0792 0.0205 0.0325 -0.0749
#> sd 2.4702 3.5695 0.5190 0.8637 0.8499 0.2167 0.8693 0.2700
#> q_0.025 -7.4158 -0.9232 -1.0292 -0.1252 -0.8587 -0.1408 -0.6115 -0.6540
#> q_0.25 0.5404 -0.1066 -0.0373 0.1174 -0.2469 -0.0276 -0.1471 -0.2072
#> q_0.5 0.7735 -0.0051 0.0224 0.1979 -0.1211 -0.0061 -0.0593 -0.0478
#> q_0.75 1.1682 0.1563 0.0937 0.2586 0.0159 0.0201 0.0004 0.0693
#> q_0.975 2.6358 9.4966 0.6166 0.4685 0.3417 0.2866 1.2573 0.4439
#> cmpthours cellhours sports exercises sweat
#> mean 0.1377 -0.0437 -0.0832 0.0292 0.0141
#> sd 0.9685 0.4283 1.0952 0.8206 0.4033
#> q_0.025 -0.4673 -0.5565 -3.3919 -1.5987 -0.5367
#> q_0.25 -0.0042 -0.1194 0.0476 -0.0190 -0.0873
#> q_0.5 0.1015 -0.0185 0.0955 0.0362 0.0131
#> q_0.75 0.1983 0.1018 0.1621 0.0961 0.0729
#> q_0.975 0.8752 0.3173 0.5477 1.9382 0.3752
By default, the summary function displays the estimated relative effects
using Method 3, the G-computational method. When using method=2, the
displayed results will be from the product method: the indirect effects
are calculated as the product of the changing rates from the exposure to
the mediator, and the changing rate from the mediator to the outcome.
``method=4’’ is for G-computation with non-parametric method (binary
exposures only).
The relative effects is calculated as the mediation effect divided by the total effect. In general, if the total effect is not significantly different from zero, the relative effect will be very variant since the denominator is a number close to zero. In that scenario, the relative effect is not very meaningful. Use the argument RE=F will give the inference summary for the ME directly.
summary(try0,method=2,RE=F)
#>
#> For Predictor1
#> Estimated Effects for Method 2:
#> TE DE age race numpeople car gotosch snack tvhours
#> mean -0.0376 -0.0356 0.0017 -0.0003 0 0 0.0002 0.0019 0
#> sd 0.0208 0.0191 0.0053 0.0026 0 0 0.0019 0.0029 0
#> q_0.025 -0.0705 -0.0714 -0.0083 -0.0048 0 0 -0.0042 -0.0039 0
#> q_0.25 -0.0536 -0.0471 -0.0007 -0.0023 0 0 -0.0005 0.0001 0
#> q_0.5 -0.0404 -0.0378 0.0014 -0.0002 0 0 0.0003 0.0019 0
#> q_0.75 -0.0250 -0.0233 0.0043 0.0012 0 0 0.0013 0.0039 0
#> q_0.975 0.0088 0.0058 0.0127 0.0048 0 0 0.0038 0.0071 0
#> cmpthours cellhours sports exercises sweat
#> mean 0 0 -0.0084 0.0029 0
#> sd 0 0 0.0041 0.0046 0
#> q_0.025 0 0 -0.0174 -0.0044 0
#> q_0.25 0 0 -0.0107 0.0000 0
#> q_0.5 0 0 -0.0073 0.0019 0
#> q_0.75 0 0 -0.0057 0.0058 0
#> q_0.975 0 0 -0.0020 0.0133 0
Interpretation: Using method 2, the above table shows the estimated
total effect (TE), direct effect (DE), and indirect effect from each IV
between the exposure, sex, and the outcome, overweigh (yes or no). The
posterior mean (mean), standard deviation (sd), and quantiles of the
posterior distributions are listed in the table. By default, the
quantiles are set at quant=c(0.025, 0.25, 0.5, 0.75,0.975). The list can
be changed at the summary function using the quant argument.
In addition, the figure shows the estimates with the bars indicate the credible sets of the corresponding estimates. One can set plot=FALSE, if the figure is not needed.
Covariates for outcomes are variables that can explain the outcomes but not relate to the exposure variable. We can include covarites for outcomes in the analysis by adding the argument cova. In the following codes, ageis used as a covariate for exploring the gender differences in bmi.
try1= bma.bart(pred=weight_behavior[,3], m=weight_behavior[,c(5:12)],
cova=weight_behavior[,2], y=weight_behavior[,1],
refy = 0, predref = "F",nskip=10,ndpost=20)
summary(try1)
#>
#> For Predictor1
#> Estimated Relative Effects for Method 3:
Another sets of covariates can be used for mediators, which is set by the arguments mcov and mclist. The mcov is teh covariance data for mediators. If mclist is null but not mcov, mcov is applied to all mediators. If both mcov and mclist are not NULL, the first item of mclist lists all mediators that are using different mcov, the following items gives the mcov for the mediators in order, NA if no mcov to be used. In the following example, there are 8 IVs, weight_behavior[,c(5:12)]. By setting mclist=c(list(var=1:7),rep(NA,6),list(1)), IVs 1 to 6 have no covariates, IV 7 has only one covariates, weight_behavior[,13] and IV 8 has all covariates in mcov.
try2= bma.bart(pred=weight_behavior[,3], m=weight_behavior[,c(5:12)],
cova=weight_behavior[,2], mcov=weight_behavior[,13:14],
mclist=c(list(var=1:7),rep(NA,6),list(1)),
y=weight_behavior[,1], refy = 0, predref = "F",nskip=0,ndpost=2)
summary(try2)
#>
#> For Predictor1
#> Estimated Relative Effects for Method 3:
#> DE numpeople car gotosch snack tvhours cmpthours cellhours sports
#> mean 0.9218 0.0062 0 0.0212 -0.0527 0 0.0067 0.0240 0.0607
#> sd 0.0026 0.0087 0 0.0152 0.0588 0 0.0095 0.0340 0.0022
#> q_0.025 0.9201 0.0003 0 0.0110 -0.0922 0 0.0003 0.0012 0.0593
#> q_0.25 0.9209 0.0031 0 0.0158 -0.0735 0 0.0034 0.0120 0.0600
#> q_0.5 0.9218 0.0062 0 0.0212 -0.0527 0 0.0067 0.0240 0.0607
#> q_0.75 0.9227 0.0092 0 0.0265 -0.0319 0 0.0101 0.0360 0.0615
#> q_0.975 0.9235 0.0120 0 0.0313 -0.0131 0 0.0131 0.0468 0.0622
Multiple predictors can be included. The mediation effects will be shown for each pair of the predictor-outcome relationship. Note that a k-category predictor is decomposed into k-1 binarized dummy predictors. The same is done for categorical outcomes. Besides that, multiple outcomes are not allowed currently. User has to perform a separate analysis for each outcome.
try3= bma.bart(pred=weight_behavior[,c(1,4)], m=weight_behavior[,c(2:3,5:14)],
y=weight_behavior[,15], refy = 0, predref = "OTHER",nskip=0,ndpost=2)
summary(try3)
#>
#> For Predictor1
#> Estimated Relative Effects for Method 3:
#> DE age sex numpeople car gotosch snack tvhours cmpthours
#> mean 1.2909 0.0431 -0.1399 -0.1607 0.0091 -0.4040 0.3138 -0.0530 -0.0379
#> sd 0.6723 0.0538 0.1392 0.0960 0.0023 0.6397 0.1967 0.0258 0.0021
#> q_0.025 0.8393 0.0070 -0.2335 -0.2252 0.0075 -0.8337 0.1817 -0.0704 -0.0393
#> q_0.25 1.0532 0.0241 -0.1892 -0.1946 0.0082 -0.6301 0.2443 -0.0622 -0.0386
#> q_0.5 1.2909 0.0431 -0.1399 -0.1607 0.0091 -0.4040 0.3138 -0.0530 -0.0379
#> q_0.75 1.5286 0.0621 -0.0907 -0.1267 0.0099 -0.1778 0.3834 -0.0439 -0.0371
#> q_0.975 1.7426 0.0793 -0.0464 -0.0962 0.0106 0.0257 0.4459 -0.0357 -0.0365
#> cellhours sports exercises sweat
#> mean 0.0550 0.0102 -0.1717 0.0515
#> sd 0.0180 0.0020 0.0996 0.0953
#> q_0.025 0.0429 0.0088 -0.2386 -0.0126
#> q_0.25 0.0486 0.0095 -0.2069 0.0178
#> q_0.5 0.0550 0.0102 -0.1717 0.0515
#> q_0.75 0.0614 0.0109 -0.1364 0.0852
#> q_0.975 0.0671 0.0115 -0.1047 0.1155
#>
#> For Predictor2
#>
#> Estimated Relative Effects for Method 3:
#> DE age sex numpeople car gotosch snack tvhours
#> mean 0.8709 -0.1075 -0.0033 -0.0086 0.0187 -0.0013 0.0031 -0.0380
#> sd 0.1315 0.1044 0.0022 0.0388 0.0065 0.0065 0.0021 0.0965
#> q_0.025 0.7826 -0.1776 -0.0048 -0.0346 0.0144 -0.0057 0.0018 -0.1028
#> q_0.25 0.8244 -0.1444 -0.0041 -0.0223 0.0164 -0.0036 0.0024 -0.0721
#> q_0.5 0.8709 -0.1075 -0.0033 -0.0086 0.0187 -0.0013 0.0031 -0.0380
#> q_0.75 0.9174 -0.0706 -0.0026 0.0051 0.0210 0.0010 0.0039 -0.0038
#> q_0.975 0.9593 -0.0373 -0.0019 0.0174 0.0231 0.0031 0.0045 0.0269
#> cmpthours cellhours sports exercises sweat
#> mean 0.1439 0.0042 0.0286 0.0826 0.0479
#> sd 0.0984 0.0706 0.0182 0.1032 0.0430
#> q_0.025 0.0778 -0.0432 0.0164 0.0133 0.0190
#> q_0.25 0.1091 -0.0208 0.0222 0.0461 0.0327
#> q_0.5 0.1439 0.0042 0.0286 0.0826 0.0479
#> q_0.75 0.1786 0.0291 0.0351 0.1190 0.0631
#> q_0.975 0.2099 0.0516 0.0409 0.1519 0.0768
#>
#> For Predictor3
#>
#> Estimated Relative Effects for Method 3:
#> DE age sex numpeople car gotosch snack tvhours cmpthours
#> mean 0 -1.7521 -5.4089 1.1263 -0.1604 -5.8365 9.2001 1.7455 -0.5374
#> sd 0 2.3395 5.9913 1.0226 0.0655 7.8263 9.3334 2.8184 0.2583
#> q_0.025 0 -3.3237 -9.4336 0.4393 -0.2044 -11.0938 2.9304 -0.1478 -0.7109
#> q_0.25 0 -2.5793 -7.5272 0.7647 -0.1836 -8.6035 5.9003 0.7490 -0.6287
#> q_0.5 0 -1.7521 -5.4089 1.1263 -0.1604 -5.8365 9.2001 1.7455 -0.5374
#> q_0.75 0 -0.9250 -3.2907 1.4878 -0.1372 -3.0695 12.5000 2.7420 -0.4461
#> q_0.975 0 -0.1806 -1.3843 1.8132 -0.1164 -0.5791 15.4699 3.6388 -0.3639
#> cellhours sports exercises sweat
#> mean 3.0496 -0.3895 2.5896 -2.4536
#> sd 4.4677 0.5913 3.2288 3.5189
#> q_0.025 0.0484 -0.7867 0.4206 -4.8175
#> q_0.25 1.4700 -0.5985 1.4480 -3.6977
#> q_0.5 3.0496 -0.3895 2.5896 -2.4536
#> q_0.75 4.6292 -0.1804 3.7312 -1.2095
#> q_0.975 6.0508 0.0077 4.7586 -0.0898
#>
#> For Predictor4
#>
#> Estimated Relative Effects for Method 3:
#> DE age sex numpeople car gotosch snack tvhours cmpthours
#> mean 0.9464 0.3258 -0.3047 -0.0979 0.0138 0.2523 0.1018 0.1770 0.1446
#> sd 0.0434 0.2279 0.0132 0.3003 0.0043 0.0278 0.0302 0.2773 0.0933
#> q_0.025 0.9173 0.1727 -0.3136 -0.2996 0.0109 0.2336 0.0816 -0.0092 0.0820
#> q_0.25 0.9311 0.2452 -0.3094 -0.2041 0.0123 0.2425 0.0912 0.0790 0.1116
#> q_0.5 0.9464 0.3258 -0.3047 -0.0979 0.0138 0.2523 0.1018 0.1770 0.1446
#> q_0.75 0.9617 0.4063 -0.3000 0.0083 0.0153 0.2621 0.1125 0.2751 0.1776
#> q_0.975 0.9755 0.4788 -0.2958 0.1038 0.0167 0.2709 0.1221 0.3633 0.2073
#> cellhours sports exercises sweat
#> mean 0.0074 -0.0174 -0.5653 -0.0532
#> sd 0.0584 0.0104 0.8716 0.0944
#> q_0.025 -0.0319 -0.0244 -1.1508 -0.1166
#> q_0.25 -0.0133 -0.0211 -0.8734 -0.0866
#> q_0.5 0.0074 -0.0174 -0.5653 -0.0532
#> q_0.75 0.0280 -0.0138 -0.2571 -0.0198
#> q_0.975 0.0466 -0.0105 0.0202 0.0103
#>
#> For Predictor5
#>
#> Estimated Relative Effects for Method 3:
#> DE age sex numpeople car gotosch snack tvhours cmpthours
#> mean 0.4153 0.2738 -0.0985 -0.0358 0.0245 -0.0442 0.0843 -0.0321 0.8140
#> sd 0.1136 0.0241 0.0110 0.1018 0.0033 0.0277 0.0287 0.0872 0.6056
#> q_0.025 0.3390 0.2576 -0.1060 -0.1042 0.0223 -0.0628 0.0650 -0.0906 0.4072
#> q_0.25 0.3752 0.2653 -0.1024 -0.0718 0.0234 -0.0540 0.0741 -0.0629 0.5999
#> q_0.5 0.4153 0.2738 -0.0985 -0.0358 0.0245 -0.0442 0.0843 -0.0321 0.8140
#> q_0.75 0.4555 0.2823 -0.0946 0.0003 0.0257 -0.0344 0.0944 -0.0012 1.0281
#> q_0.975 0.4916 0.2899 -0.0911 0.0327 0.0268 -0.0256 0.1035 0.0265 1.2208
#> cellhours sports exercises sweat
#> mean 0.0110 -0.1325 -0.3038 -0.0243
#> sd 0.0604 0.0728 0.6221 0.0514
#> q_0.025 -0.0296 -0.1814 -0.7217 -0.0588
#> q_0.25 -0.0104 -0.1582 -0.5237 -0.0425
#> q_0.5 0.0110 -0.1325 -0.3038 -0.0243
#> q_0.75 0.0323 -0.1067 -0.0838 -0.0061
#> q_0.975 0.0515 -0.0835 0.1141 0.0102
#>
#> For Predictor6
#>
#> Estimated Relative Effects for Method 3:
#> DE age sex numpeople car gotosch snack tvhours cmpthours
#> mean 0.6726 0.0039 0.0505 -0.0538 0.0185 -0.0284 0.0772 -0.0366 0.3370
#> sd 0.1226 0.0016 0.0164 0.0411 0.0054 0.0382 0.0453 0.0908 0.2842
#> q_0.025 0.5902 0.0029 0.0395 -0.0814 0.0149 -0.0541 0.0468 -0.0976 0.1461
#> q_0.25 0.6292 0.0034 0.0448 -0.0683 0.0166 -0.0420 0.0612 -0.0687 0.2365
#> q_0.5 0.6726 0.0039 0.0505 -0.0538 0.0185 -0.0284 0.0772 -0.0366 0.3370
#> q_0.75 0.7159 0.0045 0.0563 -0.0392 0.0205 -0.0149 0.0932 -0.0045 0.4375
#> q_0.975 0.7549 0.0050 0.0615 -0.0262 0.0222 -0.0028 0.1076 0.0244 0.5279
#> cellhours sports exercises sweat
#> mean -0.0020 0.0234 -0.1198 0.0147
#> sd 0.0548 0.0167 0.2649 0.0013
#> q_0.025 -0.0388 0.0122 -0.2978 0.0138
#> q_0.25 -0.0214 0.0175 -0.2135 0.0142
#> q_0.5 -0.0020 0.0234 -0.1198 0.0147
#> q_0.75 0.0174 0.0293 -0.0261 0.0151
#> q_0.975 0.0348 0.0346 0.0582 0.0155
The bma.bart function returns a bma.bart object. There are many other information can be used. Using try0 as an example, the following codes showed some use of the results.
###Show DIC and related paramters
try0$DIC$D_bar
#> [1] 23710.62
try0$DIC$var_D
#> [1] 2931.616
try0$DIC$p_D #calculated using two methods
#> [1] 1465.80816 97.44002
try0$DIC$DIC
#> [1] 23905.5
Besides that the deviance for the outcome and each mediator is saved in deviances.y and deviances.m separately.
###The fitted models, and a-part and b-part in method 2 The fitted BART for the outcome is saved as y.model, and that for each IV is saved in m.models.
For method 2, there are two outputs: a-part and b-part. a-part shows the changing rate of the corresponding IV with the predictor, and b-part the changing rate of the outcome with each IV.
The following simulation shows a way to use these outputs.
#generating the data
set.seed(123)
n <- 1000
X <- rnorm(n)
M <- 0.5 + 0.5 * X^2 + rnorm(n)
Y <- 0.6 * M^3 + 0.5* X + rnorm(n)
data <- data.frame(X = X, M = M, Y = Y)
try4= bma.bart(pred=X, m=M, y=Y, ndpost=10L, nskip=100L) #use the sd/10 for X and M
summary(try4)
#>
#> For Predictor1
#> Estimated Relative Effects for Method 3:
summary(try4)
#>
#> For Predictor1
#> Estimated Relative Effects for Method 3:
#> DE m1
#> mean 0.1685 0.8787
#> sd 0.0521 0.0565
#> q_0.025 0.0983 0.8018
#> q_0.25 0.1331 0.8353
#> q_0.5 0.1569 0.8773
#> q_0.75 0.2160 0.9231
#> q_0.975 0.2418 0.9576
#Associations between X and M
plot(X, M, main = "BART Model: X vs M", ylab = "M", xlab = "", pch = 16, col = "blue")
points(X, try4$m.models[[1]]$yhat.train.mean, col = "red", pch = 16)
ie2.apart=try4$apart.ie[,,1]
plot(X,apply(ie2.apart,2,mean), main = "X vs apart", ylab = "apart",xlab = "", pch = 16, col = "red")
#Association between M and Y
plot(M, Y, main = "BART Model: M vs Y", ylab = "Y", xlab = "", pch = 16, col = "blue") #
points(M, try4$y.model$yhat.train.mean, col = "red", pch = 16)
legend("topleft", legend = c("Observed", "BART Predictions"), col = c("blue", "red"),
pch = c(16,16), bty="n")
ie2.bpart=try4$bpart.ie[,,1]
plot(M, apply(ie2.bpart,2,mean), main = "M vs bpart", ylab = "bpart", xlab = "", col = "red", pch = 16)