This vignette introduces the Bivariate Binomial Conditionals
Distribution (BBCD), defined via conditional specifications, as proposed
by Ghosh, Marques, and Chakraborty (2025). The BCD
package
provides functions to evaluate the joint and cumulative distributions,
perform random sampling, and estimate parameters via maximum
likelihood.
dbinomBCD()
The joint probability mass function (p.m.f.) of the BBCD is given by:
\[ P(X = x, Y = y) = K \cdot \binom{n_1}{x} \binom{n_2}{y} p_1^x (1 - p_1)^{n_1 - x} p_2^y (1 - p_2)^{n_2 - y} \lambda^{xy}, \]
where \(K\) is a normalizing constant ensuring the probabilities sum to 1.
pbinomBCD()
The function pbinomBCD()
computes the cumulative
distribution:
\[ P(X \leq x, Y \leq y) \]
rbinomBCD()
Generate samples from the BBCD using:
set.seed(123)
samples <- rbinomBCD(n = 100, n1 = 10, n2 = 10, p1 = 0.5, p2 = 0.4, lambda = 1.2)
#> Starting to generate 100 samples with parameters:
#> n1 = 10, n2 = 10, p1 = 0.5000, p2 = 0.4000, lambda = 1.2000
#>
#> WARNING: For lambda > 1, sampling may take significantly longer due
#> to the nature of the distribution.
#> Please be patient as the simulation runs...
#>
#> 10/100 samples (10.0%) | Attempts: 10 | Time: 0.0 sec20/100 samples (20.0%) | Attempts: 20 | Time: 0.0 sec30/100 samples (30.0%) | Attempts: 31 | Time: 0.0 sec40/100 samples (40.0%) | Attempts: 42 | Time: 0.0 sec50/100 samples (50.0%) | Attempts: 52 | Time: 0.0 sec60/100 samples (60.0%) | Attempts: 62 | Time: 0.0 sec70/100 samples (70.0%) | Attempts: 72 | Time: 0.0 sec80/100 samples (80.0%) | Attempts: 82 | Time: 0.0 sec90/100 samples (90.0%) | Attempts: 93 | Time: 0.0 sec100/100 samples (100.0%) | Attempts: 107 | Time: 0.0 sec
#>
#> Sampling completed:
#> Total samples: 100
#> Total attempts: 107
#> Total time: 0.04 seconds
#> Sample correlation: -0.0369 (negative)
#> Note: Sample correlation sign (negative) differs from theoretical expectation (positive) for lambda = 1.20
#> This might be due to small sample size. Try increasing n.
#> Alternatively, verify that dbinomBCD and normalize_constant_BBCD are correctly implemented.
head(samples)
#> X Y
#> 1 4 5
#> 2 7 6
#> 3 5 6
#> 4 5 7
#> 5 6 4
#> 6 7 3
MLEbinomBCD()
Estimate the parameters of the distribution from data.
data <- rbinomBCD(n = 100, n1 = 6, n2 = 4, p1 = 0.6, p2 = 0.3, lambda = 1.5)
#> Starting to generate 100 samples with parameters:
#> n1 = 6, n2 = 4, p1 = 0.6000, p2 = 0.3000, lambda = 1.5000
#>
#> WARNING: For lambda > 1, sampling may take significantly longer due
#> to the nature of the distribution.
#> Please be patient as the simulation runs...
#>
#> 10/100 samples (10.0%) | Attempts: 15 | Time: 0.0 sec20/100 samples (20.0%) | Attempts: 30 | Time: 0.0 sec30/100 samples (30.0%) | Attempts: 45 | Time: 0.0 sec40/100 samples (40.0%) | Attempts: 60 | Time: 0.0 sec50/100 samples (50.0%) | Attempts: 76 | Time: 0.0 sec60/100 samples (60.0%) | Attempts: 91 | Time: 0.0 sec70/100 samples (70.0%) | Attempts: 107 | Time: 0.0 sec80/100 samples (80.0%) | Attempts: 126 | Time: 0.0 sec90/100 samples (90.0%) | Attempts: 142 | Time: 0.0 sec100/100 samples (100.0%) | Attempts: 157 | Time: 0.0 sec
#>
#> Sampling completed:
#> Total samples: 100
#> Total attempts: 157
#> Total time: 0.01 seconds
#> Sample correlation: -0.0501 (negative)
#> Note: Sample correlation sign (negative) differs from theoretical expectation (positive) for lambda = 1.50
#> This might be due to small sample size. Try increasing n.
#> Alternatively, verify that dbinomBCD and normalize_constant_BBCD are correctly implemented.
fit <- MLEbinomBCD(data)
#> Starting MLE estimation for BBCD...
#> Estimating n1 (grid search from6to11)
#> Estimating n2 (grid search from4to9)
#>
#> Improved fit: n1=6, n2=4, p1=0.6005, p2=0.4131, lambda=0.9732, logLik=-271.74
#> Trying combination 2/36: n1=6, n2=5
#> Trying combination 4/36: n1=6, n2=7
#> Trying combination 6/36: n1=6, n2=9
#> Trying combination 8/36: n1=7, n2=5
#> Trying combination 10/36: n1=7, n2=7
#> Trying combination 12/36: n1=7, n2=9
#> Trying combination 14/36: n1=8, n2=5
#> Trying combination 16/36: n1=8, n2=7
#> Trying combination 18/36: n1=8, n2=9
#> Trying combination 20/36: n1=9, n2=5
#> Trying combination 22/36: n1=9, n2=7
#> Trying combination 24/36: n1=9, n2=9
#> Trying combination 26/36: n1=10, n2=5
#> Trying combination 28/36: n1=10, n2=7
#> Trying combination 30/36: n1=10, n2=9
#> Trying combination 32/36: n1=11, n2=5
#> Trying combination 34/36: n1=11, n2=7
#> Trying combination 36/36: n1=11, n2=9
#>
#> Final parameter estimates:
#> n1 = 6
#> n2 = 4
#> p1 = 0.6005
#> p2 = 0.4131
#> lambda = 0.9732
#> Log-likelihood: -271.7432
#> AIC: 553.4864
#> BIC: 566.5123
fit
#> $n1
#> [1] 6
#>
#> $n2
#> [1] 4
#>
#> $p1
#> [1] 0.6005103
#>
#> $p2
#> [1] 0.4130676
#>
#> $lambda
#> [1] 0.9732248
#>
#> $logLik
#> [1] -271.7432
#>
#> $AIC
#> [1] 553.4864
#>
#> $BIC
#> [1] 566.5123
#>
#> $n_params
#> [1] 5
#>
#> $n_obs
#> [1] 100
You may also fix known values for n1
and
n2
:
MLEbinomBCD(data, fixed_n1 = 6, fixed_n2 = 4)
#> Starting MLE estimation for BBCD...
#>
#> Improved fit: n1=6, n2=4, p1=0.6005, p2=0.4131, lambda=0.9732, logLik=-271.74
#>
#> Final parameter estimates:
#> n1 = 6
#> n2 = 4
#> p1 = 0.6005
#> p2 = 0.4131
#> lambda = 0.9732
#> Log-likelihood: -271.7432
#> AIC: 549.4864
#> BIC: 557.3019
#> $n1
#> [1] 6
#>
#> $n2
#> [1] 4
#>
#> $p1
#> [1] 0.6005103
#>
#> $p2
#> [1] 0.4130676
#>
#> $lambda
#> [1] 0.9732248
#>
#> $logLik
#> [1] -271.7432
#>
#> $AIC
#> [1] 549.4864
#>
#> $BIC
#> [1] 557.3019
#>
#> $n_params
#> [1] 3
#>
#> $n_obs
#> [1] 100
The dataset shacc
is related to accident records for 122
experienced railway shunters across two historical periods.
data(shacc)
head(shacc)
#> X Y
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> 6 0 0
plot(shacc$X, shacc$Y, xlab = "Accidents 1937–42", ylab = "Accidents 1943–47")
fit <- MLEbinomBCD(shacc, fixed_n1 = 33, fixed_n2 = 27)
#> Starting MLE estimation for BBCD...
#>
#> Improved fit: n1=33, n2=27, p1=0.0305, p2=0.0264, lambda=1.2492, logLik=-345.47
#>
#> Final parameter estimates:
#> n1 = 33
#> n2 = 27
#> p1 = 0.0305
#> p2 = 0.0264
#> lambda = 1.2492
#> Log-likelihood: -345.4697
#> AIC: 696.9393
#> BIC: 705.3514
FTtest(shacc, "BBCD", params = fit, num_params = 3)
#> $observed
#> 0 1 2 3 4 5 6 7
#> 0 21 13 4 2 0 0 0 0
#> 1 18 14 5 1 0 0 0 1
#> 2 8 10 4 3 1 0 0 0
#> 3 2 1 2 2 1 0 0 0
#> 4 1 4 1 0 0 0 0 0
#> 5 0 1 0 1 0 0 0 0
#> 6 0 0 1 0 0 0 0 0
#>
#> $expected
#> 0 1 2 3 4 5
#> 0 16.93244413 12.41605364 4.38356144 0.99207845 0.16165809 0.02019554
#> 1 17.56573745 16.09034862 7.09649839 2.00631059 0.40839904 0.06373498
#> 2 8.83525658 10.11005155 5.57015348 1.96723551 0.50023957 0.09752283
#> 3 2.87006942 4.10262425 2.82364875 1.24576178 0.39572349 0.09637297
#> 4 0.67668486 1.20834497 1.03890301 0.57257776 0.22720948 0.06912338
#> 5 0.12338071 0.27522436 0.29560112 0.20351699 0.10088527 0.03834083
#> 6 0.01810034 0.05043843 0.06767308 0.05820304 0.03604192 0.01711105
#> 6 7
#> 0 0.002011069 0.0001638508
#> 1 0.007928382 0.0008069398
#> 2 0.015154729 0.0019268144
#> 3 0.018708210 0.0029713860
#> 4 0.016762436 0.0033258227
#> 5 0.011614720 0.0028787626
#> 6 0.006475289 0.0020048939
#>
#> $test
#> $test$statistic
#> [1] 32.08716
#>
#> $test$df
#> [1] 52
#>
#> $test$p_value
#> [1] 0.9864757
The dataset seedplant
records the number of seeds sown
and the number of resulting plants grown over plots of fixed area (5
square feet).
data(seedplant)
head(seedplant)
#> X Y
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> 6 0 0
plot(seedplant$X, seedplant$Y, xlab = "Seeds Sown", ylab = "Plants Grown")
EstParams <- MLEbinomBCD(shacc, fixed_n1 = 13, fixed_n2 = 11)
#> Starting MLE estimation for BBCD...
#>
#> Improved fit: n1=13, n2=11, p1=0.0729, p2=0.0602, lambda=1.3396, logLik=-347.75
#>
#> Final parameter estimates:
#> n1 = 13
#> n2 = 11
#> p1 = 0.0729
#> p2 = 0.0602
#> lambda = 1.3396
#> Log-likelihood: -347.7513
#> AIC: 701.5026
#> BIC: 709.9147
FTtest(shacc, "BBCD", params = EstParams, num_params = 3)
#> $observed
#> 0 1 2 3 4 5 6 7
#> 0 21 13 4 2 0 0 0 0
#> 1 18 14 5 1 0 0 0 1
#> 2 8 10 4 3 1 0 0 0
#> 3 2 1 2 2 1 0 0 0
#> 4 1 4 1 0 0 0 0 0
#> 5 0 1 0 1 0 0 0 0
#> 6 0 0 1 0 0 0 0 0
#>
#> $expected
#> 0 1 2 3 4 5
#> 0 17.284984159 12.1716229 3.89587767 0.74819256 0.09579221 0.008585094
#> 1 17.665738398 16.6639688 7.14500294 1.83813596 0.31525471 0.037848057
#> 2 8.333021495 10.5297072 6.04793911 2.08424983 0.47885153 0.077010514
#> 3 2.402112755 4.0660651 3.12847428 1.44424908 0.44448832 0.095758347
#> 4 0.472120496 1.0705352 1.10338315 0.68234341 0.28131209 0.081184260
#> 5 0.066810515 0.2029364 0.28018998 0.23211136 0.12818846 0.049556374
#> 6 0.007003305 0.0284961 0.05270418 0.05848655 0.04326889 0.022407502
#> 6 7
#> 0 0.0005495812 2.512989e-05
#> 1 0.0032456191 1.988033e-04
#> 2 0.0088464929 7.258790e-04
#> 3 0.0147355006 1.619665e-03
#> 4 0.0167350585 2.464079e-03
#> 5 0.0136842837 2.699085e-03
#> 6 0.0082886373 2.190003e-03
#>
#> $test
#> $test$statistic
#> [1] 33.07199
#>
#> $test$df
#> [1] 52
#>
#> $test$p_value
#> [1] 0.9811565
Reference: Ghosh, I., Marques, F., & Chakraborty, S. (2025). A form of bivariate binomial conditionals distributions. Communications in Statistics - Theory and Methods, 54(2), 534–553.