The Bivariate Binomial Conditionals Distribution (BBCD)

Introduction

This vignette introduces the Bivariate Binomial Conditionals Distribution (BBCD), defined via conditional specifications, as proposed by Ghosh, Marques, and Chakraborty (2025). The BCD package provides functions to evaluate the joint and cumulative distributions, perform random sampling, and estimate parameters via maximum likelihood.

Joint Probability: dbinomBCD()

The joint probability mass function (p.m.f.) of the BBCD is given by:

\[ P(X = x, Y = y) = K \cdot \binom{n_1}{x} \binom{n_2}{y} p_1^x (1 - p_1)^{n_1 - x} p_2^y (1 - p_2)^{n_2 - y} \lambda^{xy}, \]

where \(K\) is a normalizing constant ensuring the probabilities sum to 1.

Example

dbinomBCD(x = 2, y = 1, n1 = 5, n2 = 5, p1 = 0.5, p2 = 0.4, lambda = 0.5)
#> [1] 0.1072416
# independence case
dbinomBCD(x = 2, y = 1, n1 = 5, n2 = 5, p1 = 0.5, p2 = 0.4, lambda = 1.0) 
#> [1] 0.081

Cumulative Distribution: pbinomBCD()

The function pbinomBCD() computes the cumulative distribution:

\[ P(X \leq x, Y \leq y) \]

Example

pbinomBCD(x = 2, y = 5, n1 = 5, n2 = 5, p1 = 0.5, p2 = 0.4, lambda = 0.5)
#> [1] 0.7147949
pbinomBCD(x = 1, y = 1, n1 = 10, n2 = 10, p1 = 0.3, p2 = 0.6, lambda = 1)
#> [1] 0.0002504978

Random Sampling: rbinomBCD()

Generate samples from the BBCD using:

rbinomBCD(n, n1, n2, p1, p2, lambda)

Example

set.seed(123)
samples <- rbinomBCD(n = 100, n1 = 10, n2 = 10, p1 = 0.5, p2 = 0.4, lambda = 1.2)
#> Starting to generate 100 samples with parameters:
#> n1 = 10, n2 = 10, p1 = 0.5000, p2 = 0.4000, lambda = 1.2000
#> 
#> WARNING: For lambda > 1, sampling may take significantly longer due
#>           to the nature of the distribution.
#> Please be patient as the simulation runs...
#> 
#> 10/100 samples (10.0%) | Attempts: 10 | Time: 0.0 sec20/100 samples (20.0%) | Attempts: 20 | Time: 0.0 sec30/100 samples (30.0%) | Attempts: 31 | Time: 0.0 sec40/100 samples (40.0%) | Attempts: 42 | Time: 0.0 sec50/100 samples (50.0%) | Attempts: 52 | Time: 0.0 sec60/100 samples (60.0%) | Attempts: 62 | Time: 0.0 sec70/100 samples (70.0%) | Attempts: 72 | Time: 0.0 sec80/100 samples (80.0%) | Attempts: 82 | Time: 0.0 sec90/100 samples (90.0%) | Attempts: 93 | Time: 0.0 sec100/100 samples (100.0%) | Attempts: 107 | Time: 0.0 sec
#> 
#> Sampling completed:
#> Total samples: 100
#> Total attempts: 107
#> Total time: 0.04 seconds
#> Sample correlation: -0.0369 (negative)
#> Note: Sample correlation sign (negative) differs from theoretical expectation (positive) for lambda = 1.20
#> This might be due to small sample size. Try increasing n.
#> Alternatively, verify that dbinomBCD and normalize_constant_BBCD are correctly implemented.
head(samples)
#>   X Y
#> 1 4 5
#> 2 7 6
#> 3 5 6
#> 4 5 7
#> 5 6 4
#> 6 7 3

Maximum Likelihood Estimation: MLEbinomBCD()

Estimate the parameters of the distribution from data.

Example

data <- rbinomBCD(n = 100, n1 = 6, n2 = 4, p1 = 0.6, p2 = 0.3, lambda = 1.5)
#> Starting to generate 100 samples with parameters:
#> n1 = 6, n2 = 4, p1 = 0.6000, p2 = 0.3000, lambda = 1.5000
#> 
#> WARNING: For lambda > 1, sampling may take significantly longer due
#>           to the nature of the distribution.
#> Please be patient as the simulation runs...
#> 
#> 10/100 samples (10.0%) | Attempts: 15 | Time: 0.0 sec20/100 samples (20.0%) | Attempts: 30 | Time: 0.0 sec30/100 samples (30.0%) | Attempts: 45 | Time: 0.0 sec40/100 samples (40.0%) | Attempts: 60 | Time: 0.0 sec50/100 samples (50.0%) | Attempts: 76 | Time: 0.0 sec60/100 samples (60.0%) | Attempts: 91 | Time: 0.0 sec70/100 samples (70.0%) | Attempts: 107 | Time: 0.0 sec80/100 samples (80.0%) | Attempts: 126 | Time: 0.0 sec90/100 samples (90.0%) | Attempts: 142 | Time: 0.0 sec100/100 samples (100.0%) | Attempts: 157 | Time: 0.0 sec
#> 
#> Sampling completed:
#> Total samples: 100
#> Total attempts: 157
#> Total time: 0.01 seconds
#> Sample correlation: -0.0501 (negative)
#> Note: Sample correlation sign (negative) differs from theoretical expectation (positive) for lambda = 1.50
#> This might be due to small sample size. Try increasing n.
#> Alternatively, verify that dbinomBCD and normalize_constant_BBCD are correctly implemented.
fit <- MLEbinomBCD(data)
#> Starting MLE estimation for BBCD...
#> Estimating n1 (grid search from6to11)
#> Estimating n2 (grid search from4to9)
#> 
#> Improved fit: n1=6, n2=4, p1=0.6005, p2=0.4131, lambda=0.9732, logLik=-271.74
#>  Trying combination 2/36: n1=6, n2=5
#>  Trying combination 4/36: n1=6, n2=7
#>  Trying combination 6/36: n1=6, n2=9
#>  Trying combination 8/36: n1=7, n2=5
#>  Trying combination 10/36: n1=7, n2=7
#>  Trying combination 12/36: n1=7, n2=9
#>  Trying combination 14/36: n1=8, n2=5
#>  Trying combination 16/36: n1=8, n2=7
#>  Trying combination 18/36: n1=8, n2=9
#>  Trying combination 20/36: n1=9, n2=5
#>  Trying combination 22/36: n1=9, n2=7
#>  Trying combination 24/36: n1=9, n2=9
#>  Trying combination 26/36: n1=10, n2=5
#>  Trying combination 28/36: n1=10, n2=7
#>  Trying combination 30/36: n1=10, n2=9
#>  Trying combination 32/36: n1=11, n2=5
#>  Trying combination 34/36: n1=11, n2=7
#>  Trying combination 36/36: n1=11, n2=9
#> 
#> Final parameter estimates:
#> n1 = 6
#> n2 = 4
#> p1 = 0.6005
#> p2 = 0.4131
#> lambda = 0.9732
#> Log-likelihood: -271.7432
#> AIC: 553.4864
#> BIC: 566.5123
fit
#> $n1
#> [1] 6
#> 
#> $n2
#> [1] 4
#> 
#> $p1
#> [1] 0.6005103
#> 
#> $p2
#> [1] 0.4130676
#> 
#> $lambda
#> [1] 0.9732248
#> 
#> $logLik
#> [1] -271.7432
#> 
#> $AIC
#> [1] 553.4864
#> 
#> $BIC
#> [1] 566.5123
#> 
#> $n_params
#> [1] 5
#> 
#> $n_obs
#> [1] 100

You may also fix known values for n1 and n2:

MLEbinomBCD(data, fixed_n1 = 6, fixed_n2 = 4)
#> Starting MLE estimation for BBCD...
#> 
#> Improved fit: n1=6, n2=4, p1=0.6005, p2=0.4131, lambda=0.9732, logLik=-271.74
#> 
#> Final parameter estimates:
#> n1 = 6
#> n2 = 4
#> p1 = 0.6005
#> p2 = 0.4131
#> lambda = 0.9732
#> Log-likelihood: -271.7432
#> AIC: 549.4864
#> BIC: 557.3019
#> $n1
#> [1] 6
#> 
#> $n2
#> [1] 4
#> 
#> $p1
#> [1] 0.6005103
#> 
#> $p2
#> [1] 0.4130676
#> 
#> $lambda
#> [1] 0.9732248
#> 
#> $logLik
#> [1] -271.7432
#> 
#> $AIC
#> [1] 549.4864
#> 
#> $BIC
#> [1] 557.3019
#> 
#> $n_params
#> [1] 3
#> 
#> $n_obs
#> [1] 100

Real Data Example

The dataset shacc is related to accident records for 122 experienced railway shunters across two historical periods.

data(shacc)
head(shacc)
#>   X Y
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> 6 0 0
plot(shacc$X, shacc$Y, xlab = "Accidents 1937–42", ylab = "Accidents 1943–47")

fit <- MLEbinomBCD(shacc, fixed_n1 = 33, fixed_n2 = 27)
#> Starting MLE estimation for BBCD...
#> 
#> Improved fit: n1=33, n2=27, p1=0.0305, p2=0.0264, lambda=1.2492, logLik=-345.47
#> 
#> Final parameter estimates:
#> n1 = 33
#> n2 = 27
#> p1 = 0.0305
#> p2 = 0.0264
#> lambda = 1.2492
#> Log-likelihood: -345.4697
#> AIC: 696.9393
#> BIC: 705.3514
FTtest(shacc, "BBCD", params = fit, num_params = 3)
#> $observed
#>    0  1 2 3 4 5 6 7
#> 0 21 13 4 2 0 0 0 0
#> 1 18 14 5 1 0 0 0 1
#> 2  8 10 4 3 1 0 0 0
#> 3  2  1 2 2 1 0 0 0
#> 4  1  4 1 0 0 0 0 0
#> 5  0  1 0 1 0 0 0 0
#> 6  0  0 1 0 0 0 0 0
#> 
#> $expected
#>             0           1          2          3          4          5
#> 0 16.93244413 12.41605364 4.38356144 0.99207845 0.16165809 0.02019554
#> 1 17.56573745 16.09034862 7.09649839 2.00631059 0.40839904 0.06373498
#> 2  8.83525658 10.11005155 5.57015348 1.96723551 0.50023957 0.09752283
#> 3  2.87006942  4.10262425 2.82364875 1.24576178 0.39572349 0.09637297
#> 4  0.67668486  1.20834497 1.03890301 0.57257776 0.22720948 0.06912338
#> 5  0.12338071  0.27522436 0.29560112 0.20351699 0.10088527 0.03834083
#> 6  0.01810034  0.05043843 0.06767308 0.05820304 0.03604192 0.01711105
#>             6            7
#> 0 0.002011069 0.0001638508
#> 1 0.007928382 0.0008069398
#> 2 0.015154729 0.0019268144
#> 3 0.018708210 0.0029713860
#> 4 0.016762436 0.0033258227
#> 5 0.011614720 0.0028787626
#> 6 0.006475289 0.0020048939
#> 
#> $test
#> $test$statistic
#> [1] 32.08716
#> 
#> $test$df
#> [1] 52
#> 
#> $test$p_value
#> [1] 0.9864757

The dataset seedplant records the number of seeds sown and the number of resulting plants grown over plots of fixed area (5 square feet).

data(seedplant)
head(seedplant)
#>   X Y
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> 6 0 0
plot(seedplant$X, seedplant$Y, xlab = "Seeds Sown", ylab = "Plants Grown")

EstParams <- MLEbinomBCD(shacc, fixed_n1 = 13, fixed_n2 = 11)
#> Starting MLE estimation for BBCD...
#> 
#> Improved fit: n1=13, n2=11, p1=0.0729, p2=0.0602, lambda=1.3396, logLik=-347.75
#> 
#> Final parameter estimates:
#> n1 = 13
#> n2 = 11
#> p1 = 0.0729
#> p2 = 0.0602
#> lambda = 1.3396
#> Log-likelihood: -347.7513
#> AIC: 701.5026
#> BIC: 709.9147
FTtest(shacc, "BBCD", params = EstParams, num_params = 3)
#> $observed
#>    0  1 2 3 4 5 6 7
#> 0 21 13 4 2 0 0 0 0
#> 1 18 14 5 1 0 0 0 1
#> 2  8 10 4 3 1 0 0 0
#> 3  2  1 2 2 1 0 0 0
#> 4  1  4 1 0 0 0 0 0
#> 5  0  1 0 1 0 0 0 0
#> 6  0  0 1 0 0 0 0 0
#> 
#> $expected
#>              0          1          2          3          4           5
#> 0 17.284984159 12.1716229 3.89587767 0.74819256 0.09579221 0.008585094
#> 1 17.665738398 16.6639688 7.14500294 1.83813596 0.31525471 0.037848057
#> 2  8.333021495 10.5297072 6.04793911 2.08424983 0.47885153 0.077010514
#> 3  2.402112755  4.0660651 3.12847428 1.44424908 0.44448832 0.095758347
#> 4  0.472120496  1.0705352 1.10338315 0.68234341 0.28131209 0.081184260
#> 5  0.066810515  0.2029364 0.28018998 0.23211136 0.12818846 0.049556374
#> 6  0.007003305  0.0284961 0.05270418 0.05848655 0.04326889 0.022407502
#>              6            7
#> 0 0.0005495812 2.512989e-05
#> 1 0.0032456191 1.988033e-04
#> 2 0.0088464929 7.258790e-04
#> 3 0.0147355006 1.619665e-03
#> 4 0.0167350585 2.464079e-03
#> 5 0.0136842837 2.699085e-03
#> 6 0.0082886373 2.190003e-03
#> 
#> $test
#> $test$statistic
#> [1] 33.07199
#> 
#> $test$df
#> [1] 52
#> 
#> $test$p_value
#> [1] 0.9811565

Reference: Ghosh, I., Marques, F., & Chakraborty, S. (2025). A form of bivariate binomial conditionals distributions. Communications in Statistics - Theory and Methods, 54(2), 534–553.