| Type: | Package |
| Title: | Staged Event Trees |
| Version: | 2.3.0 |
| Description: | Creates and fits staged event tree probability models, which are probabilistic graphical models capable of representing asymmetric conditional independence statements for categorical variables. Includes functions to create, plot and fit staged event trees from data, as well as many efficient structure learning algorithms. References: Carli F, Leonelli M, Riccomagno E, Varando G (2022). <doi:10.18637/jss.v102.i06>. Collazo R. A., Görgen C. and Smith J. Q. (2018, ISBN:9781498729604). Görgen C., Bigatti A., Riccomagno E. and Smith J. Q. (2018) <doi:10.48550/arXiv.1705.09457>. Thwaites P. A., Smith, J. Q. (2017) <doi:10.48550/arXiv.1510.00186>. Barclay L. M., Hutton J. L. and Smith J. Q. (2013) <doi:10.1016/j.ijar.2013.05.006>. Smith J. Q. and Anderson P. E. (2008) <doi:10.1016/j.artint.2007.05.004>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.1 |
| URL: | https://github.com/stagedtrees/stagedtrees |
| BugReports: | https://github.com/stagedtrees/stagedtrees/issues |
| Imports: | stats, graphics, cli, rlang, matrixStats |
| Suggests: | testthat (≥ 3.0.0), bnlearn, covr, clue, igraph |
| Config/testthat/edition: | 3 |
| Depends: | R (≥ 2.10) |
| NeedsCompilation: | no |
| Packaged: | 2024-02-14 10:59:25 UTC; gherardo |
| Author: | Gherardo Varando |
| Maintainer: | Gherardo Varando <gherardo.varando@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2024-02-14 11:50:02 UTC |
Staged event trees.
Description
Algorithms to create, learn, fit and explore staged event tree models. Functions to compute probabilities, make predictions from the fitted models and to plot, analyze and manipulate staged event trees.
Details
A staged event tree is a representation of a particular
factorization of a joint probability over a product space.
In particular, given a vector of categorical random variables
X1, X2, \ldots, a staged event tree represents the factorization
P(X1, X2, X3, \ldots) = P(X1)P(X2 | X1) P(X3 | X1, X2) \ldots .
Additionally, the stages structure indicates which conditional probabilities
are equal.
Model selection algorithms:
full model
fullindependence model
indepHill-Climbing
stages_hcBackward Hill-Climbing
stages_bhcFast Backward Hill-Climbing
stages_fbhcBackward Hill-Climbing Random
stages_bhcrBackward joining
stages_bjSimple Backward Hill-Climbing
stages_simplebhcHierarchical Clustering
stages_hclustK-Means Clustering
stages_kmeansOptimal order search
search_bestGreedy order search
search_greedy
Probabilities, log-likelihood and predictions:
Marginal/Conditional probabilities
probLog-Likelihood
logLik.sevtPredict method
predict.sevtConfidence intervals
confint.sevt
Plot, explore and compare:
Plot
plot.sevtCompare
compare_stagesStages inclusion
inclusions_stagesStages info
summary.sevtList of parents
as_parentslistBarplot construction
barplot.sevtLikelihood-ratio test
lr_testContext-specific interventional distance
cid
Modify models:
Join and isolate unobserved situations
join_unobservedJoin two stages
join_stagesJoin two positions
join_positionsRename a stage
rename_stage
Author(s)
Maintainer: Gherardo Varando gherardo.varando@gmail.com (ORCID)
Authors:
Federico Carli
Manuele Leonelli (ORCID)
Eva Riccomagno
References
Collazo R. A., Görgen C. and Smith J. Q. Chain event graphs. CRC Press, 2018.
Görgen C., Bigatti A., Riccomagno E. and Smith J. Q. Discovery of statistical equivalence classes using computer algebra. International Journal of Approximate Reasoning, vol. 95, pp. 167-184, 2018.
Barclay L. M., Hutton J. L. and Smith J. Q. Refining a Bayesian network using a chain event graph. International Journal of Approximate Reasoning, vol. 54, pp. 1300-1309, 2013.
Smith J. Q. and Anderson P. E. Conditional independence and chain event graphs. Artificial Intelligence, vol. 172, pp. 42-68, 2008.
Thwaites P. A., Smith, J. Q. A new method for tackling asymmetric decision problems. International Journal of Approximate Reasoning, vol. 88, pp. 624–639, 2017.
See Also
Useful links:
Report bugs at https://github.com/stagedtrees/stagedtrees/issues
Examples
data("PhDArticles")
mf <- full(PhDArticles, join_unobserved = TRUE)
mod <- stages_fbhc(mf)
plot(mod)
Asym dataset
Description
Artificial dataset with observations from four variables having a non-symmetrical conditional independence structure.
Usage
Asym
Format
A data frame with 1000 observations of 4 binary variables.
Source
The data has been generated by Federico Carli carli@dima.unige.
PhD Students Publications
Description
Number of publications of 915 PhD biochemistry students during the 1950’s and 1960’s.
Usage
PhDArticles
Format
A data frame with 915 rows and 6 variables:
- Articles
Number of articles during the last 3 years of PhD: either
0,1-2or>2.- Gender
maleorfemale.- Kids
yesif the student has at least one kid 5 or younger,nootherwise.- Married
yesorno.- Mentor
Number of publications of the student's mentor:
lowbetween 0 and 3,mediumbetween 4 and 10,highotherwise.- Prestige
lowif the student is at a low-prestige university,highotherwise.
Source
The data has been modified from the Rchoice package.
References
Long, J. S. (1990). The origins of sex differences in science. Social Forces, 68(4), 1297-1316.
Pokemon Go Users
Description
Demographic information of a population of possible Pokemon Go users.
Usage
Pokemon
Format
A data frame with 999 rows and 5 variables:
- Use
Yif the individual used the app,Notherwise- Age
>30if the individual is older than 30,<=30otherwise- Degree
Yesif the individual completed a Higher Education degree,Nootherwise- Gender
MaleorFemale- Activity
Yesif the individual was physically active (i.e. had a walk longer than 30 mins, went for a run or had a bike ride to get some exercise) in the past week before the experiment,Nootherwise
Source
References
Gabbiadini, Alessandro, Christina Sagioglou, and Tobias Greitemeyer. "Does Pokémon Go lead to a more physically active life style?." Computers in Human Behavior 84 (2018): 258-263.
Print a parentslist object
Description
Nice print of a parentslist object
Usage
## S3 method for class 'parentslist'
as.character(x, only_parents = FALSE, ...)
## S3 method for class 'parentslist'
print(x, ...)
Arguments
x |
an object of class |
only_parents |
logical, if the basic DAG encoding is to be returned. |
... |
additional arguments for compatibility. |
Value
as.character.parentslist returns a string
encoding the associated directed graph and eventually
the context specific independences.
The encoding is similar to the one returned by
modelstring in package bnlearn
and package deal.
In particular, parents of a variable can be enclosed in:
-
( )if a partial (conditional) independence is present. -
{ }if a context specific independence is present. -
< >if no context specific and partial (conditional) independences are present, but at least a local independence is detected.
If a parent is not enclosed in parenthesis the dependence is full.
If only_parents = TRUE, the simple DAG encoding as in bnlearn
is returned.
Examples
model <- stages_hclust(full(Titanic), k = 2)
pl <- as_parentslist(model)
pl
as.character(pl)
as.character(pl, only_parents = TRUE)
Convert to an adjacency matrix
Description
Convert to an adjacency matrix
Usage
as_adj_matrix(x, ...)
## S3 method for class 'parentslist'
as_adj_matrix(x, ...)
## S3 method for class 'ceg'
as_adj_matrix(x, ignore = x$name_unobserved, endnode = TRUE, ...)
Arguments
x |
an R object |
... |
additional parameters |
ignore |
list of stages to be ignored. |
endnode |
logical value. If |
Value
the equivalent adjacency matrix
for as_adj_matrix.ceg: the adj matrix corresponding to the CEG.
Convert to a bnlearn object
Description
Convert a staged tree object into an object of class bn
from the bnlearn package.
Usage
as_bn(x)
## S3 method for class 'parentslist'
as_bn(x)
## S3 method for class 'sevt'
as_bn(x)
Arguments
x |
an R object of class |
Value
an object of class bn from package bnlearn.
Obtain the equivalent DAG as list of parents
Description
Convert to the equivalent representation as list of parents.
Usage
as_parentslist(x, ...)
## S3 method for class 'bn'
as_parentslist(x, order = NULL, ...)
## S3 method for class 'bn.fit'
as_parentslist(x, order = NULL, ...)
## S3 method for class 'sevt'
as_parentslist(x, silent = FALSE, ...)
Arguments
x |
an R object. |
... |
additional parameters. |
order |
order of the variables, usually a topological order. |
silent |
if function should be silent. |
Details
The output of this function is an object of class
parentslist which is one of the possible encoding for
a directed graph. This is mainly an internal class and its
specification can be changed in the future.
For example, now it may also include information on the
sample space of the variables and the context/partial/local
independences.
In as_parentslist.sevt, if a context-specific or a local-partial independence is detected
a message is printed (if silent = FALSE) and the minimal super-model is returned.
Value
An object of class parentslist for which a
print method exists.
Basically a list with
one entries for each variable with fields:
-
parentsThe parents of the variable. -
contextWhere context independences are detected. -
partialWhere partial independences are detected. -
localWhere no context/partial independences are detected, but local independences are present. -
valuesvalues for the variable.
See Also
print.parentslist and
as.character.parentslist for the parenthesis-encoding of the
DAG structure and the asymmetric independences.
Examples
model <- stages_hclust(full(Titanic), k = 2)
pl <- as_parentslist(model)
pl$Age
Coerce to sevt
Description
Convert to an equivalent object of class sevt.
Usage
as_sevt(x, ...)
## S3 method for class 'bn.fit'
as_sevt(x, order = NULL, ...)
## S3 method for class 'bn'
as_sevt(x, order = NULL, values = NULL, ...)
## S3 method for class 'parentslist'
as_sevt(x, order = NULL, values = NULL, ...)
Arguments
x |
an R object. |
... |
additional parameters to be used by specific methods. |
order |
order of the variables. |
values |
the values for each variable, the sample space. |
Details
In as_sevt.bn.fit the order
argument, if provided, must be a topological order of the
bn.fit object (no check is performed). If the order is not provided
a topological order will be used (the one returned by
bnlearn::node.ordering).
In as_sevt.parentslist the order
argument, if provided, must be a topological order of the
corresponding DAG (no check is performed).
If the order is not provided
names(x) is used.
The values parameter is used to specify the sample space
of each variable. For a parentslist object created with
as_parentslist from an object of class sevt,
it is, usually, not needed to specify the values parameter,
since the sample space is saved in the parentslist object.
Value
the equivalent object of class sevt.
Examples
model <- stages_hclust(full(Titanic), k = 2)
plot(model)
pl <- as_parentslist(model)
model2 <- as_sevt(pl)
plot(model2) ## this is a super-model of the first staged tree
## we can check it with
inclusions_stages(model, model2)
Bar plots of stage probabilities
Description
Create a bar plot visualizing probabilities associated to the different stages of a variable in a staged event tree.
Usage
## S3 method for class 'sevt'
barplot(
height,
var,
ignore = height$name_unobserved,
beside = TRUE,
horiz = FALSE,
legend.text = FALSE,
col = NULL,
xlab = ifelse(horiz, "probability", NA),
ylab = ifelse(!horiz, "probability", NA),
...
)
Arguments
height |
an object of class |
var |
name of a variable in |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
beside |
a logical value. See |
horiz |
a logical value. See |
legend.text |
logical. |
col |
color mapping for the stages, see |
xlab |
a label for the x axis. |
ylab |
a label for the y axis. |
... |
additional arguments passed to |
Value
As barplot:
A numeric vector (or matrix, when beside = TRUE),
giving the coordinates of all the bar midpoints drawn, useful
for adding to the graph.
Examples
model <- stages_fbhc(full(PhDArticles, lambda = 1))
barplot(model, "Kids", beside = TRUE)
Chain event graph (CEG)
Description
Build the CEG representation from an object of class sevt.
Usage
ceg(object)
Arguments
object |
an object of class |
Details
An object of class ceg is a staged event tree object with
additional information on the positions.
Value
an object of class ceg.
Examples
DD <- generate_xor_dataset(3, 100)
model <- stages_bhc(full(DD))
model.ceg <- ceg(model)
model.ceg$positions
Conditional independences matrices of stages
Description
Generate the sequence of all the conditional independences matrices of stages for a given variable in the model.
Usage
ci_matrices(object, var)
Arguments
object |
an object of class |
var |
string, the name of one of the variables in |
Value
A list with i-1 matrices, where i is the depth
of variable var in the tree.
Examples
mod <- sevt(list(A = c("a", "aa"),
B = c("b", "bb", "bbb"),
C = c("c", "cc")), full = TRUE)
stages(mod)["C", A = "a", B = c("b", "bb")] <- "stage1"
stages(mod)["C", A = "aa"] <- "stage2"
stages(mod)["C", A = "a", B = "bbb"] <- "stage2"
ci_matrices(mod, "C")
Context specific interventional discrepancy
Description
Compute the context specific interventional discrepeancy of a staged tree with respect to a reference staged tree.
Usage
cid(object1, object2, FUN = mean)
Arguments
object1 |
an object of class |
object2 |
an object of class |
FUN |
a function that is used to aggregate CID for each variable.
The default |
Value
A list with components:
-
wronga stages-like structure which record whereobject2wrongly infer the interventional distance with respect toobject1. -
cidthe value of the computed CID.
References
Leonelli M., Varando G. Context-Specific Causal Discovery for Categorical Data Using Staged Trees, The 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 2023, https://arxiv.org/abs/2106.04416
Examples
model1 <- stages_bhc(full(Titanic))
model2 <- stages_bhc(full(Titanic,
order = c("Survived", "Sex", "Age", "Class")
))
cid(model1, model2)$cid
cid(model1, model2)$wrong
Compare two staged event tree
Description
Compare two staged event trees, return the differences of the stages structure and plot the difference tree. Three different methods to compute the difference tree are available (see Details).
Usage
compare_stages(
object1,
object2,
method = "naive",
return_tree = FALSE,
plot = FALSE,
...
)
hamming_stages(object1, object2, return_tree = FALSE)
diff_stages(object1, object2)
Arguments
object1 |
an object of class |
object2 |
an object of class |
method |
character, method to compare staged event trees.
One of: |
return_tree |
logical, if |
plot |
logical. |
... |
additional parameters to be passed to |
Details
compare_stages tests if the stage structure of two sevt
objects
is the same.
Three methods are available:
-
naivefirst appliesstndnamingto both objects and then simply compares the resulting stage names. -
hamminguses thehamming_stagesfunction that finds a minimal subset of nodes which stages must be changed to obtain the same structure. -
stagesuses thediff_stagesfunction that compares stages to check whether the same stage structure is present in both models.
Setting return_tree = TRUE will return the stages
difference obtained with the selected method.
The stages difference is a list of numerical vectors with same
lengths and structure as stages(object1) or stages(object2),
where values are 1 if the corresponding node has different
(with respect to the selected method) associated stage, and
0 otherwise.
With plot = TRUE the plot of the difference tree is displayed.
If return_tree = FALSE and plot = FALSE
the logical output is the same for the
three methods and thus the naive method should be used
since it is computationally faster.
hamming_stages finds a minimal set of nodes for which the associated stages
should be changed to obtain equivalent structures. To do that, a maximum-weight bipartite
matching problem between the stages of the two staged trees is solved using the
Hungarian method implemented in the solve_LSAP function of the clue
package.
hamming_stages requires the package clue.
Value
compare_stages: if return_tree = FALSE, logical: TRUE if the two
models are exactly equal, otherwise FALSE.
Else if return_tree = TRUE, the differences between
the two trees, according to the selected method.
hamming_stages: if return_tree = FALSE, integer, the minimum
number of situations where the stage should be changed to obtain the same
models. If return_tree = TRUE a stages-like structure showing which
situations should be modified to obtain the same models.
diff_stages: a stages-like structure marking the situations belonging
to stages which are not the exactly equal.
Examples
data("Asym")
mod1 <- stages_bhc(full(Asym, lambda = 1))
mod2 <- stages_fbhc(full(Asym, lambda = 1))
compare_stages(mod1, mod2)
##########
m0 <- full(PhDArticles[, 1:4], lambda = 0)
m1 <- stages_bhc(m0)
m2 <- stages_bj(m0, distance = "totvar", thr = 0.25)
diff_stages(m1, m2)
Confidence intervals for staged event tree parameters
Description
Confint method for class sevt.
Usage
## S3 method for class 'sevt'
confint(
object,
parm,
level = 0.95,
method = c("wald", "waldcc", "wilson", "goodman", "quesenberry-hurst"),
ignore = object$name_unobserved,
...
)
Arguments
object |
an object of class |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
method |
a character string specifing which method to use: wald", "waldcc", "goodman", "quesenberry-hurst" or "wilson". |
ignore |
vector of stages which will be ignored,
by default the name of the unobserved stages stored in
|
... |
additional argument(s) for compatibility
with |
Details
Compute confidence intervals for staged event trees. Currently five methods are available:
-
wald,waldcc: Wald method and with continuity correction. -
wilson,quesenberry-hurstandgoodman.
Value
A matrix with columns giving lower and upper confidence
limits for each parameter. These will be labelled as
(1-level)/2 and 1 - (1-level)/2 in %
(by default 2.5% and 97.5%).
Author(s)
The function is partially inspired by code in the
MultinomCI function from the DescTools package,
implemented by Andri Signorelli and Pablo J. Villacorta Iglesias.
References
Goodman, L. A. (1965) On Simultaneous Confidence Intervals for Multinomial Proportions Technometrics, 7, 247-254.
Wald, A. Tests of statistical hypotheses concerning several parameters when the number of observations is large, Trans. Am. Math. Soc. 54 (1943) 426-482.
Wilson, E. B. Probable inference, the law of succession and statistical inference, J.Am. Stat. Assoc. 22 (1927) 209-212.
Quesenberry, C., & Hurst, D. (1964). Large Sample Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics, 6(2), 191-195
Examples
m1 <- stages_bj(full(PhDArticles), distance = "kullback", thr = 0.01)
confint(m1, "Prestige", level = 0.90)
confint(m1, "Married", method = "goodman")
confint(m1, c("Married", "Kids"))
Trajectories of hospitalized SARS-CoV-2 patients
Description
Dataset with observations from four variables (Sex, Age, ICU, death) for 10000 simulated SARS-CoV-2 hospital patients.
Usage
covid_patients
Format
A data frame with 10000 observations of 4 variables. The variables and their levels are as follows:
Sex: Female, Male
Age: 0-39, 40-49, 50-59, 60-69, 70-79, 80+
ICU: yes, no
death: yes, no
Details
The data are simulated from an event tree where conditional probabilities for ICU and death are taken from the results of Lefrancq et al. (2021). Lefrancq et al. (2021) estimated such probabilities from data on patients, recorded in the SI-VIC database, who started their hospitalization between 13 March and 30 November 2020.
Source
The data has been generated with the code in the Examples section. Conditional probabilities were copied from the tables in the Supplementary materials of Lefrancq et al. (2021). Marginal probabilities of gender and probabilities of age given gender were instead obtained from the linked GitHub repository https://github.com/noemielefrancq/Evolution-Outcomes-COVID19-France.
References
Leonelli, M. and Varando, G. (2023). Context-Specific Causal Discovery for Categorical Data Using Staged Trees. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8871-8888 Available from https://proceedings.mlr.press/v206/leonelli23a.html.
Lefrancq N., Paireau J., Hozé N., Courtejoie N., Yazdanpanah Y., Bouadma L. (2021). Evolution of outcomes for patients hospitalised during the first 9 months of the SARS-CoV-2 pandemic in France: A retrospective national surveillance data analysis. The Lancet Regional Health - Europe, 5:100087.
Examples
library(stagedtrees)
data_model <- sevt(list(
Sex = c("Female", "Male"),
Age = c(
"0-39", "40-49", "50-59", "60-69",
"70-79", "80+"
),
ICU = c("yes", "no"),
death = c("yes", "no")
), full = TRUE)
data_model$prob <- list()
data_model$prob$Sex <- list("1" = c(Female = 0.45185, Male = 0.54815))
dist_age_male <- c(
0.01616346, # 0 - 39
0.04159445, # 40 - 49
0.10130439, # 50 - 59
0.16825686, # 60 - 69
0.25217550, # 70 - 79
0.42050534
) # 80+
dist_age_female <- c(
0.01688613, # 0 - 39
0.04271329, # 40 - 49
0.10131681, # 50 - 59
0.16841872, # 60 - 69
0.25289366, # 70 - 79
0.41777138
) # 80+
names(dist_age_male) <- data_model$tree$Age
names(dist_age_female) <- data_model$tree$Age
data_model$prob$Age <- list(
"1" = dist_age_female,
"2" = dist_age_male
)
data_model$prob$ICU <- list(
"1" = c(yes = 0.125, no = 1 - 0.125), # Female 0-39
"2" = c(yes = 0.149, no = 1 - 0.149), # Female 40-49
"3" = c(yes = 0.193, no = 1 - 0.193), # Female 50-59
"4" = c(yes = 0.225, no = 1 - 0.225), # Female 60-69
"5" = c(yes = 0.175, no = 1 - 0.175), # Female 70-79
"6" = c(yes = 0.037, no = 1 - 0.037), # Female 80+
"7" = c(yes = 0.197, no = 1 - 0.197), # Male 0-39
"8" = c(yes = 0.2687, no = 1 - 0.2687), # Male 40-49
"9" = c(yes = 0.3171, no = 1 - 0.3171), # Male 50-59
"10" = c(yes = 0.3415, no = 1 - 0.3415), # Male 60-69
"11" = c(yes = 0.274, no = 1 - 0.274), # Male 70-79
"12" = c(yes = 0.073, no = 1 - 0.073) # Male 80+
)
data_model$prob$death <- list(
################### FEMALE ################################
"1" = c(yes = 0.077, no = 1 - 0.077), # Female 0-39 ICU
"2" = c(yes = 0.004, no = 1 - 0.004), # Female 0-39 no-ICU
"3" = c(yes = 0.117, no = 1 - 0.117), # Female 40-49 ICU
"4" = c(yes = 0.017, no = 1 - 0.017), # Female 40-49 no-ICU
"5" = c(yes = 0.185, no = 1 - 0.185), # Female 50-59 ICU
"6" = c(yes = 0.030, no = 1 - 0.030), # Female 50-59 no-ICU
"7" = c(yes = 0.239, no = 1 - 0.239), # Female 60-69 ICU
"8" = c(yes = 0.058, no = 1 - 0.058), # Female 60-69 no-ICU
"9" = c(yes = 0.324, no = 1 - 0.324), # Female 70-79 ICU
"10" = c(yes = 0.124, no = 1 - 0.124), # Female 70-79 no-ICU
"11" = c(yes = 0.454, no = 1 - 0.454), # Female 80+ ICU
"12" = c(yes = 0.266, no = 1 - 0.266), # Female 80+ no-ICU
################# MALE ##################################
"13" = c(yes = 0.079, no = 1 - 0.079), # Male 0-39 ICU
"14" = c(yes = 0.008, no = 1 - 0.008), # Male 0-39 no-ICU
"15" = c(yes = 0.098, no = 1 - 0.098), # Male 40-49 ICU
"16" = c(yes = 0.016, no = 1 - 0.016), # Male 40-49 no-ICU
"17" = c(yes = 0.171, no = 1 - 0.171), # Male 50-59 ICU
"18" = c(yes = 0.030, no = 1 - 0.030), # Male 50-59 no-ICU
"19" = c(yes = 0.278, no = 1 - 0.278), # Male 60-69 ICU
"20" = c(yes = 0.067, no = 1 - 0.067), # Male 60-69 no-ICU
"21" = c(yes = 0.383, no = 1 - 0.383), # Male 70-79 ICU
"22" = c(yes = 0.150, no = 1 - 0.150), # Male 70-79 no-ICU
"23" = c(yes = 0.478, no = 1 - 0.478), # Male 80+ ICU
"24" = c(yes = 0.363, no = 1 - 0.363) # Male 80+ no-ICU
)
# covid_patients <- sample_from(data_model, 10000, seed = 123)
# usethis::use_data(covid_patients, overwrite = TRUE)
Extract dependency subtree
Description
Extract the dependency subtree of a staged tree with respect to a variable
Usage
depsubtree(object, var, other_stages = c("NA", "indep", "full"))
Arguments
object |
an object of class |
var |
the name of one of the variable of the staged event tree. |
other_stages |
how to set stages for other variables (if any). |
Details
The dependency sub-tree is a staged event tree which is
sufficient to describe the conditional distribution of the variable
var given its predecessors in the original tree represented by
object.
In particular the preceding variables are restricted to the
parents of var in the minimal-DAG obtained with
as_parentslist. This is the minimal set of
variables which contexts are sufficient to fully represent the
conditional distribution of var.
Stages for variables different from var are either set to
NA, or to the full or indep model, depending on other_stages.
Value
an object of class sevt representing the
dependency sub-tree.
Examples
mod <- stages_kmeans(full(Titanic), k = 2)
par(mfrow = c(1, 2))
plot(mod, main = "staged tree")
plot(depsubtree(mod, "Age"), main = "dependency subtree for Age")
par(mfrow = c(1, 1))
Compute the distance matrix
Description
Compute the matrix of distances between probabilities, e.g the transition probabilities for a given variable in a staged event tree.
Usage
distance_mat_stages(x, distance = probdist.kl)
Arguments
x |
list of conditional probabilities for each stage. |
distance |
the distance function e.g. |
Value
The matrix with the distances between stages.
Plot an edge
Description
Plot an edge
Usage
edge(from, to, label = "", col = "black", cex_label = 1, ...)
Arguments
from |
From |
to |
To |
label |
the label |
col |
color |
cex_label |
numerical |
... |
additional parameters passed to |
Erase the sevt fit
Description
Erase the sevt fit
Usage
erase_fit(object)
Arguments
object |
an object of class |
Value
an object of class sevt without
prob and ll field.
Expand probabilities of a staged event tree
Description
Return the list of complete probability tables.
Usage
expand_prob(object)
Arguments
object |
a fitted staged event tree object. |
Value
probability tables.
Find the stage of the path
Description
no checking is done.
Usage
find_stage(object, path)
Arguments
object |
a staged event tree object. |
path |
vector of the path. |
Value
the stage name corresponding of the path.
Full and independent staged event tree
Description
Build fitted staged event tree from data.
Usage
full(
data,
order = NULL,
join_unobserved = TRUE,
lambda = 0,
name_unobserved = "UNOBSERVED"
)
## S3 method for class 'table'
full(
data,
order = names(dimnames(data)),
join_unobserved = TRUE,
lambda = 0,
name_unobserved = "UNOBSERVED"
)
## S3 method for class 'data.frame'
full(
data,
order = colnames(data),
join_unobserved = TRUE,
lambda = 0,
name_unobserved = "UNOBSERVED"
)
indep(
data,
order = NULL,
join_unobserved = TRUE,
lambda = 0,
name_unobserved = "UNOBSERVED"
)
## S3 method for class 'table'
indep(
data,
order = names(dimnames(data)),
join_unobserved = TRUE,
lambda = 0,
name_unobserved = "UNOBSERVED"
)
## S3 method for class 'data.frame'
indep(
data,
order = colnames(data),
join_unobserved = TRUE,
lambda = 0,
name_unobserved = "UNOBSERVED"
)
Arguments
data |
data to create the model, data.frame or table. |
order |
character vector, order of variables. |
join_unobserved |
logical, if situations with zero observations should be joined (default TRUE). |
lambda |
smoothing coefficient (default 0). |
name_unobserved |
name to pass to |
Details
Functions to create full or independent staged tree models from
data.
The full (or saturated) staged tree is the model where every
situation is in a different stage, and thus the model has the
maximum number of parameters.
Conversely, the independent staged tree (indep) assigns
all the situations related to the same variable to the same
stage, thus it is equivalent to the independence factorization.
Examples
## full model
DD <- generate_xor_dataset(4, 100)
model_full <- full(DD, lambda = 1)
## independence model (data.frame)
DD <- generate_xor_dataset(4, 100)
model <- indep(DD, lambda = 1)
model
Generate a random binary dataset for classification
Description
Randomly generate a simple classification problem.
Usage
generate_linear_dataset(
p,
n,
eps = 1.2,
gamma = runif(1, min = -p, max = p),
alpha = runif(p, min = -p, max = p)
)
Arguments
p |
number of variables. |
n |
number of observations. |
eps |
noise. |
gamma |
numeric. |
alpha |
numeric vector of length |
Value
A data.frame with n independent random variables and
one class variable C computed as
sign(sum(x * alpha) + runif(1, -eps, eps) + gamma).
Examples
DD <- generate_linear_dataset(p = 5, n = 1000)
Generate a random binary dataset
Description
Randomly generate a data.frame of independent binary variables.
Usage
generate_random_dataset(p, n)
Arguments
p |
number of variables. |
n |
number of observations. |
Value
A data.frame with n independent random variables.
Examples
DD <- generate_random_dataset(p = 5, n = 1000)
Generate a xor dataset
Description
Generate a xor dataset
Usage
generate_xor_dataset(p, n, eps = 1.2)
Arguments
p |
number of variables. |
n |
number of observations. |
eps |
error. |
Value
The xor dataset with n + 1 variables, where the first one is
the class variable C computed as a noisy xor.
Examples
DD <- generate_xor_dataset(p = 5, n = 1000, eps = 1.2)
Get stage or path
Description
Utility functions to obtain stages from paths and paths from stages.
Usage
get_stage(object, path)
get_path(object, var, stage)
Arguments
object |
an object of class |
path |
character vector, the path from root or a two dimensional array where each row is a path from root. |
var |
character, one of the variable in the staged tree. |
stage |
character vector, the name of the stages for which the paths should be returned. |
Value
get_stage returns
the stage name(s) for given path(s).
get_path returns a
data.frame containing the paths
corresponding to the given stage(s).
Examples
model <- stages_fbhc(full(PhDArticles))
get_stage(model, c("0", "male"))
paths <- expand.grid(model$tree[2:1])[, 2:1]
get_stage(model, paths)
get_path(model, "Kids", "5")
get_path(model, "Gender", "2")
get_path(model, "Kids", c("5", "6"))
Check sevt objects
Description
Check sevt objects
Usage
has_ctables(object)
has_prob(object)
is_fitted_sevt(object)
check_sevt(object, arg = rlang::caller_arg(object), call = rlang::caller_env())
check_tree(tree, arg = rlang::caller_arg(tree), call = rlang::caller_env())
check_stages(
object,
arg = rlang::caller_arg(object),
call = rlang::caller_env()
)
check_sevt_prob(
object,
arg = rlang::caller_arg(object),
call = rlang::caller_env()
)
check_sevt_ctables(
object,
arg = rlang::caller_arg(object),
call = rlang::caller_env()
)
check_sevt_fit(
object,
arg = rlang::caller_arg(object),
call = rlang::caller_env()
)
check_same_tree(
object,
object2,
arg1 = rlang::caller_arg(object),
arg2 = rlang::caller_arg(object2),
call = rlang::caller_env()
)
check_var_in(
var,
object,
arg1 = rlang::caller_arg(var),
arg2 = rlang::caller_arg(object),
call = rlang::caller_env()
)
check_scope(
x,
object,
arg1 = rlang::caller_arg(x),
arg2 = rlang::caller_arg(object),
call = rlang::caller_env()
)
check_path(x, tree)
check_context(x, var, tree)
Arguments
object |
an object of class sevt |
arg |
passed arg name |
call |
passed call |
tree |
a list of levels specifying an event tree |
object2 |
a staged event tree object. |
arg1 |
passed arg1 name |
arg2 |
passed arg2 name |
var |
name of a variable to be checked. |
x |
scope, context or path to be checked against a model or tree |
Value
logical.
logical.
logical.
igraph conversion
Description
Obtain the graph representation of a staged tree or a CEG as an object from the igraph package.
Usage
get_edges(x, ignore = x$name_unobserved, ...)
## S3 method for class 'sevt'
get_edges(x, ignore = x$name_unobserved, ...)
get_vertices(x, ignore = x$name_unobserved, ...)
## S3 method for class 'sevt'
get_vertices(x, ignore = x$name_unobserved, ...)
## S3 method for class 'ceg'
get_edges(x, ignore = x$name_unobserved, ...)
## S3 method for class 'ceg'
get_vertices(x, ignore = x$name_unobserved, ...)
as_igraph(x, ignore = x$name_unobserved, ...)
## S3 method for class 'sevt'
as_igraph(x, ignore = x$name_unobserved, ...)
## S3 method for class 'ceg'
as_igraph(x, ignore = x$name_unobserved, ...)
Arguments
x |
|
ignore |
vector of stages which will be ignored and excluded,
by default the name of the unobserved stages stored in
|
... |
additional parameters. |
Details
Functions to transalte the graph structure of a sevt
or ceg object to a graph object from the
igraph package.
Additional functions that extract the edge lists
and the vertices are available.
This can be useful, for example to plot the staged tree with
igraph or additional packages (see the examples).
Value
for get_edges: the edges list corresponding
to the graph associated to x.
for get_vertices: the vertices list corresponding
to the graph associated to x.
for as.igraph: a graph object from the
igraph package.
Examples
mod <- stages_bhc(full(Titanic))
get_edges(mod)
get_vertices(mod)
## Not run:
library(igraph)
library(ggraph)
######## sevt example ########
## convert to igraph object
g <- as_igraph(mod)
## plot with igraph directly
plot(g, layout = layout_with_sugiyama)
## plot with ggraph
ggraph(g, "sugiyama") +
geom_edge_fan(
aes(
label = label,
label_pos = 0.5 + runif(length(label), -0.1, 0.1)
),
angle_calc = "along", show.legend = FALSE, check_overlap = FALSE,
end_cap = circle(0.02, "npc"),
arrow = grid::arrow(
angle = 25,
length = unit(0.025, "npc"),
type = "closed"
)
) +
geom_node_point(aes(x = x, y = y, color = stage),
size = 5,
show.legend = FALSE
) +
ggforce::theme_no_axes() + coord_flip() + scale_y_reverse()
######## ceg example ########
g.ceg <- as_igraph(ceg(mod))
### igraph plotting functions can be used
plot(g.ceg, layout = layout.sugiyama)
### igraph object can be also plotted with ggplot2 and ggraph
ggraph(g.ceg, "sugiyama") +
geom_edge_fan(
aes(
label = label,
color = label,
label_pos = 0.5 + runif(length(label), -0.1, 0.1)
),
angle_calc = "along", show.legend = FALSE, check_overlap = FALSE,
end_cap = circle(0.02, "npc"),
arrow = grid::arrow(
angle = 25,
length = unit(0.025, "npc"),
type = "closed"
)
) +
geom_node_point(aes(x = x, y = y, color = stage), size = 3, show.legend = FALSE) +
ggforce::theme_no_axes() + coord_flip() + scale_y_reverse()
## End(Not run)
Inclusions of stages
Description
Display the relationship between two staged tree models over the same variables.
Usage
inclusions_stages(object1, object2)
Arguments
object1 |
an object of class |
object2 |
an object of class |
Details
Computes the relations between the stages structures of the two models.
The relations between stages of the same variable
are stored in a data frame with three columns
where each row represent
a relation between a stage of the first model (s1) and
a stage of the second model (s2).
The relation can be one of the following: inclusion (s1 < s2
or s1 > s2; equal (s1 = s2); not-equal (s1 != s2).
Value
a list with inclusion relations between stage structures for each variable in the models.
Examples
mod1 <- stages_bhc(full(PhDArticles[, 1:5], lambda = 1))
mod2 <- stages_fbhc(full(PhDArticles[, 1:5], lambda = 1))
inclusions_stages(mod1, mod2)
join positions in a staged tree model
Description
join positions in a staged tree model
Usage
join_positions(model, var, s1, s2)
Arguments
model |
an object of class |
var |
the name of a variable in the model. |
s1 |
stage to join |
s2 |
stage to join |
Details
this functions works similarly to the join_stages
function in the stagedtrees package, but it also joins
downstream stages to make nodes with stages s1,s2 in the same
position. This function works properly only when downstream variables
from var have full stages vectors.
Join stages
Description
Join two stages in a staged event tree object, updating probabilities and log-likelihood accordingly.
Usage
join_stages(object, var, s1, s2)
join_stages_unsafe(object, var, s1, s2)
join_all(object, var, stages, ignore = NULL)
Arguments
object |
an object of class |
var |
variable. |
s1 |
first stage. |
s2 |
second stage. |
stages |
a vector of stage names for variable |
ignore |
vector of stages which will be ignored and left untouched. |
Details
This function joins two stages associated to the same variable, updating probabilities and log-likelihood if the object was fitted.
Value
the staged event tree where s1 and s2 are joined.
Examples
model <- full(PhDArticles, lambda = 0)
model <- stages_fbhc(model)
model$stages$Kids
model <- join_stages(model, "Kids", "5", "6")
model$stages$Kids
Join situations with no observations
Description
Join situations with no observations
Usage
join_unobserved(
object,
fit = TRUE,
trace = 0,
name = "UNOBSERVED",
scope = sevt_varnames(object)[-1],
lambda = object$lambda
)
Arguments
object |
an object of class |
fit |
if TRUE update model's probabilities. |
trace |
if |
name |
character, name for the new stage storing unobserved situations. |
scope |
character vector, list of variables in |
lambda |
smoothing parameter for the fitting. |
Details
It takes as input a (fitted) staged event tree object and it joins, in the same stage, all the situations with zero recorded observations. Since such joining does not change the log-likelihood of the model, it is a useful (time-wise) pre-processing prior to others model selection algorithms.
Unobserved situations can be joined directly in
full or indep, by setting
join_unobserved = TRUE.
Value
a staged event tree with at most one stage per variable with
no observations.
If, as default, fit=TRUE the model will be re-fitted, if
fit=FALSE probabilities in the output model are not estimated.
Examples
DD <- generate_xor_dataset(p = 5, n = 10)
model_full <- full(DD, lambda = 1, join_unobserved = FALSE)
model <- join_unobserved(model_full)
logLik(model_full)
logLik(model)
BIC(model_full, model)
Log-Likelihood of a staged event tree
Description
Compute, or extract the log-likelihood of a staged event tree.
Usage
## S3 method for class 'sevt'
logLik(object, ...)
Arguments
object |
an fitted object of class |
... |
additional parameters (compatibility). |
Value
An object of class logLik.
Examples
data("PhDArticles")
mod <- indep(PhDArticles)
logLik(mod)
Likelihood Ratio Test for staged trees models
Description
Function to perform likelihood ratio test between two or multiple staged event tree models.
Usage
lr_test(object, ...)
Arguments
object |
an object of class |
... |
further objects of class |
Details
If a single object of class sevt is passed as
argument, it computes
the likelihood-ratio test with respect to the
independence model.
If multiple objects are passed,
likelihood-ratio tests between the first
object and the followings are computed.
In the latter case the function checks automatically if
the first model is nested in the additional ones,
via inclusions_stages, and throws
an error if not.
Value
An object of class anova
which contains the log-likelihood,
degrees of freedom,
difference in degrees of freedom, likelihood ratio
statistics and corresponding p values.
Examples
data(PhDArticles)
order <- c("Gender", "Kids", "Married", "Articles")
phd.mod1 <- stages_hc(indep(PhDArticles, order))
phd.mod2 <- stages_hc(full(PhDArticles, order))
## compare two nested models
lr_test(phd.mod1, phd.mod2)
## compare a single model vs the independence model
lr_test(phd.mod1)
Distribute counts along tree
Description
Create the list of ftables
storing the observations distributed along
the path of the tree.
Usage
make_ctables(object, data, useNA = "ifany")
Arguments
object |
A stratified event tree, a list with a |
data |
table or data.frame containing observations
of the variable in |
useNA |
whether to include NA values in the tables.
Argument passed to |
Details
Distribute the counts along the event tree.
This is an internal function, the user will
usually just directly fit the staged event tree
model using sevt.fit.
We refer here to stratified event tree, because actually
the stage information is never used and thus this function
will work for an object with only a tree field.
Value
A list of ftables.
New label
Description
give a safe-to-add label that is not in labels.
Usage
new_label(labels)
Arguments
labels |
vector of labels. |
Value
a string label that is different from each labels.
Plot a node
Description
Plot a node
Usage
node(x, label = "", col = "black", cex_label = 1, cex_node = 1, ...)
Arguments
x |
the center |
label |
the label |
col |
color |
cex_label |
cex parameter to be passed to text |
cex_node |
cex parameter for nodes |
... |
additional parameters passed to |
noisy xor function
Description
noisy xor function
Usage
noisy_xor(x, eps = 0)
Arguments
x |
a vector of +1 and -1. |
eps |
the uniform noise amount. |
Value
the computed noisy xor.
Compute probability of a path from root
Description
Internal function to compute probability of a path. It does not check the validity of the path.
Usage
path_probability(object, x, log = FALSE)
Arguments
object |
An object of class |
x |
the path, expressed as a character vector containing the sequence of the value of the variables. |
log |
logical, if |
Details
Computes the probability of following a given path (x) starting from the root.
Can be a full path from the root to a leaf or a shorter path.
Value
The probability of the given path or its logarithm if log=TRUE.
igraph's plotting for CEG
Description
igraph's plotting for CEG
Usage
## S3 method for class 'ceg'
plot(x, col = NULL, ignore = x$name_unobserved, layout = NULL, ...)
Arguments
x |
an object of class |
col |
colors specification see |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
layout |
an igraph layout. |
... |
additional arguments passed to |
Details
This function is a simple wrapper around
igraph's plot.igraph.
The ceg object is converted to an igraph object
with as_igraph.
If not specified, the default layout used is
a rotated layout.sugiyama.
We use palette() as palette for
the igraph plotting, while plot.igraph uses
as default a different palette. This is to allow matching
stages colors between plot.ceg
and plot.sevt.
Examples
## Not run:
model <- stages_bhc(full(Titanic))
model.ceg <- ceg(model)
plot(model.ceg, edge.arrow.size = 0.1, vertex.label.dist = -2)
## End(Not run)
Plot method for staged event trees
Description
Plot method for staged event tree objects. It allows easy plotting of staged event trees with some options (see Examples).
Usage
## S3 method for class 'sevt'
plot(
x,
y = 10,
limit = y,
xlim = c(0, 1),
ylim = c(0, 1),
main = NULL,
sub = NULL,
asp = 1,
cex_label_nodes = 0,
cex_label_edges = 1,
cex_nodes = 2,
cex_tree_y = 0.9,
col = NULL,
col_edges = "black",
var_names = TRUE,
ignore = x$name_unobserved,
pch_nodes = 16,
lwd_nodes = 1,
lwd_edges = 1,
...
)
make_stages_col(x, col = NULL, ignore = x$name_unobserved, limit = NULL)
Arguments
x |
an object of class |
y |
alias for |
limit |
maximum number of variables plotted. |
xlim |
the x limits (x1, x2) of the plot. |
ylim |
the y limits of the plot. |
main |
an overall title for the plot. |
sub |
a sub title for the plot. |
asp |
the y/x aspect ratio. |
cex_label_nodes |
the magnification to be used for
the node labels.
If set to |
cex_label_edges |
the magnification
for the edge labels.
If set to |
cex_nodes |
the magnification for the nodes of the tree. |
cex_tree_y |
the magnification for the
tree in the vertical direction.
Default is |
col |
color mapping for stages, one of the following:
NULL (color will be assigned based on the current palette);
a named (variables) list of named (stages)
vectors of colors;
the character |
col_edges |
color for the edges. |
var_names |
logical, if variable names should be added to the plot,
otherwise variable names can be added manually using
|
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
pch_nodes |
either an integer specifying a symbol or a single character
to be used as the default in plotting nodes shapes see
|
lwd_nodes |
the line width for edges, a positive number, defaulting to 1. |
lwd_edges |
the line width for nodes, a positive number, defaulting to 1. |
... |
additional graphical parameters to be passed to
|
Examples
data("PhDArticles")
mod <- stages_bj(full(PhDArticles, join_unobserved = TRUE))
### simple plotting
plot(mod)
### labels in nodes
plot(mod, cex_label_nodes = 1, cex_nodes = 0)
### reduce nodes size
plot(mod, cex_nodes = 0.5)
### change line width and nodes style
plot(mod, lwd_edges = 3, pch_nodes = 5)
### changing palette
plot(mod, col = function(s) heat.colors(length(s)))
### or changing global palette
palette(hcl.colors(10, "Harmonic"))
plot(mod)
palette("default") ##
### forcing plotting of unobserved stages
plot(mod, ignore = NULL)
### use function to specify colors
plot(mod, col = function(stages) {
hcl.colors(n = length(stages))
})
### manually give stages colors
### as an example we will assign colors only to the stages of two variables
### Gender (one stage named "1") and Mentor (six stages)
col <- list(
Gender = c("1" = "blue"),
Mentor = c(
"UNOBSERVED" = "grey",
"2" = "red",
"3" = "purple",
"10" = "pink",
"18" = "green",
"22" = "brown"
)
)
### by setting ignore = NULL we will plot also the UNOBSERVED stage for Mentor
plot(mod, col = col, ignore = NULL)
Predict method for staged event tree
Description
Predict class values from a staged event tree model.
Usage
## S3 method for class 'sevt'
predict(object, newdata = NULL, class = NULL, prob = FALSE, log = FALSE, ...)
Arguments
object |
an object of class |
newdata |
the newdata to perform predictions |
class |
character, the name of the variable to use as
the class variable, if NULL the first element |
prob |
logical, if |
log |
logical, if |
... |
additional parameters, see details |
Details
Predict the most probable a posterior value for the class variable given all the other variables in the model. Ties are broken at random and if, for a given vector of predictor variables, all conditional probabilities are 0, NA is returned.
if prob = TRUE, a matrix with number of rows equals to the number of
rows in the newdata and number of columns as the number of levels of the
class variable is returned. if log = TRUE, log-probabilities are returned.
if prob = FALSE, a vector of length as the number of rows in the newdata
with the level with higher estimated probability for each new observations is returned.
Value
A vector of predictions or the corresponding matrix of probabilities.
Examples
DD <- generate_xor_dataset(p = 4, n = 600)
order <- c("C", "X1", "X2", "X3", "X4")
train <- DD[1:500, order]
test <- DD[501:600, order]
model <- full(train)
model <- stages_bhc(model)
pr <- predict(model, newdata = test, class = "C")
table(pr, test$C)
# class values:
predict(model, newdata = test, class = "C")
# probabilities:
predict(model, newdata = test, class = "C", prob = TRUE)
# log-probabilities:
predict(model, newdata = test, class = "C", prob = TRUE, log = TRUE)
Print a staged event tree
Description
Print a staged event tree
Usage
## S3 method for class 'sevt'
print(x, ..., max = 5)
Arguments
x |
an object of class |
... |
additional parameters (compatibility). |
max |
integer, limit on the numebr of variables to print. |
Details
The order of the variables in the staged tree is printed (from root). In addition the number of levels of each variable is shown in square brackets. If available the log-likelihood of the model is printed.
Value
An invisible copy of x.
Examples
DD <- generate_xor_dataset(5, 100)
model <- full(DD, lambda = 1)
print(model)
Probabilities for a staged event tree
Description
Compute (marginal and/or conditional) probabilities of elementary events with respect to the probability encoded in a staged event tree.
Usage
prob(object, x, conditional_on = NULL, log = FALSE, na0 = TRUE)
Arguments
object |
an object of class |
x |
the vector or data.frame of observations. |
conditional_on |
named vector, the conditioning event. |
log |
logical, if |
na0 |
logical, if |
Details
Computes probabilities related to a vector or a data.frame of observations.
Optionally, conditional probabilities can be obtained by specifying
the conditioning event in conditional_on. This can be done either
with a single named vector or with a data.frame object with the
same number of rows of x. In the former, the same conditioning
is used for all the computed probabilities (if x has multiple rows);
while with the latter different conditioning events (but on the same variables)
can be specified for each row of x.
Value
the probabilities to observe each observation in x, possibly
conditional on the event(s) in conditional_on.
Examples
data(Titanic)
model <- full(Titanic, lambda = 1)
samples <- expand.grid(model$tree[c(1, 4)])
pr <- prob(model, samples)
## probabilities sum up to one
sum(pr)
## print observations with probabilities
print(cbind(samples, probability = pr))
## compute one probability
prob(model, c(Class = "1st", Survived = "Yes"))
## compute conditional probability
prob(model, c(Survived = "Yes"), conditional_on = c(Class = "1st"))
## compute conditional probabilities with different conditioning set
prob(model, data.frame(Age = rep("Adult", 8)),
conditional_on = expand.grid(model$tree[2:1])
)
## the above should be the same as
summary(model)$stages.info$Age
Distances between probabilities
Description
Distances between probabilities
Usage
probdist.l2(x, y)
probdist.l1(x, y)
probdist.ry(x, y)
probdist.kl(x, y)
probdist.tv(x, y)
probdist.hl(x, y)
probdist.bh(x, y)
probdist.cd(x, y)
Arguments
x |
vector of probabilities. |
y |
vector of probabilities. |
Details
Functions to compute distances between probabilities:
-
lp: theL^pdistance,||x - y||_p^pforp = 1,2 -
ry: the symmetric Renyi divergence of order\alpha = 2 -
kl: the symmetrized Kullback-Leibler divergence -
tv: the total variation orL^1norm -
hl: the (squared) Hellinger distance -
bh: the Bhattacharyya distance -
cd: the Chan-Darwiche distance
Value
The distance between p and q
Generate a random parentslist object (DAG)
Description
generate a random DAG coded as
parentslist object.
Usage
random_parentslist(n, k = 2, maxp = n)
Arguments
n |
number of variables. |
k |
maximum number of levels for each variable. |
maxp |
maximum cardinality of parents sets. |
Details
For each variable a subset of random cardinality
(maximum maxp) of the preceding
variables is randomly selected as parents set.
The possible levels of each variables are randomly selected
in 2,...,k.
Value
a parentslist object.
Examples
random_parentslist(5, 3, 2)
## we can generate the associated staged tree
pl <- random_parentslist(4, 2, 2)
plot(as_sevt(pl), main = as.character(pl))
Generate a random (fitted) sevt
Description
Generate a random sevt from a DAG or a tree.
Probabilities are also randomly generated.
Usage
random_sevt(x, q = 0.5, rfun = rexp)
## S3 method for class 'list'
random_sevt(x, q = 0.5, rfun = rexp)
## S3 method for class 'parentslist'
random_sevt(x, q = 0.5, rfun = rexp)
## S3 method for class 'sevt'
random_sevt(x, q = 0.5, rfun = rexp)
Arguments
x |
a |
q |
probability of joining stages. |
rfun |
a function which is used to generate random conditional probabilities associated to each stage. |
Details
The generated staged tree is obtained by randomly
joining stages with probability q.
For random_sevt.list, x should be
a list representing an event tree, same format
as lists provided to sevt.list.
The random generated sevt will be
obtained by randomly joining stages starting from
a full staged event tree.
For random_sevt.parentslist, x should be
a parentslist object
representing a DAG, this could be obtained with
as_parentslist or with
random_parentslist.
The random generated sevt will be
obtained by randomly joining stages starting from
a the staged tree equivalent to the DAG.
For random_sevt.sevt, x should be
a sevt.
The random generated sevt will be
obtained by randomly joining stages starting
from the provided sevt object.
Stages (conditional) probabilities are sampled from
the corresponding probability simplex by generating
a vector with the user-defined function \code{rfun} and
normalizing it to sum up to one.
Absolute value is applied to assure non-negativity.
The default \code{rfun = rexp} induces a uniform sampling
from the probability simplex.
Value
A randomly generated fitted sevt object.
Examples
model_gt <- random_sevt(list(
X = c("a", "b"), Y = c("c", "d", "e"),
Z = c("1", "2", "3"), W = c("yes", "no")
))
## sample data from model_gt and estimate a staged tree
data <- sample_from(model_gt, 100)
model_est <- stages_bhc(full(data))
## compare true and estimated model
hamming_stages(model_gt, model_est)
compare_stages(model_gt, model_est, method = "hamming", plot = TRUE)
Rename stage(s) in staged event tree
Description
Change the name of a stage in a staged event tree.
Usage
rename_stage(object, var, stage, new)
Arguments
object |
an object of class |
var |
name of a variable in |
stage |
name of the stage to be renamed. |
new |
new name for the stage. |
Details
No internal checks are performed and as side effect
stages can be joined, if e.g. new is equal to the name
of a stage for variable var.
Value
a staged event tree object where stages stage
have been renamed to new.
Sample from a staged event tree
Description
Generate a random sample from the distribution encoded in a staged event tree object.
Usage
sample_from(object, size = 1, seed = NULL)
Arguments
object |
an object of class |
size |
number of observations to sample. |
seed |
an object specifying if and how the random number generator should be initialized (‘seeded’). Either NULL or an integer that will be used in a call to set.seed. |
Details
It samples size observations according to
the transition probabilities (object$prob) in the model.
Value
A data frame containing size observations from the
variables in object.
Examples
model <- stages_fbhc(full(PhDArticles, lambda = 1))
sample_from(model, 10)
Optimal Order Search
Description
Find the optimal staged event tree with a dynamic programming approach.
Usage
search_best(
data,
alg = stages_bhc,
search_criterion = BIC,
lambda = 0,
join_unobserved = TRUE,
...
)
Arguments
data |
either a data.frame or a table containing the data. |
alg |
a function that performs stages structure estimation. Similar to
|
search_criterion |
the criterion minimized in the order search. |
lambda |
numerical value passed to |
join_unobserved |
logical, passed to |
... |
additional arguments, passed to |
Details
This function is an implementation of the
dynamic programming approach
of Silander and Leong (2013).
If the search_criterion is decomposable
the returned model attains the best value
among all possible orders.
Value
The estimated staged event tree model.
References
Silander T., Leong TY. A Dynamic Programming Algorithm for Learning Chain Event Graphs. In: Fürnkranz J., Hüllermeier E., Higuchi T. (eds) Discovery Science. DS 2013. Lecture Notes in Computer Science, vol 8140. Springer, Berlin, Heidelberg. 2013.
Cowell R and Smith J. Causal discovery through MAP selection of stratified chain event graphs. Electronic Journal of Statistics, 8(1):965–997, 2014.
Examples
## default search using BIC score
model <- search_best(Titanic, alg = stages_kmeans)
## use df as search_criterion
model1 <- search_best(Titanic, alg = stages_bhc,
search_criterion = function(m) attr(logLik(m), "df"))
Greedy Order Search
Description
Search the optimal staged event tree with a greedy heuristic.
Usage
search_greedy(
data,
alg = stages_bhc,
search_criterion = BIC,
lambda = 0,
join_unobserved = TRUE,
...
)
Arguments
data |
either a data.frame or a table containing the data. |
alg |
a function that performs stages structure estimation. Similar to
|
search_criterion |
the criterion minimized in the order search. |
lambda |
numerical value passed to |
join_unobserved |
logical, passed to |
... |
additional arguments, passed to |
Details
The greedy approach implemented in this function
iteratively adds variables to the staged tree that
better improve the search_criterion.
Value
The estimated staged event tree model.
Examples
model <- search_greedy(Titanic, alg = stages_fbhc)
print(model)
Staged event tree (sevt) class
Description
Structure and usage of S3 class sevt,
used to store a staged event tree.
Usage
sevt(x, full = FALSE, order = NULL)
## S3 method for class 'table'
sevt(x, full = FALSE, order = names(dimnames(x)))
## S3 method for class 'data.frame'
sevt(x, full = FALSE, order = colnames(x))
## S3 method for class 'list'
sevt(x, full = FALSE, order = names(x))
Arguments
x |
a list, a data frame or table object. |
full |
logical, if TRUE the full model is created otherwise the independence model. |
order |
character vector,
order of the variables to build the
tree, by default the order of the variables
in |
Details
A staged event tree object is a list with components:
tree (required): A named list with one component for each variable in the model, a character vector with the names of the levels for that variable. The order of the variables in
treeis the order of the event tree.stages (required): A named list with one component for each variable but the first, a character vector storing the stages for the situations related to path ending in that variable.
ctables: A named list with one component for each variable, the flat contingency table of that variable given the previous variables.
lambda: The smoothing parameter used to compute probabilities.
name_unobserved: The stage name for unobserved situations.
prob: The conditional probability tables for every variable and stage. Stored in a named list with one component for each variable, a list with one component for each stage.
ll: The log-likelihood of the
estimatedmodel. If present,logLik.sevtwill return this value instead of computing the log-likelihood.
The tree structure is never defined explicitly, instead it
is implicitly defined by the list tree containing the order
of the variables and the names of their levels. This is
sufficient to define a complete symmetric tree where an
internal node at a depth related to a variable v
has a number of children equal to the cardinality of
the levels of v.
The stages information is instead stored as a list of
vectors, where each vector is indexed as the internal nodes
of the tree at a given depth.
To define a staged tree from data (data frame or table) the
user can call either full or indep
which both construct the staged tree object, attach the data in
ctables and compute probabilities. After, one of the
available model selection algorithm can be used, see for example
stages_hc, stages_bhc or
stages_hclust.
If, mainly for development, only the staged tree structure is needed
(without data or probabilities) the basic
sevt constructor can
be used.
Value
A staged event tree object, an object of class sevt.
Examples
######### from table
model.titanic <- sevt(Titanic, full = TRUE)
######### from data frame
DD <- generate_random_dataset(p = 4, n = 1000)
model.indep <- sevt(DD)
model.full <- sevt(DD, full = TRUE)
######### from list
model <- sevt(list(
X = c("good", "bad"),
Y = c("high", "low")
))
Add a variable to a staged event tree
Description
Return an updated staged event tree with one additional variable at the end of the tree.
Usage
sevt_add(object, var, data, join_unobserved = TRUE, useNA = "ifany")
Arguments
object |
an object of class |
var |
character, the name of the new variable to be added. |
data |
either a |
join_unobserved |
logical, passed to |
useNA |
whether to include NA values in the tables.
Argument passed to |
Details
This function update a staged event tree object with an additional variable. The stages structure of the new variable is initialized as in the saturated model.
Value
An object of class sevt representing a
staged event tree model with var added as last variable.
Examples
model <- full(Titanic, order = c("Age", "Class"))
print(model)
model <- sevt_add(model, "Survived", Titanic)
print(model)
Number of parameters of a staged event tree
Description
Return the number of parameters of the model.
Usage
sevt_df(x)
Arguments
x |
An object of class |
Value
integer, degrees of freedom of the staged event tree.
Fit a staged event tree
Description
Estimate transition probabilities in a staged event tree from data. Probabilities are estimated with the relative frequencies plus, eventually, an additive (Laplace) smoothing.
Usage
sevt_fit(
object,
data = NULL,
lambda = NULL,
scope = NULL,
compute_logLik = TRUE
)
Arguments
object |
an object of class |
data |
data.frame or contingency table with observations of
the variables in |
lambda |
smoothing parameter or pseudocount. Default (NULL) to
lambda value stored in |
scope |
which variable should be fitted. Default (NULL) to
all variables in the model. A partial re-fit is
possible only for model which are already fitted and in
that case the provided |
compute_logLik |
logical value. If |
Details
The data in form of contingency tables and the
log-likelihood of the model is (eventually)
stored in the returned staged event tree.
Partial re-fit of a model can be performed
with the scope argument.
Partial re-fit can only be done over a
fully fitted model, e.g. when changing
the stages structure of one of the variables.
In case of a partial re-fit, the data and lambda arguments
will be ignored and the data and lambda value stored in the
sevt object will be used (a warning is issued if such arguments are
supplied).
Value
A fitted staged event tree,
that is an object of class sevt
with ctables and prob components.
Additionally the chosen lambda is stored in the returned object
and eventually the log-likelihood of the model is saved in
the ll field.
Examples
#########
model <- sevt(list(
X = c("good", "bad"),
Y = c("high", "low")
))
D <- data.frame(
X = c("good", "good", "bad"),
Y = c("high", "low", "low")
)
model.fit <- sevt_fit(model, data = D, lambda = 1)
Number of variables
Description
Utility returning the number of variables in a staged event tree model.
Usage
sevt_nvar(object)
Arguments
object |
An object of class |
Value
integer, the number of variables.
Simplify a staged tree model
Description
Function to simplify a staged tree model.
Usage
sevt_simplify(object, fit = TRUE)
Arguments
object |
an object of class |
fit |
logical, if |
Details
The simplify function will produce the corresponding simple
staged tree, that is a staged tree where stages and positions are
equivalent.
To do so the function ceg is used to compute positions, and
then the stages' vectors are replaced with the positions' vectors.
The model is the re-fitted if the input was a fitted staged tree.
Despite the name, the simplified staged tree has always a number
of stages greater or equal to the initial staged tree, thus it is
a more complex statistical model.
Value
an object of class sevt
representing the simplified model.
The returned model will be fitted if the input model was.
Examples
mod <- stages_kmeans(full(Titanic), k = 2)
simpl <- sevt_simplify(mod)
plot(simpl)
Variable names
Description
Utility returning variable-names in a staged event tree model.
Usage
sevt_varnames(object)
Arguments
object |
an object of class |
Value
A character vector.
Split randomly a stage
Description
Randomly assign some of the paths to a new stage.
Usage
split_stage_random(object, var, stage, p = 0.5)
Arguments
object |
an object of class |
var |
the variable name. |
stage |
the name of the stage. |
p |
probability to move a situation from the original stage into the new stage. |
Details
Splits randomly a given stage into two stages. More precisely,
it assigns each situation within the given stage into a new stage with
probability p.
Value
an object of class sevt.
The stages of a staged event tree
Description
Functions to get or set the stages of an object of class
sevt.
Usage
stages(object)
## S3 method for class 'sevt'
stages(object)
## S3 method for class 'sevt.stgs'
print(x, ..., max = 5)
stages(object) <- value
## S3 method for class 'sevt.stgs'
x[i, ...]
## S3 replacement method for class 'sevt.stgs'
x[i, ..., fit = TRUE] <- value
## S3 method for class 'sevt.stgs'
x[[...]]
## S3 replacement method for class 'sevt.stgs'
x[[..., fit = TRUE]] <- value
Arguments
object |
an object of class |
x |
an object of class |
... |
a path or context in the event tree. |
max |
integer, limit on the number of variables to print. |
value |
the stages replacement value. |
i |
index of variables in the tree. |
fit |
logical, if TRUE (default) the model will be re-fitted. |
Details
This functions are the preferred way to access and modify directly
the stages of an object of class sevt.
In particular the indexing and replacing methods for the
object extracted with the function stages() take care of checking
the stages sanity and refit the object probabilities when needed.
This is useful for manually setting some independence statements
(see the Examples).
Value
For stages(): returns an object of class
sevt.stgs which encode the stages of object.
Objects of class sevt.stgs have dedicated
method for sub-setting and replacing.
Stages indexing
Stages can be indexed, retrieved and replaced by the corresponding variables names and/or by paths or contexts.
In particular,
stages(object)[[var]] extracts the
stages vector corresponding to variable var (similarly
to object$stages[[var]].
Alternatively stages(object)[[path]] indexes
a stage via the corresponding path from root
(similar to get_stage); a path is
recognized as such if named or if of length > 2.
stages(object)[var, context] extracts multiple stages
corresponding to a variable and eventually filtered by
a specific context on the preceding variables.
Examples
# start with full model
mod <- full(Titanic)
# impose the context independence Survived indep Sex, Age | Class = 1st
stages(mod)["Survived", Class = "1st"] <- "C1"
# impose Survived indep Class | Class in (2nd 3rd)
stages(mod)["Survived", Class = "3rd"] <- stages(mod)["Survived", Class = "2nd"]
# impose Age indep Class | Sex
stages(mod)["Age", Sex = "Female"] <- "S-female"
stages(mod)["Age", Sex = "Male"] <- "S-male"
# stages of Survived
stages(mod)[["Survived"]]
# stages of Survived and Age
stages(mod)[c("Survived", "Age")]
# stages of Survived in the context Class 2nd or 3rd
stages(mod)["Survived", Class = c("2nd", "3rd")]
# check independencies
as_parentslist(mod)
Backward hill-climbing
Description
Greedy search of staged event trees with iterative joining of stages.
Usage
stages_bhc(
object,
score = function(x) {
return(-BIC(x))
},
max_iter = Inf,
scope = NULL,
ignore = object$name_unobserved,
trace = 0
)
Arguments
object |
an object of class |
score |
the score function to be maximized. |
max_iter |
the maximum number of iterations per variable. |
scope |
names of variables that should be considered for the optimization. |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
trace |
if >0 increasingly amount of info
is printed (via |
Details
For each variable the algorithm tries to join stages and moves to the best model that increases the score. When no increase is possible it moves to the next variable.
Value
The final staged event tree obtained.
Examples
DD <- generate_xor_dataset(p = 4, n = 100)
model <- stages_bhc(full(DD), trace = 2)
summary(model)
Backward random hill-climbing
Description
Randomly try to join stages. This is a pretty-useless function, used for comparisons.
Usage
stages_bhcr(
object,
score = function(x) {
return(-BIC(x))
},
max_iter = 100,
trace = 0
)
Arguments
object |
an object of class |
score |
the score function to be maximized. |
max_iter |
the maximum number of iteration. |
trace |
if >0 increasingly amount of info
is printed (via |
Details
At each iteration a variable and
two of its stages are randomly selected.
If joining the stages increases the score, the model is
updated. The procedure is repeated until the
number of iterations reaches max_iter.
Value
an object of class sevt.
Examples
DD <- generate_xor_dataset(p = 4, n = 100)
model <- stages_bhcr(full(DD), trace = 2)
summary(model)
Backward joining of stages
Description
Join stages from more complex to simpler models using a distance and a threshold value.
Usage
stages_bj(
object,
distance = "kullback",
thr = 0.1,
scope = NULL,
ignore = object$name_unobserved,
trace = 0
)
Arguments
object |
an object of class |
distance |
character, see details. |
thr |
the threshold for joining stages |
scope |
names of variables that should be considered for the optimization. |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
trace |
if >0 increasingly amount of info
is printed (via |
Details
For each variable in the model stages are joined iteratively.
At each iteration the two stages with minimum distance are selected and
joined if their distance is less than thr.
Available distances are: manhattan (manhattan), euclidean (euclidean),
Renyi divergence (reny), Kullback-Liebler (kullback),
total-variation (totvar), squared Hellinger (hellinger),
Bhattacharyya (bhatt), Chan-Darwiche (chandarw).
See also probdist.
Value
The final staged event tree obtained.
Examples
DD <- generate_xor_dataset(p = 5, n = 1000)
model <- stages_bj(full(DD, lambda = 1), trace = 2)
summary(model)
Context-specific Backward hill-climbing
Description
Greedy search of staged event trees with iterative joining of stages.
Usage
stages_csbhc(
object,
score = function(x) {
return(-BIC(x$ll))
},
max_iter = Inf,
scope = NULL,
ignore = object$name_unobserved
)
Arguments
object |
an object of class |
score |
the score function to be maximized. |
max_iter |
the maximum number of iterations per variable. |
scope |
names of variables that should be considered for the optimization. |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
Details
For each variable the algorithm tries to join stages , by adding context specific independences, and moves to the best model that increases the score. When no increase is possible it moves to the next variable.
Value
The final staged event tree obtained.
Examples
model <- stages_csbhc(full(Titanic))
summary(model)
Fast backward hill-climbing
Description
Greedy search of staged event trees with iterative joining of stages.
Usage
stages_fbhc(
object,
score = function(x) {
return(-BIC(x))
},
max_iter = Inf,
scope = NULL,
ignore = object$name_unobserved,
trace = 0
)
Arguments
object |
an object of class |
score |
the score function to be maximized. |
max_iter |
the maximum number of iteration. |
scope |
names of variables that should be considered for the optimization. |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
trace |
if >0 increasingly amount of info
is printed (via |
Details
For each variable the algorithm tries to join stages and moves to the first model that increases the score. When no increase is possible it moves to the next variable.
Value
The final staged event tree obtained.
Examples
DD <- generate_xor_dataset(p = 5, n = 100)
model <- stages_fbhc(full(DD), trace = 2)
summary(model)
Hill-climbing
Description
Greedy search of staged event trees with iterative moving of nodes between stages.
Usage
stages_hc(
object,
score = function(x) {
return(-BIC(x))
},
max_iter = Inf,
scope = NULL,
ignore = object$name_unobserved,
trace = 0
)
Arguments
object |
an object of class |
score |
the score function to be maximized. |
max_iter |
the maximum number of iterations per variable. |
scope |
names of variables that should be considered for the optimization |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
trace |
if >0 increasingly amount of info
is printed (via |
Details
For each variable node-moves that best increases the score are performed until no increase is possible. A node-move is either changing the stage associate to a node or move the node to a new stage.
The ignore argument can be used to specify stages that should not
be affected during the search, that is left untouched.
This is useful for preserving structural zeroes and to speed-up
computations.
Value
The final staged event tree obtained.
Examples
start <- indep(PhDArticles[, 1:5], join_unobserved = TRUE)
model <- stages_hc(start)
Learn a staged tree with hierarchical clustering
Description
Build a stage event tree with k stages for each variable by
clustering stage probabilities with hierarchical clustering.
Usage
stages_hclust(
object,
distance = "totvar",
k = NA,
method = "complete",
ignore = object$name_unobserved,
limit = length(object$tree),
scope = NULL,
score = function(x) {
return(-BIC(x))
}
)
Arguments
object |
an object of class |
distance |
character, the distance measure to be used, either
a possible |
k |
integer or (named) vector: number of clusters, that is stages per variable.
Values will be recycled if needed. If |
method |
the agglomeration method to be used in |
ignore |
vector of stages which will be ignored and left untouched.
By default the name of the unobserved stages stored in
|
limit |
the maximum number of variables to consider. |
scope |
names of the variables to consider. |
score |
A function. Score to maximize for automatic selection
of the number of stages. Used if |
Details
hclust_sevt performs hierarchical clustering
of the initial stage probabilities in object
and it aggregates them into the specified number
of stages (k).
A different number of stages for the different variables
in the model can be specified by supplying a (named) vector
via the argument k.
If k is NA for some variables, all
possible number of stages will be checked and the
one that maximize the score will be selected.
Value
A staged event tree object.
Examples
data("Titanic")
model <- stages_hclust(full(Titanic, join_unobserved = TRUE, lambda = 1), k = 2)
summary(model)
### or search k via BIC minimization
model1 <- stages_hclust(full(Titanic), k = NA)
Learn a staged tree with k-means clustering
Description
Build a stage event tree with k stages for each variable
by clustering (transformed) probabilities with k-means.
Usage
stages_kmeans(
object,
k = length(object$tree[[1]]),
algorithm = "Hartigan-Wong",
transform = sqrt,
ignore = object$name_unobserved,
limit = length(object$tree),
scope = NULL,
nstart = 1
)
Arguments
object |
an object of class |
k |
integer or (named) vector: number of clusters, that is stages per variable. Values will be recycled if needed. |
algorithm |
character: as in |
transform |
function applied to the probabilities before clustering. |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
limit |
the maximum number of variables to consider. |
scope |
names of the variables to consider. |
nstart |
as in |
Details
kmenas_sevt performs k-means clustering
to aggregate the stage probabilities of the initial
staged tree object.
Different values for k can be specified by supplying a
(named) vector to k.
kmeans from the stats package is used
internally and arguments algorithm and nstart
refer to the same arguments as kmeans.
Value
A staged event tree.
Examples
data("Titanic")
model <- stages_kmeans(full(Titanic, join_unobserved = TRUE, lambda = 1), k = 2)
summary(model)
Backward hill-climbing for simple staged trees
Description
Greedy search of simple staged event trees with iterative joining of positions.
Usage
stages_simplebhc(
object,
score = function(x) {
return(-BIC(x))
},
scope = NULL,
max_iter = Inf,
ignore = object$name_unobserved
)
Arguments
object |
an object of class |
score |
the score function to be maximized. |
scope |
names of variables that should be considered for the optimization. |
max_iter |
the maximum number of iterations per variable. |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
Details
This function is similar to the classical
backward hill-climbing implemented in stages_bhc, but
instead of joining stages it consider joining of positions via
join_positions.
Thus, the search is in the space of simple staged tree models if the
initial stage tree is simple.
See the references for additional details.
Value
an object of class sevt, the simple staged tree resulting
from the search.
References
Leonelli M, Varando G. Structural Learning of Simple Staged Trees, arXiv preprint arXiv:2203.04390v1
See Also
join_positions()
sevt_simplify()
Examples
mod <- stages_simplebhc(full(Titanic))
plot(mod)
Standard renaming of stages
Description
Rename all stages in a staged event tree.
Usage
stndnaming(
object,
uniq = FALSE,
prefix = FALSE,
ignore = object$name_unobserved
)
Arguments
object |
an object of class |
uniq |
logical, if stage numbers should be unique over all tree. |
prefix |
logical, if stage names should be prefixed with variable name. |
ignore |
vector of stages which will be ignored and left untouched,
by default the name of the unobserved stages stored in
|
Value
a staged event tree object with stages named with consecutive integers.
Examples
model <- stages_fbhc(full(PhDArticles, join_unobserved = TRUE))
model$stages
model1 <- stndnaming(model)
model1$stages
### unique stage names in all tree
model2 <- stndnaming(model, uniq = TRUE)
model2$stages
### prefix stage names with variable name
model3 <- stndnaming(model, prefix = TRUE)
model3$stages
### manuallty select stage names left untouched
model4 <- stndnaming(model, ignore = c("2", "6"), prefix = TRUE)
model4$stages
Extract subtree
Description
Extract subtree
Usage
subtree(object, path)
Arguments
object |
an object of class |
path |
the path from root after which extract the subtree. |
Details
Returns the subtree of the staged event tree, starting from
path.
Value
A staged event tree object corresponding to the subtree.
Examples
DD <- generate_random_dataset(4, 100)
model <- sevt(DD, full = TRUE)
plot(model)
model1 <- subtree(model, path = c("-1", "1"))
plot(model1)
Summarizing staged event trees
Description
Summary method for class sevt.
Usage
## S3 method for class 'sevt'
summary(object, ...)
## S3 method for class 'summary.sevt'
print(x, max = 10, ...)
Arguments
object |
an object of class |
... |
arguments for compatibility. |
x |
an object of class |
max |
the maximum number of variables for which information is printed. |
Details
Print model information and summary of stages.
Value
An object of class summary.sevt
for which a print
method exist.
Examples
model <- stages_fbhc(full(PhDArticles, lambda = 1))
summary(model)
Add text to a staged event tree plot
Description
Add text to a staged event tree plot
Usage
## S3 method for class 'sevt'
text(x, y = ylim[1], limit = 10, xlim = c(0, 1), ylim = c(0, 1), ...)
Arguments
x |
An object of class |
y |
the position of the labels. |
limit |
maximum number of variables plotted. |
xlim |
graphical parameter. |
ylim |
graphical parameter. |
... |
additional parameters passed to |
Hospital trajectories
Description
Generated dataset with observations from five variables (SEX, AGE, ICU, RSP, OUT) describing imaginary patients' trajectories in a hospital.
Usage
trajectories
Format
A data frame with 10000 observations of 5 variables.
Source
The data has been generated with the code in the Examples section.
Examples
library("stagedtrees")
tree <- list(SEX = c("male", "female"),
AGE = c("child", "adult", "elder"),
ICU = c("0", "1"),
RSP = c("intub", "mask", "no"),
OUT = c("death", "survived"))
model <- sevt(tree, full = TRUE)
stages(model)["ICU", AGE = "child"] <- "ICUchild"
stages(model)["ICU", SEX = "male", AGE = "elder"] <-
stages(model)["ICU", SEX = "female", AGE = "elder"]
stages(model)["RSP", AGE = c("child"), ICU = "0"] <- "childnoICU"
stages(model)["RSP", AGE = c("child"), ICU = "1"] <- "childICU"
stages(model)["RSP", AGE = c("adult")] <- stages(model)["RSP", AGE = c("elder")]
stages(model)["OUT", AGE = "adult",
SEX = "female",
ICU = "1",
RSP = c("intub", "mask")] <- "femaleICUresp"
stages(model)["OUT", AGE = "child",
ICU = "1",
RSP = "intub"] <- "childICUintub"
stages(model)["OUT", AGE = "child",
ICU = "1",
RSP = "mask"] <- "childICUmask"
stages(model)["OUT", AGE = "child",
ICU = "1",
RSP = "no"] <- "childICUno"
stages(model)["OUT", AGE = "adult", SEX = "male"] <-
stages(model)["OUT", AGE = "elder", SEX = "female"]
stages(model)["OUT", ICU = "0", RSP = "intub"] <- "UNOBS"
stages(model)["OUT", ICU = "0", RSP = "intub"] <- "UNOBS"
stages(model)["OUT", AGE = "child", ICU = "0"] <- "UNOBS"
model$prob <- list()
model$prob$SEX <- list( "NA" = c(male = 0.4, female = 0.6))
model$prob$AGE <- list("1" = c("child" = 0.1, "adult" = 0.5, "elder" = 0.4),
"2" = c("child" = 0.1, "adult" = 0.3, "elder" = 0.6))
model$prob$ICU <- list("ICUchild" = c("0" = 0, "1" = 1),
"2" = c("0" = 0.4, "1" = 0.6), ## male adult
"5" = c("0" = 0.2, "1" = 0.8), ## female adult
"6" = c("0" = 0.7, "1" = 0.3)) ## elder
model$prob$RSP <- list("childnoICU" = c("intub" = NA, "mask" = NA, "no" = NA),
"childICU" = c("intub" = 0.1, "mask" = 0.7, "no" = 0.2),
"5" = c("intub" = 0, "mask" = 0.7, "no" = 0.3), # male noICU
"6" = c("intub" = 0.4, "mask" = 0.5, "no" = 0.1), # male ICU
"11" = c("intub" = 0, "mask" = 0.5, "no" = 0.5), # female noICU
"12" = c("intub" = 0.4, "mask" = 0.5, "no" = 0.1)) # female ICU
model$prob$OUT <- list("UNOBS" = c("death" = NA, "survived" = NA),
"childICUintub" = c("death" = 0.03, "survived" = 0.97),
"childICUmask" = c("death" = 0.02, "survived" = 0.98),
"childICUno" = c("death" = 0.01, "survived" = 0.99),
### male adult and female elder ICU = 0 :
"32" = c("death" = 0.05, "survived" = 0.95), ## mask
"33" = c("death" = 0.01, "survived" = 0.99), ## no
### male adult and female elder ICU = 1 :
"34" = c("death" = 0.15, "survived" = 0.85), ## intub
"35" = c("death" = 0.08, "survived" = 0.92), ## mask
"36" = c("death" = 0.04, "survived" = 0.96), ## no
##############
"14" = c("death" = 0.2, "survived" = 0.8), # male elder 0 mask
"15" = c("death" = 0.1, "survived" = 0.9), # male elder 0 no
"16" = c("death" = 0.3, "survived" = 0.7), # male elder 1 intub
"17" = c("death" = 0.25, "survived" = 0.75), # male elder 1 mask
"18" = c("death" = 0.3, "survived" = 0.7), # male elder 1 no
##############
"26" = c("death" = 0.1, "survived" = 0.9), # female adult 0 mask
"27" = c("death" = 0.15, "survived" = 0.85), # female adult 0 no
"30" = c("death" = 0.2, "survived" = 0.8), # female adult 1 no
##############
"femaleICUresp" = c("death" = 0.1, "survived" = 0.9)
)
# trajectories <- sample_from(model, 10000, seed = 1)
# usethis::use_data(trajectories, overwrite = TRUE)
return path index
Description
return path index
Usage
tree_idx(path, tree, complete = FALSE)
Arguments
path |
a path from root in the tree. |
tree |
a symmetric tree given as a list of levels. |
complete |
logical, if |
Details
Compute the integer index of the node associated with the
given path in a symmetric tree defined by tree.
Value
an integer, the index of the node corresponding to path
Tree string
Description
Tree string
Usage
tree_string(tree, max)
Arguments
tree |
ordered list of variables |
Unique id from named list
Description
Unique id from named list
Usage
uni_idx(x, sep = "_")
Arguments
x |
a named list. |
Value
A named list with unique ids.
Find maximum value
Description
Find maximum value
Usage
which_class(x, levels)
Arguments
x |
numerical, the log-probabilities. |
levels |
the levels to be returned same length as x. |
Value
factor.
Export the staged tree or CEG graph to tikz
Description
Generate tikz code to draw the staged tree or CEG graph.
Usage
write_tikz(
x,
layout = NULL,
file = "",
col = NULL,
ignore = x$name_unobserved,
node_label = function(node) {
ifelse(is.na(node$stage), "", node$stage)
},
edge_label = function(edge) {
ifelse(is.na(edge$label), "", edge$label)
},
edge_label_options = function(edge) {
return("sloped")
},
scale = 10,
normalize_layout = TRUE,
node_shape = "circle",
node_inner_sep = "1mm",
node_minimum_size = "0.3cm",
node_draw_color = "black",
node_thickness = "very thick",
node_text_color = "black"
)
## S3 method for class 'sevt'
write_tikz(
x,
layout = NULL,
file = "",
col = NULL,
ignore = x$name_unobserved,
node_label = function(node) {
ifelse(is.na(node$stage), "", node$stage)
},
edge_label = function(edge) {
ifelse(is.na(edge$label), "", edge$label)
},
edge_label_options = function(edge) {
return("sloped")
},
scale = 10,
normalize_layout = TRUE,
node_shape = "circle",
node_inner_sep = "1mm",
node_minimum_size = "0.3cm",
node_draw_color = "black",
node_thickness = "very thick",
node_text_color = "black"
)
Arguments
x |
|
layout |
the layout of the graph, given as matrix with two columns and as many rows as nodes in the staged tree. By default, a modified sugiyama layout is used. The layout matrix can be obtained with igraph layout functions. |
file |
A connection or a character string naming the file to print to.
Passed to |
col |
color specifications for the stages of the staged even tree.
Same as |
ignore |
vector of stages which will be ignored and not plotted,
by default the name of the unobserved stages stored in |
node_label |
a function that produces nodes labels. |
edge_label |
a function that produces edge labels. |
edge_label_options |
a function that produces edge label options. |
scale |
for the tikzfigure. |
normalize_layout |
a logical value. If |
node_shape |
the shape to be used for nodes. |
node_inner_sep |
the |
node_minimum_size |
the |
node_draw_color |
the color for line drawing the nodes. |
node_thickness |
the thickness of the lines. |
node_text_color |
the color for label in nodes. |
Details
This function can be used to create a working
tikz code that compile to a graph similar to the
one obtained by plot.sevt(x, ...) or
plot.ceg(x, ...).
References
Code partially inspired by the code in Exporting graphs to LaTeX, using igraph and TikZ http://igraph.wikidot.com/r-recipes#toc2