There are two steps necessary for summarizing tree samples. One is to
decide how many distinct topologies (disregarding branch lengths) are in
the collection of trees, and then we need to summarize branch lengths
for each of these tree subsets using any sensible summary measure (e.g.,
mean or median). The functions topoFreq and
summaryBrlen aid in these two tasks respectively.
topoFreq uses the Robinson-Foulds (RF) distance among trees
in order to find clusters which are identical in topology and then bind
them together in a list that we can later operate over. The RF distance
is defined over unrooted trees, so it is necessary to check whether the
input trees are. Unrooting a rooted tree will destroy the root node and
then add up together both adjacent branch lengths, and this will be
reflected when summarizing withsummaryBrlens using the
output of topoFreq. This will return one branch length less
than the number present in the original sample of rooted trees.
Accordingly, this method will only work on collections of unrooted
trees, or collections of time trees of fixed topology where we are only
interested in finding a summary values for branch lengths. It should not
be used when both the topology and divergence times are being
coestimated (e.g., when the posterior of trees comes from not fixing the
topology as in Beast2, RevBayes, and
MrBayes), but it is perfectly fine when the inference was
done to estimate only the unrooted topology and branch lengths, such as
in standard tree inference in MrBayes.
We see an example using these two functions below where we want to use the median branch length for each group:
# load the packages
library(ape)
library(tbea)
# these urls render correctly on html but not on pdf. The former is more important for CRAN.
# download the trees from the supplementary material of Ballen et al. (2022)
t1<-"https://raw.githubusercontent.com/gaballench/cynodontidaeWare/refs/heads/main/bayesian/mrbayes"
t2<-"https://raw.githubusercontent.com/gaballench/cynodontidaeWare/refs/heads/main/bayesian/mrbayes"
mtr1 <- read.nexus(paste(t1, "concatenatedMolmorph.nexus.run1.t", sep="/"))
mtr2 <- read.nexus(paste(t2, "concatenatedMolmorph.nexus.run2.t", sep="/"))
trees <- c(mtr1, mtr2)
# calculate topological frequencies
tpf <- topoFreq(trees, output="trees")
# summarize median branch length
sumtrees <- summaryBrlen(tpf$trees, method="median")
# sort the frequencies in decreasing order
decreasingIdx <- order(tpf$fabs, decreasing=TRUE)
# how many topologies comprise about 90% of the trees?
sum(cumsum(tpf$fabs[decreasingIdx])/sum(tpf$fabs)<0.9)## [1] 4# plot these four trees with branch lengths already summarized
par(mfrow=c(2,2))
par(oma=c(0, 0, 0, 0))
par(mai=c(0.5, 0.2, 0.5, 0.2))
for(i in 1:4){
    plot(sumtrees[[decreasingIdx[i]]],
         type="unrooted",
         show.node.label=FALSE,
         cex=0.4,
         main=round(tpf$frel[decreasingIdx[i]], digits=2))
}