Title: | Phylogenetic Tree Statistics |
---|---|
Description: | Collection of phylogenetic tree statistics, collected throughout the literature. All functions have been written to maximize computation speed. The package includes umbrella functions to calculate all statistics, all balance associated statistics, or all branching time related statistics. Furthermore, the 'treestats' package supports summary statistic calculations on Ltables, provides speed-improved coding of branching times, Ltable conversion and includes algorithms to create intermediately balanced trees. Full description can be found in Janzen (2024) <doi:10.1016/j.ympev.2024.108168>. |
Authors: | Thijs Janzen [cre, aut] |
Maintainer: | Thijs Janzen <[email protected]> |
License: | GPL-3 |
Version: | 1.70.5 |
Built: | 2024-11-09 05:27:18 UTC |
Source: | https://github.com/thijsjanzen/treestats |
The 'treestats' package contains a collection of phylogenetic tree statistics, implemented in C++ to ensure high speed.
Given a phylogenetic tree as a phylo object, the 'treestats' package provides a wide range of individual functions, each returning the relevant statistic. In addition, three functions calculate a collection of statistics at once: calc_all_stats, which calculates all currently implemented statistics of treestats; calc_topology_stats, which calculates all (im)balance related statistics; and calc_brts_stats, which calculates all branching time and branch length related statistics. Furthermore, a number of additional tools allow for phylogenetic tree manipulation. make_unbalanced_tree creates an imbalanced tree in a stepwise fashion. Two functions convert from and to an ltable, an alternative notation used in some simulations: l_to_phylo, a C++ based version of DDD::L2phylo, converts an ltable to a phylo object, and phylo_to_l, a C++ based version of DDD::phylo2L, converts a phylo object to an ltable. Lastly, the treestats package includes a faster, C++ based implementation of ape::branching.times (the function branching_times), which yields the same sequence of branching times but omits the node names in favour of speed.
Phylogenetic tree statistics: a systematic overview using the new R package 'treestats' Thijs Janzen, Rampal S. Etienne bioRxiv 2024.01.24.576848; doi: https://doi.org/10.1101/2024.01.24.576848
The area per pair index (APP) is the average number of edges on the path between two leaves, taken over all pairs of leaves. Alternatively, APP can be derived from the Sackin index (S) and the total cophenetic index (TC) of a tree with n tips:
$APP = \frac{2}{n} S - \frac{4}{n(n-1)} TC$
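The equivalence between the pairwise definition and APP = 2S/n - 4TC/(n(n-1)) can be checked by hand on a small tree. The base R sketch below does not use treestats and is illustrative only: it encodes the three-tip tree ((t1, t2), t3) as an ape-style (parent, child) edge matrix and uses a hypothetical helper `ancestors()` to obtain leaf depths, the depth of the most recent common ancestor of each tip pair, and the path length between each pair.

```r
# Edge matrix of ((t1, t2), t3) in ape's (parent, child) layout:
# tips are nodes 1..3, the root is node 4, the (t1, t2) ancestor is node 5
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
n <- 3

parent <- rep(NA_integer_, max(edge))
parent[edge[, 2]] <- edge[, 1]

# Hypothetical helper: path from node v up to the root, v included
ancestors <- function(v) {
  path <- v
  while (!is.na(parent[v])) {
    v <- parent[v]
    path <- c(path, v)
  }
  path
}
depth <- function(v) length(ancestors(v)) - 1  # edges between v and the root

tip_depths <- sapply(seq_len(n), depth)
sackin <- sum(tip_depths)

pairs <- combn(n, 2)  # each column is one pair of tips
lca_depth <- apply(pairs, 2, function(p) {
  a <- ancestors(p[1])
  b <- ancestors(p[2])
  depth(a[a %in% b][1])  # deepest shared ancestor
})
tot_coph <- sum(lca_depth)
path_len <- tip_depths[pairs[1, ]] + tip_depths[pairs[2, ]] - 2 * lca_depth

app_direct <- mean(path_len)                                # average over pairs
app_formula <- 2 / n * sackin - 4 / (n * (n - 1)) * tot_coph
```

Here S = 5 and TC = 1, and both routes give 8/3. The same `tip_depths` vector gives the average leaf depth (the Sackin index divided by the number of tips) as `mean(tip_depths)`.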
area_per_pair(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
"none" or "yule", in which case the obtained result is divided by the expectation under the Yule model. |
Area per pair index
T. Araújo Lima, F. M. D. Marquitti, and M. A. M. de Aguiar. Measuring Tree Balance with Normalized Tree Area. arXiv e-prints, art. arXiv:2008.12867, 2020.
Average leaf depth statistic. The average leaf depth statistic is a normalized version of the Sackin index, normalized by the number of tips.
average_leaf_depth(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
"none" or "yule", in which case the statistic is divided by the expectation under the Yule model, following Remark 1 in Coronado et al. 2020. |
average leaf depth statistic
Coronado, T. M., Mir, A., Rosselló, F. et al. On Sackin's original proposal: the variance of the leaves' depths as a phylogenetic balance index. BMC Bioinformatics 21, 154 (2020). https://doi.org/10.1186/s12859-020-3405-1
K.-T. Shao and R. R. Sokal. Tree balance. Systematic Zoology, 39(3):266, 1990. doi: 10.2307/2992186.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
average_leaf_depth(simulated_tree)
Calculate the avgLadder index, from the phyloTop package. Higher values indicate more unbalanced trees. To calculate the average ladder index, first all potential ladders in the tree are identified. A ladder is defined as a sequence of nodes where one of the daughter branches is a terminal branch, resulting in a ladder-like pattern. The average ladder index then represents the average length across all observed ladders in the tree.
avg_ladder(input_obj)
input_obj |
phylo object or ltable |
average ladder length
The average vertex depth metric measures the average path length (in edges) between the tips and the root.
avg_vert_depth(phy)
phy |
phylo object or ltable |
Average depth (in number of edges)
C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.
Balance metric (in the case of a binary tree) that measures the sum, across all internal nodes, of one over the maximum number of edges between that node and any of its descendant tips. Although also defined on non-binary trees, the treestats package only provides code for binary trees.
b1(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
"none" or "tips", in which case the resulting statistic is divided by the number of tips in the tree, as a crude way of normalization. |
B1 statistic
K.-T. Shao and R. R. Sokal. Tree Balance. Systematic Zoology, 39(3):266, 1990. doi: 10.2307/2992186.
Balance metric that uses the Shannon-Wiener statistic of information content. The B2 measure is the sum, over all tips, of each tip's depth divided by 2 raised to that depth: $B_2 = \sum_i N_i / 2^{N_i}$, where $N_i$ is the number of edges between tip i and the root. Although also defined on non-binary trees, the treestats package only provides code for binary trees.
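As a worked illustration of $B_2 = \sum_i N_i / 2^{N_i}$, the base R sketch below evaluates it from tip depths that are written out by hand for two 4-tip trees (`b2_sketch()` is a hypothetical helper, not part of treestats). It also shows the link to Shannon entropy: with $p_i = 2^{-N_i}$, the base-2 entropy $-\sum_i p_i \log_2 p_i$ gives the same number.

```r
# B2 = sum over tips of N_i / 2^N_i, with N_i the number of edges from tip i to the root
b2_sketch <- function(tip_depths) sum(tip_depths / 2^tip_depths)

caterpillar_depths <- c(3, 3, 2, 1)  # (((t1, t2), t3), t4)
balanced_depths <- c(2, 2, 2, 2)     # ((t1, t2), (t3, t4))

b2_sketch(caterpillar_depths)  # 1.75
b2_sketch(balanced_depths)     # 2

# Same value as the base-2 Shannon entropy of p_i = 2^-N_i
p <- 2^-caterpillar_depths
-sum(p * log2(p))              # 1.75
```

The balanced tree scores higher, consistent with B2 being a balance (rather than imbalance) index.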
b2(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
"none" or "yule", when "yule" is chosen, the statistic is divided by the Yule expectation, following from theorem 3.7 in Bienvenu 2020. |
B2 statistic
K.-T. Shao and R. R. Sokal. Tree Balance. Systematic Zoology, 39(3):266, 1990. doi: 10.2307/2992186.
Bienvenu, François, Gabriel Cardona, and Celine Scornavacca. "Revisiting Shao and Sokal's $B_2$ index of phylogenetic balance." Journal of Mathematical Biology 83.5 (2021): 1-43.
The Beta statistic fits a beta splitting model to each node,
assuming that the number of extant descendants of each daughter branch is
split following a beta distribution, such that the numbers of extant
descendants x and y (with x + y = n) at a node follow
$q_n(x, y) = \frac{1}{a_n(\beta)} \frac{\Gamma(\beta + x + 1)\,\Gamma(\beta + y + 1)}{\Gamma(x + 1)\,\Gamma(y + 1)}$,
where $a_n(\beta)$ is a normalizing constant. When this model is fit to a
tree, different values of beta correspond to the expectation following from
different diversification models, such that a beta of 0 corresponds to a
Yule tree and a beta of -3/2 to a tree following from a PDA model. In general,
negative beta values correspond to trees more unbalanced than Yule trees, and
beta values larger than zero indicate trees more balanced than Yule trees.
The lower bound of the beta splitting parameter is -2.
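To make the splitting distribution concrete, the base R sketch below evaluates, for a single node with n tips, the probability of each split x | n - x under Aldous' beta-splitting model, with weights proportional to Gamma(beta + x + 1) Gamma(beta + y + 1) / (Gamma(x + 1) Gamma(y + 1)). It does not fit beta to a tree (beta_statistic does that by numerical optimization). `beta_split_probs()` is a hypothetical helper, and dividing by the sum of the weights plays the role of the normalizing constant $a_n(\beta)$.

```r
# Split probabilities at one node with n tips under the beta-splitting model,
# for x = 1, ..., n - 1 tips on one side and y = n - x on the other
beta_split_probs <- function(n, beta) {
  stopifnot(beta > -2, n >= 2)
  x <- seq_len(n - 1)
  y <- n - x
  # log of Gamma(beta + x + 1) Gamma(beta + y + 1) / (Gamma(x + 1) Gamma(y + 1))
  log_w <- lgamma(beta + x + 1) + lgamma(beta + y + 1) -
    lgamma(x + 1) - lgamma(y + 1)
  w <- exp(log_w - max(log_w))  # rescale before exponentiating, for stability
  w / sum(w)                    # normalizing, the role of a_n(beta)
}

beta_split_probs(6, 0)     # Yule (beta = 0): uniform, 0.2 for each x
beta_split_probs(6, 10)    # large beta: mass concentrates on the even split
beta_split_probs(6, -1.9)  # beta near -2: mass concentrates on 1 | n - 1 splits
```

Working on the log scale with `lgamma()` avoids overflow of `gamma()` for large trees or large beta.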
beta_statistic( phy, upper_lim = 10, algorithm = "COBYLA", abs_tol = 1e-04, rel_tol = 1e-06 )
phy |
phylogeny or ltable |
upper_lim |
Upper limit for beta parameter, default = 10. |
algorithm |
optimization algorithm used, default is "COBYLA" (Constrained Optimization BY Linear Approximations), also available are "subplex" and "simplex". Subplex and Simplex seem to have difficulties with unbalanced trees, e.g. if beta < 0. |
abs_tol |
absolute stopping criterion of optimization. Default is 1e-4. |
rel_tol |
relative stopping criterion of optimization. Default is 1e-6. |
Beta value
Aldous, David. "Probability distributions on cladograms." Random Discrete Structures. Springer, New York, NY, 1996. 1-18.
Jones, Graham R. "Tree models for macroevolution and phylogenetic analysis." Systematic Biology 60.6 (2011): 735-746.
simulated_tree <- ape::rphylo(n = 100, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
beta_statistic(balanced_tree) # should be approximately 10
beta_statistic(simulated_tree) # should be near 0
beta_statistic(unbalanced_tree) # should be approximately -2
The Blum index of imbalance (also known as the s-shape
statistic) calculates the sum of $\log(N - 1)$ over all internal nodes,
where N represents the total number of extant tips connected to that node.
An alternative implementation can be found in the castor R package.
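For a feel of how the sum of log(N - 1) over internal nodes separates trees, the base R sketch below applies it to two 4-tip trees. The tip counts per internal node are listed by hand rather than extracted from a phylo object, and `blum_sketch()` is a hypothetical helper, not the treestats implementation.

```r
# Blum index: sum of log(N - 1) over internal nodes, N = number of tips below the node
blum_sketch <- function(tips_per_node) sum(log(tips_per_node - 1))

# Tips below each internal node (root first), listed by hand
caterpillar_nodes <- c(4, 3, 2)  # (((t1, t2), t3), t4)
balanced_nodes <- c(4, 2, 2)     # ((t1, t2), (t3, t4))

blum_sketch(caterpillar_nodes)  # log(6), about 1.79
blum_sketch(balanced_nodes)     # log(3), about 1.10
```

Every cherry contributes log(1) = 0, so the index grows only through nodes subtending more than two tips, which are more common in unbalanced trees.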
blum(phy, normalization = FALSE)
phy |
phylogeny or ltable |
normalization |
because the Blum index sums over all nodes, the resulting statistic tends to be correlated with the number of extant tips. Normalization can be performed by dividing by the number of extant tips. |
Blum index of imbalance
M. G. B. Blum and O. Francois (2006). Which random processes describe the Tree of Life? A large-scale study of phylogenetic tree imbalance. Systematic Biology. 55:685-691.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
blum(balanced_tree)
blum(unbalanced_tree) # should be higher
C++ based alternative to 'ape::branching.times'. Please note that, to maximise speed, 'treestats::branching_times' does not return the node names associated with the branching times, in contrast to the ape version.
branching_times(phy)
phy |
phylo object or ltable |
vector of branching times
this function applies all tree statistics available in this package to a single tree, being:
gamma
Sackin
Colless
corrected Colless
quadratic Colless
Aldous' beta statistic
Blum
crown age
tree height
Pigot's rho
number of lineages
nLTT with empty tree
phylogenetic diversity
avgLadder index
cherries
double cherries
ILnumber
pitchforks
stairs
stairs2
laplacian spectrum
B1
B2
area per pair (aPP)
average leaf depth (aLD)
I statistic
ewColless
max Delta Width (maxDelW)
maximum of Depth
variance of Depth
maximum Width
Rogers
total Cophenetic distance
symmetry Nodes
mean of pairwise distance (mpd)
variance of pairwise distance (vpd)
Phylogenetic Species Variability (psv)
mean nearest taxon distance (mntd)
J statistic of entropy
rquartet index
Wiener index
max betweenness
max closeness
diameter, without branch lengths
maximum eigen vector value
mean branch length
variance of branch length
mean external branch length
variance of external branch length
mean internal branch length
variance of internal branch length
number of imbalancing steps
j_one statistic
treeness statistic
For the Laplacian spectrum properties, four properties of the eigenvalue distribution are returned: 1) asymmetry, 2) peakedness, 3) log(principal eigenvalue) and 4) eigengap. Please note that for some very small or very large trees, some of the statistics cannot be calculated. The function will then report NA for that statistic, but will not break, to facilitate batch analysis of large numbers of trees.
calc_all_stats(phylo, normalize = FALSE)
phylo |
phylo object |
normalize |
if set to TRUE, results are normalized (if possible) under either the Yule expectation (if available), or the number of tips |
List with statistics
this function applies all tree statistics based on branching times to a single tree (more or less ignoring topology), being:
gamma
Pigot's rho
mean branch length
nLTT with empty tree
treeness
var branch length
mean internal branch length
mean external branch length
var internal branch length
var external branch length
calc_brts_stats(phylo)
phylo |
phylo object |
list with statistics
this function calculates all tree statistics based on topology available in this package for a single tree, being:
area_per_pair
average_leaf_depth
avg_ladder
avg_vert_depth
b1
b2
beta
blum
cherries
colless
colless_corr
colless_quad
diameter
double_cherries
eigen_centrality
ew_colless
four_prong
i_stat
il_number
imbalance_steps
j_one
max_betweenness
max_closeness
max_del_width
max_depth
max_ladder
max_width
mw_over_md
pitchforks
rogers
root_imbalance
rquartet
sackin
stairs
stairs2
symmetry_nodes
tot_coph
tot_internal_path
tot_path_length
var_depth
calc_topology_stats(phylo, normalize = FALSE)
phylo |
phylo object |
normalize |
if set to TRUE, results are normalized (if possible) under either the Yule expectation (if available), or the number of tips |
list with statistics
Calculate the number of cherries, from the phyloTop package. A cherry is a pair of sister tips.
cherries(input_obj, normalization = "none")
input_obj |
phylo object or ltable |
normalization |
"none", "yule", or "pda"; if "yule" or "pda" is chosen, the number of cherries is divided by its expected value under that model, following McKenzie & Steel 2000. |
number of cherries
McKenzie, Andy, and Mike Steel. "Distributions of cherries for two models of trees." Mathematical biosciences 164.1 (2000): 81-92.
The Colless index is calculated as the sum of $|L - R|$
over all nodes, where L (or R) is the number of extant tips
associated with the L (or R) daughter branch at that node. Higher values
indicate higher imbalance. Two normalizations are available,
where a correction is made for tree size, under either a Yule expectation
or a PDA expectation.
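The sum of |L - R| over nodes can be traced on a small example. The base R sketch below (illustrative only, not the treestats C++ code) stores the 5-tip caterpillar (((( t1, t2), t3), t4), t5) as an ape-style (parent, child) edge matrix, counts the tips below each daughter with a hypothetical recursive helper `tips_below()`, and sums the per-node imbalances; squaring them instead gives the quadratic variant (colless_quad).

```r
# Edge matrix of (((( t1, t2), t3), t4), t5): tips are 1..5, the root is node 6
edge <- rbind(c(6, 7), c(6, 5), c(7, 8), c(7, 4),
              c(8, 9), c(8, 3), c(9, 1), c(9, 2))
n_tips <- 5

# Hypothetical helper: number of tips below node v, found by walking down the edges
tips_below <- function(v) {
  if (v <= n_tips) return(1)
  kids <- edge[edge[, 1] == v, 2]
  sum(sapply(kids, tips_below))
}

internal <- unique(edge[, 1])
imbalance <- sapply(internal, function(v) {
  kids <- edge[edge[, 1] == v, 2]
  abs(tips_below(kids[1]) - tips_below(kids[2]))  # |L - R| at a binary node
})

sum(imbalance)    # Colless index: 3 + 2 + 1 + 0 = 6
sum(imbalance^2)  # quadratic Colless index: 9 + 4 + 1 + 0 = 14
```

For a caterpillar with n tips the Colless index equals (n - 1)(n - 2) / 2, its maximum possible value, which is 6 for n = 5.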
colless(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
A character string equal to "none" (default) for no normalization, or one of "pda" or "yule". |
colless index
Colless D H. 1982. Review of: Phylogenetics: The Theory and Practice of Phylogenetic Systematics. Systematic Zoology 31:100-104.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
colless(balanced_tree)
colless(unbalanced_tree) # should be higher
The Corrected Colless index is calculated as the sum of $|L - R|$
over all nodes, corrected for tree size by dividing by
$(n-1)(n-2)$, where n is the number of tips.
colless_corr(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
A character string equal to "none" (default) for no normalization, or "yule", in which case the obtained index is divided by the Yule expectation. |
corrected colless index
Heard, Stephen B. "Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees." Evolution 46.6 (1992): 1818-1826.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
colless_corr(balanced_tree)
colless_corr(unbalanced_tree) # should be higher
The Quadratic Colless index is calculated as the sum of $(L - R)^2$
over all nodes.
colless_quad(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
A character string equal to "none" (default) for no normalization, or "yule". |
quadratic colless index
Bartoszek, Krzysztof, et al. "Squaring within the Colless index yields a better balance index." Mathematical Biosciences 331 (2021): 108503.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
colless_quad(balanced_tree)
colless_quad(unbalanced_tree) # should be higher
This function takes an input phylogeny and returns the most balanced phylogeny possible that has the same branching times as the original input tree. Please note that if the number of tips is not a power of two, the tree cannot be perfectly balanced; instead, the most balanced topology possible is returned.
create_fully_balanced_tree(phy)
phy |
phylo object |
phylo object
phy <- ape::rphylo(n = 16, birth = 1, death = 0)
bal_tree <- treestats::create_fully_balanced_tree(phy)
treestats::colless(phy)
treestats::colless(bal_tree) # much lower
This function takes an input phylogeny and returns a perfectly imbalanced tree (i.e. a full caterpillar tree) that has the same branching times as the original input tree.
create_fully_unbalanced_tree(phy)
phy |
phylo object |
phylo object
phy <- ape::rphylo(n = 16, birth = 1, death = 0)
unbal_tree <- treestats::create_fully_unbalanced_tree(phy)
treestats::colless(phy)
treestats::colless(unbal_tree) # much higher
In a reconstructed tree, obtaining the crown age is fairly straightforward, and the function beautier::get_crown_age does a great job at it. However, that function no longer works on a non-ultrametric tree. This function provides a functioning alternative.
crown_age(phy)
phy |
phylo object or ltable |
crown age
The diameter of a tree is defined as the maximum, over all pairs of nodes, of the length of the shortest path between them. When taking branch lengths into account, for an ultrametric tree this is equal to twice the crown age.
diameter(phy, weight = FALSE)
phy |
phylo object or ltable |
weight |
if TRUE, uses branch lengths. |
Diameter
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." PloS one 16.12 (2021): e0259877.
Calculate the number of double cherries, where a single cherry is a node connected to two tips, and a double cherry is a node connected to two cherry nodes.
double_cherries(input_obj)
input_obj |
phylo object or ltable |
number of double cherries
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." PloS one 16.12 (2021): e0259877.
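Eigenvector centrality associates with each node v the positive
value e(v), such that $e(v) = \frac{1}{\lambda} \sum_{u \sim v} e(u)$, where the sum runs over all nodes u adjacent to v and $\lambda$ is the largest eigenvalue of the adjacency matrix. Thus,
e(v) is the Perron-Frobenius eigenvector of the adjacency matrix of the tree.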
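The Perron-Frobenius characterization can be checked numerically with base R's `eigen()`. The sketch below ignores branch lengths (an unweighted adjacency matrix, whereas eigen_centrality defaults to weight = TRUE) and uses the three-tip tree ((t1, t2), t3) as an ape-style edge matrix. It takes the leading eigenvector, fixes its sign so all entries are positive, and verifies that A e = lambda e, i.e. e(v) = (1 / lambda) times the sum of e(u) over the neighbours u of v.

```r
# ((t1, t2), t3): tips 1..3, root 4, ancestor of (t1, t2) is node 5
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
n_nodes <- max(edge)

adj <- matrix(0, n_nodes, n_nodes)
adj[edge] <- 1          # parent -> child
adj[edge[, 2:1]] <- 1   # child -> parent: the tree is treated as undirected

dec <- eigen(adj, symmetric = TRUE)  # eigenvalues in decreasing order
lambda <- dec$values[1]              # leading eigenvalue
e <- abs(dec$vectors[, 1])           # Perron vector has one sign; make it positive

round(e, 3)
which.max(e)  # node 5, the internal node with three neighbours
```

For this small tree lambda^2 = 2 + sqrt(2), so lambda is about 1.848.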
eigen_centrality(phy, weight = TRUE, scale = FALSE)
phy |
phylo object or ltable |
weight |
if TRUE, uses branch lengths. |
scale |
if TRUE, the eigenvector is rescaled |
List with the eigen vector and the leading eigen value
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." Plos one 16.12 (2021): e0259877.
The intensive quadratic entropy statistic J is given by the average distance between two randomly chosen species, thus given by the sum of all pairwise distances, divided by S^2, where S is the number of tips of the tree.
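A small worked case of J may help. The base R sketch below uses a hand-written matrix of tip-to-tip distances for the tree ((t1:1, t2:1):1, t3:2) and divides the sum over the full S x S matrix by S^2. Reading "the sum of all pairwise distances" as the sum over all ordered pairs, including the zero diagonal, is an assumption here; it matches the interpretation of J as the expected distance between two species drawn at random with replacement.

```r
# Distances between the tips of ((t1:1, t2:1):1, t3:2), written out by hand
d <- matrix(c(0, 2, 4,
              2, 0, 4,
              4, 4, 0), nrow = 3, byrow = TRUE)
S <- nrow(d)  # number of tips

# Sum over all ordered pairs (i, j), diagonal included, divided by S^2
entropy_j_sketch <- sum(d) / S^2
entropy_j_sketch  # 20 / 9, about 2.22
```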
entropy_j(phy)
phy |
phylo object or ltable |
intensive quadratic entropy statistic J
Izsák, János, and Laszlo Papp. "A link between ecological diversity indices and measures of biodiversity." Ecological Modelling 130.1-3 (2000): 151-156.
The equal weights Colless index is calculated as the average of
$|L - R| / (L + R - 2)$ over all nodes where L + R > 2,
where L (or R) is the number of extant tips associated with the L (or R)
daughter branch at that node. Maximal imbalance is associated with a value
of 1.0. The ew_colless index is not sensitive to tree size.
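The base R sketch below assumes the per-node term is |L - R| / (L + R - 2), taken over nodes with L + R > 2, and that these terms are averaged rather than summed. Averaging is the reading consistent with a maximum value of 1.0, but it is an assumption, not a transcription of the treestats code. The (L, R) pairs for each node are written out by hand, and `ew_colless_sketch()` is a hypothetical helper.

```r
# Assumed form: mean over nodes with L + R > 2 of |L - R| / (L + R - 2)
ew_colless_sketch <- function(L, R) {
  keep <- (L + R) > 2  # cherries (L + R = 2) would divide by zero
  mean(abs(L[keep] - R[keep]) / (L[keep] + R[keep] - 2))
}

# 5-tip caterpillar: every qualifying node is maximally uneven
ew_colless_sketch(L = c(4, 3, 2, 1), R = c(1, 1, 1, 1))  # 1
# Fully balanced 4-tip tree
ew_colless_sketch(L = c(2, 1, 1), R = c(2, 1, 1))        # 0
```

Each node of a caterpillar contributes exactly 1, so its average is 1 regardless of tree size, in line with the statement that the index is not sensitive to tree size.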
ew_colless(phy)
phy |
phylo object or ltable |
equal weights Colless index
A. O. Mooers and S. B. Heard. Inferring Evolutionary Process from Phylogenetic Tree Shape. The Quarterly Review of Biology, 72(1), 1997. doi: 10.1086/419657.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
ew_colless(balanced_tree)
ew_colless(unbalanced_tree) # should be higher
Calculate the number of 4-tip caterpillars.
four_prong(input_obj)
input_obj |
phylo object or ltable |
number of 4-tip caterpillars
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." PloS one 16.12 (2021): e0259877.
Rosenberg, Noah A. "The mean and variance of the numbers of r-pronged nodes and r-caterpillars in Yule-generated genealogical trees." Annals of Combinatorics 10 (2006): 129-146.
The gamma statistic measures the relative position of internal nodes within a reconstructed phylogeny. Under the Yule process, the gamma values of a reconstructed tree follow a standard normal distribution. If gamma > 0, the nodes are located more towards the tips of the tree, and if gamma < 0, the nodes are located more towards the root of the tree. Only available for ultrametric trees.
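The sign convention can be seen directly from the Pybus & Harvey (2000) formula, which compares the cumulative internode intervals with half the total, T/2, where T = sum over k = 2..n of k * g_k and g_k is the time during which k lineages exist. The base R sketch below (`gamma_sketch()` is a hypothetical helper; the formula is the same one that ape::gammaStat evaluates from a phylo object) takes the n - 1 branching times of a binary ultrametric tree and computes gamma. It is a sketch for illustration, not the treestats implementation.

```r
# Gamma statistic (Pybus & Harvey 2000) from the n - 1 branching times of a
# binary ultrametric tree with n tips, times measured back from the present
gamma_sketch <- function(brts) {
  n <- length(brts) + 1
  stopifnot(n >= 3)
  bt <- sort(brts)               # youngest node first
  g <- rev(c(bt[1], diff(bt)))   # g[k - 1]: interval during which k lineages exist
  total <- sum((2:n) * g)        # T = sum over k = 2..n of k * g_k
  inner <- sum(cumsum((2:(n - 1)) * g[-(n - 1)])) / (n - 2)
  (inner - total / 2) / (total * sqrt(1 / (12 * (n - 2))))
}

# k * g_k equal for every k: gamma is 0, up to floating-point rounding
gamma_sketch(rev(cumsum(1 / (5:2))))
# Nodes clustered near the root: gamma < 0
gamma_sketch(c(10, 9.9, 9.8, 9.7))
# Nodes clustered near the tips: gamma > 0
gamma_sketch(c(10, 0.3, 0.2, 0.1))
```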
gamma_statistic(phy)
phy |
phylo object or ltable |
gamma statistic
Pybus, O. G. and Harvey, P. H. (2000) Testing macro-evolutionary models using incomplete molecular phylogenies. Proceedings of the Royal Society of London. Series B. Biological Sciences, 267, 2267–2272.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
gamma_statistic(simulated_tree) # should be around 0.
if (requireNamespace("DDD")) {
  ddd_tree <- DDD::dd_sim(pars = c(1, 0, 10), age = 7)$tes
  gamma_statistic(ddd_tree) # because of diversity dependence, should be < 0
}
The ILnumber is the number of internal nodes with a single tip child, as adapted from the phyloTop package. Higher values typically indicate a tree that is more unbalanced.
ILnumber(input_obj, normalization = "none")
input_obj |
phylo object or ltable |
normalization |
"none" or "tips", in which case the result is normalized by dividing by N - 2, where N is the number of tips. |
ILnumber
Calculates the number of moves required to transform the focal tree into a fully imbalanced (caterpillar) tree. Higher values indicate a more balanced tree.
imbalance_steps(input_obj, normalization = FALSE)
input_obj |
phylo object or ltable |
normalization |
if TRUE, the number of steps taken is normalized by tree size, by dividing by the maximum number of moves required to move from a fully balanced to a fully imbalanced tree, which is N - log2(N) - 1, where N is the number of extant tips. |
required number of moves
The J^1 index calculates the Shannon entropy of a tree, where at each node with two children, the Shannon entropy is $-\sum_i p_i \log_2(p_i)$ over the two children i, with $p_1 = L / (L + R)$ and $p_2 = R / (L + R)$, where L and R represent the number of tips connected to the two daughter branches.
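The per-node term of J^1 is the base-2 Shannon entropy of the split, -(p_1 log2 p_1 + p_2 log2 p_2) with p_1 = L / (L + R) and p_2 = R / (L + R). The base R sketch below computes only this per-node term for a few splits. `node_entropy()` is a hypothetical helper, and how treestats weights and combines the node terms into the final J^1 value is not reproduced here.

```r
# Base-2 Shannon entropy of the split at one binary node with L and R tips
node_entropy <- function(L, R) {
  p <- c(L, R) / (L + R)
  -sum(p * log2(p))
}

node_entropy(2, 2)  # 1: an even split has maximal entropy
node_entropy(3, 1)  # about 0.811: an uneven split has lower entropy
```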
j_one(input_obj)
input_obj |
phylo object or ltable |
j^1 index
Jeanne Lemant, Cécile Le Sueur, Veselin Manojlović, Robert Noble, Robust, Universal Tree Balance Indices, Systematic Biology, Volume 71, Issue 5, September 2022, Pages 1210–1224, https://doi.org/10.1093/sysbio/syac027
Convert an L table to phylo object
l_to_phylo(ltab, drop_extinct = TRUE)
ltab |
ltable |
drop_extinct |
should extinct species be dropped from the phylogeny? |
phylo object
Computes the distribution of eigenvalues for the modified graph Laplacian of a phylogenetic tree, and several summary statistics of this distribution. The modified graph Laplacian of a phylogeny is given by the difference between its degree matrix (i.e. the diagonal matrix where each diagonal element represents the sum of branch lengths to all other nodes) and its distance matrix (i.e. all pairwise distances between all nodes). Each row of the modified graph Laplacian sums to zero. For a tree with n tips, there are N = 2n-1 nodes, and hence the modified graph Laplacian is represented by an N x N matrix. Whereas RPANDA relies on the package igraph to calculate the modified graph Laplacian, the treestats package uses C++ to directly calculate the different entries in the matrix. This makes the treestats implementation slightly faster, although the bulk of computation occurs in estimating the eigenvalues, using the function eigen from base R.
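The construction can be reproduced in a few lines of base R. The sketch below writes out by hand the 5 x 5 matrix of distances between all N = 2n - 1 = 5 nodes of ((t1:1, t2:1):1, t3:2) (tips 1..3, root 4, ancestor of t1 and t2 is node 5), builds the modified graph Laplacian as the degree matrix minus the distance matrix, and takes its eigenvalues with `eigen()`. The summary statistics (asymmetry, peak height, eigengap) are derived from a density over this spectrum and are not reproduced here.

```r
# Distances between all 5 nodes of ((t1:1, t2:1):1, t3:2):
# nodes 1..3 are tips, node 4 is the root, node 5 is the ancestor of t1 and t2
d <- matrix(c(0, 2, 4, 2, 1,
              2, 0, 4, 2, 1,
              4, 4, 0, 2, 3,
              2, 2, 2, 0, 1,
              1, 1, 3, 1, 0), nrow = 5, byrow = TRUE)

degree <- diag(rowSums(d))  # summed distance from each node to all other nodes
mgl <- degree - d           # modified graph Laplacian

rowSums(mgl)                # every row sums to zero
spec <- eigen(mgl, symmetric = TRUE, only.values = TRUE)$values
spec                        # decreasing order; the smallest is (numerically) zero
max(spec)                   # principal eigenvalue
```

Because each row sums to zero, the vector of ones is always an eigenvector with eigenvalue zero; the remaining eigenvalues carry the information about tree shape.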
laplacian_spectrum(phy)
phy |
phylo object |
list with five components: 1) eigenvalues, the vector of eigenvalues, 2) principal_eigenvalue, the largest eigenvalue of the spectral density distribution, 3) asymmetry, the skewness of the spectral density distribution, 4) peak_height, the largest y-axis value of the spectral density distribution, and 5) eigengap, the position of the largest difference between eigenvalues, giving the number of modalities in the tree.
Eric Lewitus, Helene Morlon, Characterizing and Comparing Phylogenies from their Laplacian Spectrum, Systematic Biology, Volume 65, Issue 3, May 2016, Pages 495–507, https://doi.org/10.1093/sysbio/syv116
Provides a list of all available statistics in the package
list_statistics(only_balance_stats = FALSE)
only_balance_stats |
only return those statistics associated with measuring balance of a tree |
vector with names of summary statistics
Convert an L table to newick string
ltable_to_newick(ltab, drop_extinct = TRUE)
ltab |
ltable |
drop_extinct |
should extinct species be dropped from the phylogeny? |
newick string
The goal of this function is to increasingly imbalance a tree by changing the topology, one move at a time. It does so by re-attaching terminal branches to the root lineage, through the ltable. In effect, this causes the tree to become increasingly caterpillar-like. When started with a balanced tree, this allows for exploring the gradient between a fully balanced tree and a fully unbalanced tree. Please note that the algorithm will try to increase imbalance until a fully caterpillar-like tree is reached, which may occur before unbal_steps is reached. Which branch is reattached is controlled by two arguments. group_method determines which branches are eligible: "any" branch, or only "terminal" branches (i.e. branches that don't have any daughter lineages, the number of which is maximized in a fully imbalanced tree). selection_method ("random", "youngest" or "oldest") determines the order in which eligible branches are picked: "youngest" reattaches branches in order of age, starting with the branch originating from the most recent branching event and working through the tree, and "random" picks a random branch.
make_unbalanced_tree( init_tree, unbal_steps, group_method = "any", selection_method = "random" )
init_tree |
starting tree to work with |
unbal_steps |
number of imbalance generating steps |
group_method |
choice of "any" and "terminal" |
selection_method |
choice of "random", "youngest" and "oldest" |
phylo object
simulated_tree <- ape::rphylo(n = 16, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
intermediate_tree <- make_unbalanced_tree(balanced_tree, 8)
colless(balanced_tree)
colless(intermediate_tree) # should be intermediate value
colless(unbalanced_tree) # should be highest colless value
Betweenness centrality associates with each node v the number of pairs of nodes u, w for which the shortest path between u and w runs through v, if the tree were re-rooted at node v. The maximum betweenness centrality across all nodes is then reported.
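In a tree the path between two nodes is unique, which makes betweenness easy to count: removing v splits the tree into components of sizes c_1, ..., c_k, and the number of node pairs whose path must pass through v is ((sum c)^2 - sum c^2) / 2. The base R sketch below applies this to the 5-tip caterpillar stored as an ape-style edge matrix, with hypothetical helpers `neighbours()` and `component_size()`. Counting pairs over all nodes (tips and internal nodes) of the undirected tree is an assumption of this sketch; it may differ from the exact set of pairs and normalization that max_betweenness uses.

```r
# (((( t1, t2), t3), t4), t5): tips 1..5, root 6, internal nodes 7..9
edge <- rbind(c(6, 7), c(6, 5), c(7, 8), c(7, 4),
              c(8, 9), c(8, 3), c(9, 1), c(9, 2))
n_nodes <- max(edge)

# Hypothetical helper: all nodes joined to v by an edge (children and parent)
neighbours <- function(v) c(edge[edge[, 1] == v, 2], edge[edge[, 2] == v, 1])

# Hypothetical helper: size of the component containing `start` once `removed` is deleted
component_size <- function(start, removed) {
  seen <- c(removed, start)
  queue <- start
  while (length(queue) > 0) {  # breadth-first search
    v <- queue[1]
    queue <- queue[-1]
    new <- setdiff(neighbours(v), seen)
    seen <- c(seen, new)
    queue <- c(queue, new)
  }
  length(seen) - 1  # do not count the removed node itself
}

betweenness <- sapply(seq_len(n_nodes), function(v) {
  sizes <- sapply(neighbours(v), component_size, removed = v)
  (sum(sizes)^2 - sum(sizes^2)) / 2  # pairs in different components
})

max(betweenness)        # 19
which.max(betweenness)  # node 8, the parent of the (t1, t2) ancestor
```

Tips always score 0, since removing a tip leaves a single component.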
max_betweenness(phy, normalization = "none")
phy |
phylo object or ltable |
normalization |
"none" or "tips"; if "tips" is chosen, the obtained maximum betweenness is normalized by the total number of node pair combinations considered, i.e. (n-2)*(n-1), where n is the number of tips. |
Maximum Betweenness
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." Plos one 16.12 (2021): e0259877.
Closeness is defined as 1 / Farness, where Farness is the sum of distances from a node to all other nodes in the tree. The maximum closeness across all nodes is returned.
max_closeness(phy, weight = TRUE, normalization = "none")
phy |
phylo object or ltable |
weight |
if TRUE, uses branch lengths. |
normalization |
"none" or "tips", in which case an arbitrary post-hoc correction is performed by dividing by the expectation of n log(n), where n is the number of tips. |
Maximum Closeness
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." Plos one 16.12 (2021): e0259877.
Wang W, Tang CY. Distributed computation of classic and exponential closeness on tree graphs. Proceedings of the American Control Conference. IEEE; 2014. p. 2090–2095.
Calculates the maximum difference of widths of a phylogenetic tree. First, the widths are calculated by collecting the depth of each node and tip across the entire tree, where the depth represents the distance (in nodes) to the root. The width of a given depth is the number of occurrences of that depth. Next, the difference between each pair of consecutive widths is taken, starting with the first width, and the maximum difference is returned. Whereas the original statistic designed by Colijn and Gardy used the maximum absolute difference, we here use the modified version introduced in Fischer et al. 2023: this returns the maximum difference without taking absolute values, which ensures that this metric is a proper (im)balance metric, following Fischer et al. 2023.
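Widths and their signed differences come down to counting depths. In the base R sketch below, the depths (in edges from the root, root included at depth 0) of every node of two small trees are written out by hand. `tabulate()` turns them into widths, the maximum of those is the maximum width, and the maximum of `diff()` gives the signed difference described above. Including the root as depth 0 is an assumption of this sketch, and `width_stats()` is a hypothetical helper.

```r
# Widths from node depths (root at depth 0), then max width and max signed difference
width_stats <- function(depths) {
  widths <- tabulate(depths + 1)  # widths[k] = number of nodes at depth k - 1
  c(max_width = max(widths), max_del_width = max(diff(widths)))
}

# (((( t1, t2), t3), t4), t5): widths 1, 2, 2, 2, 2
width_stats(c(0, 1, 1, 2, 2, 3, 3, 4, 4))  # max_width 2, max_del_width 1
# ((t1, t2), (t3, t4)): widths 1, 2, 4
width_stats(c(0, 1, 1, 2, 2, 2, 2))        # max_width 4, max_del_width 2
```

Without taking absolute values, a shrinking width (a negative difference) can never be the maximum, which is what distinguishes this version from the original absolute-difference statistic.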
max_del_width(phy, normalization = "none")
phy | phylogeny or ltable |
normalization | "none" or "tips", in which case the resulting statistic is divided by the number of tips in the tree. |
maximum difference of widths
C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.
Fischer, M., Herbst, L., Kersting, S., Kühn, A. L., & Wicke, K. (2023). Tree Balance Indices: A Comprehensive Survey.
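The width and difference-of-widths computations described above can be sketched in base R as follows. This is a minimal illustration, not the treestats code: it assumes an ape-style edge matrix and, as an assumption, counts the root (depth 0) as the first width; the function names are hypothetical.

```r
# Hypothetical helpers: a sketch of widths and max_del_width, not treestats code.
# Assumes an ape-style edge matrix (parent, child) and counts the root
# (depth 0) as the first width.
node_depths <- function(edge) {
  root <- setdiff(edge[, 1], edge[, 2])  # the only parent that is never a child
  depth <- integer(max(edge))
  queue <- root
  while (length(queue) > 0) {            # breadth-first walk from the root
    v <- queue[1]
    queue <- queue[-1]
    kids <- edge[edge[, 1] == v, 2]
    depth[kids] <- depth[v] + 1L
    queue <- c(queue, kids)
  }
  depth
}
tree_widths <- function(edge) {
  tabulate(node_depths(edge) + 1L)       # element d + 1 = vertices at depth d
}
max_del_width_sketch <- function(edge) {
  max(diff(tree_widths(edge)))           # signed differences, no abs()
}

balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
caterpillar <- rbind(c(5, 1), c(5, 6), c(6, 2), c(6, 7), c(7, 3), c(7, 4))
tree_widths(balanced)              # 1 2 4, so the maximum width is 4
max_del_width_sketch(balanced)     # 2
max_del_width_sketch(caterpillar)  # 1 (widths 1 2 2 2)
```

Because the differences are signed, a tree whose widths shrink towards the tips contributes nothing extra, in line with the Fischer et al. (2023) modification.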
The maximum depth metric measures the longest path (in edges) between a tip and the root.
max_depth(phy, normalization = "none")
phy | phylo object or ltable |
normalization | "none" or "tips", in which case the resulting statistic is divided by the number of tips in the tree. |
Maximum depth (in number of edges)
C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.
Calculates the maximum ladder index, from the phyloTop package. Higher values indicate more unbalanced trees. To calculate the maximum ladder index, all potential ladders in the tree are first identified. A ladder is defined as a sequence of nodes where one of the daughter branches is a terminal branch, resulting in a ladder-like pattern. The maximum ladder index then represents the longest ladder found among all observed ladders in the tree.
max_ladder(input_obj)
input_obj | phylo object or ltable |
longest ladder in the tree
Calculates the maximum width. First, the depth of each node and tip across the entire tree is collected, where the depth represents the distance (in nodes) to the root. The width of a given depth is then the number of nodes and tips found at that depth, and the maximum width is the largest such count.
max_width(phy, normalization = "none")
phy | phylogeny or ltable |
normalization | "none" or "tips", in which case the resulting statistic is divided by the number of tips in the tree. |
maximum width
C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.
Mean branch length of a tree, including extinct branches.
mean_branch_length(phy)
phy | phylo object or Ltable |
mean branch length
Mean length of the external branches of a tree, i.e. of branches leading to a tip.
mean_branch_length_ext(phy)
phy | phylo object or Ltable |
mean of external branch lengths
Mean length of the internal branches of a tree, i.e. of branches not leading to a tip.
mean_branch_length_int(phy)
phy | phylo object or Ltable |
mean of internal branch lengths
The mean I value is defined for all nodes with at least 4 descending tips, such that different topologies can be formed. For each such node, I = (nm - ceiling(nt/2)) / (nt - 1 - ceiling(nt/2)), where nt is the total number of tips descending from that node, nm is the number of tips descending from the daughter branch leading to the most tips, and ceiling(nt/2) (nt/2 rounded up) is the minimum possible value of nm. Following Purvis et al. (2002), we correct I for odd nt, such that I' = I * (nt - 1) / nt. This correction ensures that the expected value of I' does not depend on nt. We report the mean value across all I' (again, following Purvis et al. 2002).
mean_i(phy)
phy | phylo object or ltable |
average I value across all nodes
G. Fusco and Q. C. Cronk. A new method for evaluating the shape of large phylogenies. Journal of Theoretical Biology, 1995. doi: 10.1006/jtbi.1995.0136.
A. Purvis, A. Katzourakis, and P.-M. Agapow. Evaluating Phylogenetic Tree Shape: Two Modifications to Fusco & Cronk's Method. Journal of Theoretical Biology, 2002. doi: 10.1006/jtbi.2001.2443.
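The I and I' formulas above translate into a short base-R sketch. It is written directly from the description, not taken from treestats, and assumes a bifurcating tree stored as an ape-style edge matrix; `mean_i_sketch()` is a hypothetical name, and the recursive tip count is only practical for small trees.

```r
# Hypothetical sketch of mean I', not treestats code.
# Assumes a bifurcating tree as an ape-style edge matrix (parent, child).
mean_i_sketch <- function(edge) {
  tips_below <- function(v) {            # number of tips descending from v
    kids <- edge[edge[, 1] == v, 2]
    if (length(kids) == 0) return(1L)    # v is a tip
    sum(vapply(kids, tips_below, integer(1)))
  }
  i_values <- numeric(0)
  for (v in unique(edge[, 1])) {         # all internal nodes
    sizes <- vapply(edge[edge[, 1] == v, 2], tips_below, integer(1))
    nt <- sum(sizes)
    if (nt < 4) next                     # I is only defined for nt >= 4
    nm <- max(sizes)                     # tips in the larger daughter clade
    m <- ceiling(nt / 2)                 # minimum possible value of nm
    i_val <- (nm - m) / (nt - 1 - m)
    if (nt %% 2 == 1) i_val <- i_val * (nt - 1) / nt  # Purvis et al. correction
    i_values <- c(i_values, i_val)
  }
  mean(i_values)
}

caterpillar5 <- rbind(c(6, 1), c(6, 7), c(7, 2), c(7, 8),
                      c(8, 3), c(8, 9), c(9, 4), c(9, 5))
balanced4 <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
mean_i_sketch(caterpillar5)  # (0.8 + 1) / 2 = 0.9
mean_i_sketch(balanced4)     # 0
```

In the five-tip caterpillar, the root (nt = 5) has I = 1, corrected to 0.8, and its daughter (nt = 4) has I = 1; the two smaller nodes are skipped.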
Fast C++ function to calculate the mean pairwise distance, using the fast algorithm by Tsirogiannis, Sandel & Cheliotis (2012).
mean_pair_dist(phy, normalization = "none")
phy | phylo object or ltable |
normalization | "none" or "tips", in which case the obtained mean pairwise distance is normalized by the factor 2log(n), where n is the number of tips. |
Mean pairwise distance
Webb, C., D. Ackerly, M. McPeek, and M. Donoghue. 2002. Phylogenies and community ecology. Annual Review of Ecology and Systematics 33:475-505.
Tsirogiannis, Constantinos, Brody Sandel, and Dimitris Cheliotis. "Efficient computation of popular phylogenetic tree measures." Algorithms in Bioinformatics: 12th International Workshop, WABI 2012, Ljubljana, Slovenia, September 10-12, 2012. Proceedings 12. Springer Berlin Heidelberg, 2012.
Calculates the eigenvalues of the adjacency matrix, a square matrix indicating whether pairs of vertices are adjacent in a graph; here, entries in the matrix indicate connections between nodes (and between nodes and tips). Entries in the adjacency matrix are weighted by branch length. Using the adjacency matrix, we then calculate its spectral properties, i.e. the minimum and maximum eigenvalues of the matrix. When the R package RSpectra is available, a faster calculation can be used, which does not calculate all eigenvalues, but only the minimum and maximum. As such, when using this option, the vector of all eigenvalues is not returned.
minmax_adj(phy, use_rspectra = FALSE)
phy | phylo object or ltable |
use_rspectra | boolean to indicate whether the helper package RSpectra should be used, in which case only the minimum and maximum values are returned |
List with the minimum and maximum eigenvalues
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." PLoS ONE 16.12 (2021): e0259877.
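To show what "weighted adjacency matrix" and its extreme eigenvalues mean in practice, here is a minimal base-R sketch using the full eigendecomposition (the non-RSpectra route). It is not the treestats code; it assumes an ape-style edge matrix and branch lengths as in `phy$edge` and `phy$edge.length`, and the function names are hypothetical.

```r
# Hypothetical sketch, not treestats code: weighted adjacency matrix of a tree
# and its smallest and largest eigenvalues.
adjacency_matrix <- function(edge, edge_length) {
  n <- max(edge)                               # vertices: tips and internal nodes
  a <- matrix(0, nrow = n, ncol = n)
  a[edge] <- edge_length                       # parent -> child entries
  a[edge[, 2:1, drop = FALSE]] <- edge_length  # child -> parent (symmetric)
  a
}
minmax_adj_sketch <- function(edge, edge_length = rep(1, nrow(edge))) {
  ev <- eigen(adjacency_matrix(edge, edge_length), symmetric = TRUE,
              only.values = TRUE)$values
  list(min = min(ev), max = max(ev), values = ev)
}

star <- rbind(c(4, 1), c(4, 2), c(4, 3))  # root 4 with three tips, unit lengths
minmax_adj_sketch(star)[c("min", "max")]  # -sqrt(3) and sqrt(3)
```

Because a tree is a bipartite graph, its adjacency spectrum is symmetric around zero, so the minimum is always minus the maximum.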
Calculates the eigenvalues of the Laplacian matrix, a matrix representation of a graph (here, a phylogeny) defined as the degree matrix minus the adjacency matrix. When the R package RSpectra is available, a faster calculation can be used, which does not calculate all eigenvalues, but only the minimum and maximum. As such, when using this option, the vector of all eigenvalues is not returned.
minmax_laplace(phy, use_rspectra = FALSE)
phy | phylo object or ltable |
use_rspectra | boolean to indicate whether the helper package RSpectra should be used, in which case only the minimum and maximum values are returned |
List with the minimum and maximum eigenvalues
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." PLoS ONE 16.12 (2021): e0259877.
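A minimal base-R sketch of the Laplacian spectrum, L = D - A, for a tree stored as an ape-style edge matrix. It is not the treestats code, and it assumes, without confirmation from the text above, that the adjacency matrix is weighted by branch length in the same way as for minmax_adj. Since the smallest Laplacian eigenvalue of a connected graph is always 0, the sketch also reports the smallest non-zero eigenvalue; which of the two treestats reports is not stated here. The function name is hypothetical.

```r
# Hypothetical sketch, not treestats code: Laplacian eigenvalues of a tree.
laplacian_eigen_sketch <- function(edge, edge_length = rep(1, nrow(edge))) {
  n <- max(edge)
  a <- matrix(0, nrow = n, ncol = n)
  a[edge] <- edge_length                       # weighted adjacency, parent -> child
  a[edge[, 2:1, drop = FALSE]] <- edge_length  # and child -> parent
  lap <- diag(rowSums(a), nrow = n) - a        # L = D - A
  ev <- eigen(lap, symmetric = TRUE, only.values = TRUE)$values
  list(min = min(ev), min_nonzero = min(ev[ev > 1e-10]), max = max(ev))
}

star <- rbind(c(4, 1), c(4, 2), c(4, 3))  # unit-length star with three tips
laplacian_eigen_sketch(star)              # eigenvalues 0, 1, 1, 4
```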
Per tip, evaluates the shortest distance to another tip, then takes the average across all tips.
mntd(phy)
phy | phylo object or ltable |
Mean Nearest Taxon Distance.
Webb, C., D. Ackerly, M. McPeek, and M. Donoghue. 2002. Phylogenies and community ecology. Annual Review of Ecology and Systematics 33:475-505.
Calculates the maximum width divided by the maximum depth.
mw_over_md(phy)
phy | phylogeny or ltable |
C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.
The nLTT statistic measures the difference between the normalized lineages-through-time curves of two trees: it integrates the absolute difference in the number of lineages over time, where both the number of lineages and time are normalized. The number of lineages is normalized by the number of extant tips, and time is normalized by the crown age. The nLTT can only be calculated for reconstructed trees. Only use the treestats version if you are very certain about the input data and that computing the nLTT is valid (e.g. your tree is ultrametric); if you are less certain, use the nLTT function from the nLTT package.
nLTT(phy, ref_tree)
phy | phylo object or ltable |
ref_tree | reference tree to compare with (should be same type as phy) |
nLTT statistic
Janzen, T., Höhna, S. and Etienne, R.S. (2015), Approximate Bayesian Computation of diversification rates from molecular phylogenies: introducing a new efficient summary statistic, the nLTT. Methods Ecol Evol, 6: 566-575. https://doi.org/10.1111/2041-210X.12350
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
reference_tree <- ape::rphylo(n = 10, birth = 0.2, death = 0)
nLTT(simulated_tree, reference_tree)
nLTT(simulated_tree, simulated_tree) # should be zero.
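To illustrate the integral behind the nLTT, here is a minimal base-R sketch working on vectors of branching times (ages before the present, crown age included), such as those returned by ape::branching.times(). It assumes a fully bifurcating, reconstructed tree, normalizes lineages by the number of tips (starting at 2/n at the crown) and time by the crown age, and integrates the absolute difference between the two step functions exactly. The function names are hypothetical, and the handling of endpoints may differ from treestats and the nLTT package.

```r
# Hypothetical helpers (sketch, not treestats code). brts: branching times as
# ages before the present, including the crown age, of a bifurcating,
# reconstructed tree with length(brts) + 1 tips.
nltt_curve <- function(brts) {
  brts <- sort(brts, decreasing = TRUE)
  n_tips <- length(brts) + 1
  list(
    times = 1 - brts / brts[1],                # crown = 0, present = 1
    lineages = (seq_along(brts) + 1) / n_tips  # 2 / n just after the crown split
  )
}
nltt_diff_sketch <- function(brts1, brts2) {
  c1 <- nltt_curve(brts1)
  c2 <- nltt_curve(brts2)
  grid <- sort(unique(c(c1$times, c2$times, 1)))
  left <- grid[-length(grid)]                  # left end of each interval
  # both curves are step functions, constant on each interval of the grid
  y1 <- c1$lineages[findInterval(left, c1$times)]
  y2 <- c2$lineages[findInterval(left, c2$times)]
  sum(abs(y1 - y2) * diff(grid))
}

nltt_diff_sketch(c(2, 1), c(2, 1.5))  # 1/12: curves differ by 1/3 on [0.25, 0.5)
```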
The base nLTT statistic can be used as a semi-stand-alone statistic for phylogenetic trees. However, although this provides a convenient way of checking the power of the nLTT statistic without directly comparing two trees, the nLTT_base statistic is not a substitute for directly comparing two phylogenetic trees. E.g. one might naively assume that nLTT(A, B) = |nLTT_base(A) - nLTT_base(B)|. Indeed, in some cases this may hold true (for instance, when all normalized lineages of A are less than all normalized lineages of B), but once the nLTT curve of A intersects the nLTT curve of B, this no longer applies.
nLTT_base(phy)
phy | phylo object |
base nLTT statistic
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
nLTT_base(simulated_tree)
Number of tips of a tree, including extinct tips.
number_of_lineages(phy)
phy | phylo object |
number of lineages
This function is a C++ implementation of the function DDD::phylo2L. An L table summarises a phylogeny in a table with four columns, being: 1) time at which a species is born, 2) label of the parent of the species, where positive and negative numbers indicate whether the species belongs to the left or right crown lineage, 3) label of the daughter species itself (again positive or negative depending on left or right crown lineage), and the last column 4) indicates the time of extinction of a species, or -1 if the species is extant.
phylo_to_l(phy)
phy | phylo object |
ltable (see description)
simulated_tree <- ape::rphylo(n = 4, birth = 1, death = 0)
ltable <- phylo_to_l(simulated_tree)
reconstructed_tree <- DDD::L2phylo(ltable)
old_par <- par(mfrow = c(1, 2)) # store only the settings that are changed
# trees should be more or less similar, although labels may not match, and
# rotations might cause (initial) visual mismatches
plot(simulated_tree)
plot(reconstructed_tree)
par(old_par)
The phylogenetic diversity at time t is given by the total branch length of the tree reconstructed up until time point t. Time is measured increasingly, with the crown age equal to 0. Thus, the time at the present is equal to the crown age.
phylogenetic_diversity(input_obj, t = 0, extinct_tol = NULL)
input_obj | phylo object or Ltable |
t | time point at which to measure phylogenetic diversity; alternatively, a vector of time points can be provided. Time is measured with 0 being the present. |
extinct_tol | tolerance to determine if a lineage is extinct at time t. Default is 1/100 * smallest branch length of the tree. |
phylogenetic diversity, or vector of phylogenetic diversity measures if a vector of time points is used as input.
Faith, Daniel P. "Conservation evaluation and phylogenetic diversity." Biological conservation 61.1 (1992): 1-10.
Calculates the change in rate between the first half and the second half of the extant phylogeny. Rho = (r2 - r1) / (r1 + r2), where r1 and r2 are the rates in the first and second half, respectively. The rate within a half is given by (log(n2) - log(n1)) / t, where n2 is the number of lineages at the end of the half, n1 the number of lineages at the start of the half, and t the duration of the half. Rho varies between -1 and 1, with rho = 0 indicating a constant rate across the phylogeny, rho < 0 indicating a slowdown and rho > 0 indicating a speed-up of speciation. In contrast to the Gamma statistic, Pigot's rho is not sensitive to tree size.
pigot_rho(phy)
phy | phylo object |
rho
Alex L. Pigot, Albert B. Phillimore, Ian P. F. Owens, C. David L. Orme, The Shape and Temporal Dynamics of Phylogenetic Trees Arising from Geographic Speciation, Systematic Biology, Volume 59, Issue 6, December 2010, Pages 660–673, https://doi.org/10.1093/sysbio/syq058
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
pigot_rho(simulated_tree) # should be around 0.
ddd_tree <- DDD::dd_sim(pars = c(1, 0, 10), age = 7)$tes
pigot_rho(ddd_tree) # because of diversity dependence, should be < 0
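The rho formula can be made concrete with a minimal base-R sketch that works on a vector of branching times (ages before the present, crown age included). It is not the treestats code. Assumptions, not confirmed by the text above: the first half starts with the two crown lineages, the tree is split at half the crown age, and a branching event exactly at the midpoint is counted in the second half. The function name is hypothetical.

```r
# Hypothetical sketch of Pigot's rho, not treestats code.
# brts: branching times (ages before present) of a bifurcating,
# reconstructed tree, including the crown age.
pigot_rho_sketch <- function(brts) {
  crown_age <- max(brts)
  half <- crown_age / 2                 # duration t of each half
  n_start <- 2                          # two crown lineages at the crown
  n_mid <- 1 + sum(brts > half)         # lineages alive at the midpoint
  n_end <- length(brts) + 1             # extant tips at the present
  r1 <- (log(n_mid) - log(n_start)) / half
  r2 <- (log(n_end) - log(n_mid)) / half
  (r2 - r1) / (r1 + r2)
}

pigot_rho_sketch(c(4, 3, 1))  # log(8/9) / log(2), about -0.17: a slowdown
```

Here the first half goes from 2 to 3 lineages and the second from 3 to 4, so the rate drops and rho is negative.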
A pitchfork is a clade with three tips, as introduced in the phyloTop package; this function counts the number of pitchforks in the tree.
pitchforks(input_obj, normalization = "none")
input_obj | phylo object or ltable |
normalization | "none" or "tips", in which case the found number of pitchforks is divided by the expected number. |
number of pitchforks
The phylogenetic species variability (psv) is bounded in [0, 1]. The psv quantifies how phylogenetic relatedness decreases the variance of a (neutral) trait shared by all species in the tree. As species become more closely related, the psv tends to 0. Please note that the psv is a special case of the Mean Pair Distance (see the appendix of Tucker et al. 2017 for a full derivation), and the two are thus directly correlated.
psv(phy, normalization = "none")
phy | phylo object or ltable |
normalization | "none" or "tips", in which case the obtained mean pairwise distance is normalized by the factor 2log(n), where n is the number of tips. |
Phylogenetic Species Variability
Helmus M.R., Bland T.J., Williams C.K. & Ives A.R. (2007) Phylogenetic measures of biodiversity. American Naturalist, 169, E68-E83
Tucker, Caroline M., et al. "A guide to phylogenetic metrics for conservation, community ecology and macroecology." Biological Reviews 92.2 (2017): 698-715.
A function to modify an ltable, such that the longest path in the phylogeny is a crown lineage.
rebase_ltable(ltable)
ltable | ltable |
modified ltable
The Rogers index is calculated as the total number of internal nodes that are unbalanced, i.e. nodes whose two daughter nodes lead to different numbers of extant tips. In other words, it counts the number of nodes where L != R (where L (R) is the number of extant tips of the left (right) daughter node).
rogers(phy, normalization = "none")
phy | phylo object or ltable |
normalization | "none" or "tips", in which case the resulting statistic is divided by the number of tips - 2 (i.e. the maximum value of the Rogers index for a tree). |
Rogers index
J. S. Rogers. Central Moments and Probability Distributions of Three Measures of Phylogenetic Tree Imbalance. Systematic Biology, 45(1):99-110, 1996. doi: 10.1093/sysbio/45.1.99.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
rogers(balanced_tree)
rogers(unbalanced_tree) # should be higher
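The counting rule behind the Rogers index can be sketched in a few lines of base R. This is a minimal illustration, not the treestats code; it assumes a bifurcating tree stored as an ape-style edge matrix (as in `phy$edge`), and `rogers_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of the Rogers index, not treestats code.
# Assumes a bifurcating tree given as an ape-style edge matrix (parent, child).
rogers_sketch <- function(edge) {
  tips_below <- function(v) {            # number of tips descending from v
    kids <- edge[edge[, 1] == v, 2]
    if (length(kids) == 0) return(1L)    # v is a tip
    sum(vapply(kids, tips_below, integer(1)))
  }
  unbalanced <- vapply(unique(edge[, 1]), function(v) {
    sizes <- vapply(edge[edge[, 1] == v, 2], tips_below, integer(1))
    sizes[1] != sizes[2]                 # L != R
  }, logical(1))
  sum(unbalanced)
}

caterpillar5 <- rbind(c(6, 1), c(6, 7), c(7, 2), c(7, 8),
                      c(8, 3), c(8, 9), c(9, 4), c(9, 5))
balanced4 <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
rogers_sketch(caterpillar5)  # 3, i.e. the maximum n - 2 for five tips
rogers_sketch(balanced4)     # 0
```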
Measures the distribution of tips over the two crown lineages, i.e. n1 / (n1 + n2), where n1 is the number of tips connected to crown lineage 1 and n2 is the number of tips connected to crown lineage 2. We always take n1 >= n2, thus root imbalance is always in [0.5, 1].
root_imbalance(phy)
phy | phylo object or ltable |
Root imbalance
Guyer, Craig, and Joseph B. Slowinski. "Adaptive radiation and the topology of large phylogenies." Evolution 47.1 (1993): 253-263.
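A minimal base-R sketch of root imbalance, n1 / (n1 + n2) with n1 >= n2, assuming a bifurcating tree stored as an ape-style edge matrix. It is an illustration rather than the treestats code, and `root_imbalance_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of root imbalance, not treestats code.
root_imbalance_sketch <- function(edge) {
  tips_below <- function(v) {            # number of tips descending from v
    kids <- edge[edge[, 1] == v, 2]
    if (length(kids) == 0) return(1L)    # v is a tip
    sum(vapply(kids, tips_below, integer(1)))
  }
  root <- setdiff(edge[, 1], edge[, 2])  # the only parent that is never a child
  sizes <- vapply(edge[edge[, 1] == root, 2], tips_below, integer(1))
  max(sizes) / sum(sizes)                # n1 / (n1 + n2) with n1 >= n2
}

caterpillar <- rbind(c(5, 1), c(5, 6), c(6, 2), c(6, 7), c(7, 3), c(7, 4))
balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
root_imbalance_sketch(caterpillar)  # 3 / 4 = 0.75
root_imbalance_sketch(balanced)     # 0.5, the lower bound
```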
The rquartet index counts the number of potential fully balanced rooted subtrees of 4 tips in the tree. The function in treestats assumes a bifurcating tree. For trees with polytomies, we refer the user to treebalance::rQuartetI, which can also take polytomies into account.
rquartet(phy, normalization = "none")
phy | phylo object or ltable |
normalization | The index can be normalized by the expectation under the Yule ("yule") or PDA model ("pda"). |
rquartet index
T. M. Coronado, A. Mir, F. Rosselló, and G. Valiente. A balance index for phylogenetic trees based on rooted quartets. Journal of Mathematical Biology, 79(3):1105-1148, 2019. doi: 10.1007/s00285-019-01377-w.
The Sackin index is calculated as the sum, over all tips, of the number of ancestors of each tip. Higher values indicate higher imbalance. Two normalizations are available, which correct for tree size using either a Yule expectation or a PDA expectation.
sackin(phy, normalization = "none")
phy | phylogeny or ltable |
normalization | normalization, either "none" (default), "yule" or "pda". |
Sackin index
M. J. Sackin (1972). "Good" and "Bad" Phenograms. Systematic Biology. 21:225-226.
simulated_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
balanced_tree <- treestats::create_fully_balanced_tree(simulated_tree)
unbalanced_tree <- treestats::create_fully_unbalanced_tree(simulated_tree)
sackin(balanced_tree)
sackin(unbalanced_tree) # should be much higher
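The "sum of ancestors over all tips" definition can be sketched in base R by walking from each tip up to the root. This minimal sketch is not the treestats code (which works in C++ and supports normalization); it assumes an ape-style edge matrix in which tips are numbered 1..n and the root is n + 1, and `sackin_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of the (unnormalized) Sackin index, not treestats code.
sackin_sketch <- function(edge) {
  n_tips <- min(edge[, 1]) - 1           # ape convention: root is n_tips + 1
  parent <- numeric(max(edge))           # 0 means "no parent" (the root)
  parent[edge[, 2]] <- edge[, 1]
  n_ancestors <- function(tip) {
    count <- 0L
    v <- parent[tip]
    while (v != 0) {                     # climb until past the root
      count <- count + 1L
      v <- parent[v]
    }
    count
  }
  sum(vapply(seq_len(n_tips), n_ancestors, integer(1)))
}

caterpillar <- rbind(c(5, 1), c(5, 6), c(6, 2), c(6, 7), c(7, 3), c(7, 4))
balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
sackin_sketch(caterpillar)  # 1 + 2 + 3 + 3 = 9
sackin_sketch(balanced)     # 2 + 2 + 2 + 2 = 8
```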
Calculates the staircase-ness measure, from the phyloTop package. The staircase-ness reflects the number of subtrees that are imbalanced, i.e. subtrees where the left child has more extant tips than the right child, or vice versa.
stairs(input_obj)
input_obj | phylo object or ltable |
number of stairs
Norström, Melissa M., et al. "Phylotempo: a set of r scripts for assessing and visualizing temporal clustering in genealogies inferred from serially sampled viral sequences." Evolutionary Bioinformatics 8 (2012): EBO-S9738.
Calculates the stairs2 measure, from the phyloTop package. The stairs2 measure reflects the imbalance across nodes: it is the average, over all nodes, of min(l, r) / max(l, r), where l and r are the numbers of tips connected to the left (l) and right (r) daughter.
stairs2(input_obj)
input_obj | phylo object or ltable |
average of min(l, r) / max(l, r) across nodes
Norström, Melissa M., et al. "Phylotempo: a set of r scripts for assessing and visualizing temporal clustering in genealogies inferred from serially sampled viral sequences." Evolutionary Bioinformatics 8 (2012): EBO-S9738.
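The stairs2 average can be sketched in base R as follows. This is an illustration, not the treestats or phyloTop code. It assumes a bifurcating ape-style edge matrix and, as an assumption, averages over all internal nodes including cherries. `stairs2_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of stairs2, not treestats or phyloTop code.
stairs2_sketch <- function(edge) {
  tips_below <- function(v) {            # number of tips descending from v
    kids <- edge[edge[, 1] == v, 2]
    if (length(kids) == 0) return(1L)    # v is a tip
    sum(vapply(kids, tips_below, integer(1)))
  }
  ratios <- vapply(unique(edge[, 1]), function(v) {
    sizes <- vapply(edge[edge[, 1] == v, 2], tips_below, integer(1))
    min(sizes) / max(sizes)              # min(l, r) / max(l, r)
  }, numeric(1))
  mean(ratios)
}

caterpillar <- rbind(c(5, 1), c(5, 6), c(6, 2), c(6, 7), c(7, 3), c(7, 4))
balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
stairs2_sketch(caterpillar)  # (1/3 + 1/2 + 1) / 3 = 11/18
stairs2_sketch(balanced)     # 1, every node is perfectly balanced
```

Unlike the imbalance counts above, stairs2 is highest (1) for a fully balanced tree.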
Balance metric that returns the total number of internal nodes that are not symmetric (confusingly enough). A node is considered symmetric when both daughter subtrees have the same topology, measured as having the same sum of depths, where depth is measured as the distance from the root to the node or tip.
sym_nodes(phy, normalization = "none")
phy | phylo object or ltable |
normalization | "none" or "tips", in which case the resulting statistic is divided by the number of tips - 2 (i.e. the maximum value of the symmetry nodes index for a tree). |
Symmetry nodes index (number of non-symmetric internal nodes)
S. J. Kersting and M. Fischer. Measuring tree balance using symmetry nodes — A new balance index and its extremal properties. Mathematical Biosciences, page 108690, 2021. ISSN 0025-5564. doi: 10.1016/j.mbs.2021.108690.
The total cophenetic index is the sum, over all pairs of leaves, of the depth of their last common ancestor.
tot_coph(phy, normalization = "none")
phy | phylo object or ltable |
normalization | "none" or "yule"; when "yule" is chosen, the statistic is divided by the Yule expectation |
Total cophenetic index
A. Mir, F. Rosselló, and L. Rotger. A new balance index for phylogenetic trees. Mathematical Biosciences, 241(1):125-136, 2013. doi: 10.1016/j.mbs.2012.10.005.
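Rather than finding the last common ancestor of every pair, a sketch can use an equivalent counting identity: a pair of leaves contributes one unit for every non-root internal node that is an ancestor of both, so the index equals the sum, over non-root internal nodes v, of choose(n_v, 2), with n_v the number of tips below v. The base-R sketch below assumes an ape-style edge matrix; it is not the treestats code, and `tot_coph_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of the total cophenetic index, not treestats code.
tot_coph_sketch <- function(edge) {
  tips_below <- function(v) {              # number of tips descending from v
    kids <- edge[edge[, 1] == v, 2]
    if (length(kids) == 0) return(1L)      # v is a tip
    sum(vapply(kids, tips_below, integer(1)))
  }
  root <- setdiff(edge[, 1], edge[, 2])
  inner <- setdiff(unique(edge[, 1]), root)  # non-root internal nodes
  sum(vapply(inner, function(v) choose(tips_below(v), 2), numeric(1)))
}

caterpillar <- rbind(c(5, 1), c(5, 6), c(6, 2), c(6, 7), c(7, 3), c(7, 4))
balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
tot_coph_sketch(caterpillar)  # choose(3, 2) + choose(2, 2) = 4
tot_coph_sketch(balanced)     # 1 + 1 = 2
```

Checking the caterpillar directly: the pairs (2, 3) and (2, 4) meet at depth 1 and the pair (3, 4) at depth 2, giving 4, as the identity predicts.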
The total internal path length is the sum of the depths of all inner vertices of the tree.
tot_internal_path(phy)
phy | phylo object or ltable |
Total internal path length
Knuth, Donald E. The Art of Computer Programming: Fundamental Algorithms, volume 1. Addison-Wesley Professional, 1997.
The total path length is the sum of the depths of all vertices of the tree.
tot_path_length(phy)
phy | phylo object or ltable |
Total path length
C. Colijn and J. Gardy. Phylogenetic tree shapes resolve disease transmission patterns. Evolution, Medicine, and Public Health, 2014(1):96-108, 2014. ISSN 2050-6201. doi: 10.1093/emph/eou018.
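Both path-length statistics follow from a single breadth-first depth computation. The base-R sketch below is an illustration under the assumption of an ape-style edge matrix (depth measured in edges from the root), not the treestats code; `path_lengths_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of total and total internal path length, not treestats code.
path_lengths_sketch <- function(edge) {
  root <- setdiff(edge[, 1], edge[, 2])  # the only parent that is never a child
  depth <- integer(max(edge))
  queue <- root
  while (length(queue) > 0) {            # breadth-first walk from the root
    v <- queue[1]
    queue <- queue[-1]
    kids <- edge[edge[, 1] == v, 2]
    depth[kids] <- depth[v] + 1L
    queue <- c(queue, kids)
  }
  internal <- unique(edge[, 1])          # vertices that have children
  c(total = sum(depth), internal = sum(depth[internal]))
}

caterpillar <- rbind(c(5, 1), c(5, 6), c(6, 2), c(6, 7), c(7, 3), c(7, 4))
balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
path_lengths_sketch(caterpillar)  # total = 12, internal = 3
path_lengths_sketch(balanced)     # total = 10, internal = 2
```

Since the tip depths sum to the Sackin index, the total path length equals the Sackin index plus the total internal path length (9 + 3 and 8 + 2 here).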
In a reconstructed tree, obtaining the tree height is fairly straightforward, and the function beautier::get_crown_age does a great job at it. However, that function no longer works for non-ultrametric trees. Alternatively, taking the maximum value of adephylo::distRoot also yields the tree height (including the root branch), but typically performs many superfluous calculations and is thus slow. tree_height returns the tree height for ultrametric and non-ultrametric trees alike.
tree_height(phy)
phy | phylo object |
crown age
Calculates the fraction of tree length on internal branches, also known as treeness or stemminess.
treeness(phy)
phy | phylo object or Ltable |
sum of all internal branch lengths (i.e. of branches not leading to a tip) divided by the sum over all branch lengths.
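Treeness is a simple ratio, shown here as a base-R sketch on an ape-style edge matrix with branch lengths (as in `phy$edge` and `phy$edge.length`). An edge is treated as internal when its child is not a tip, relying on the ape convention that tips are numbered 1..n. This is an illustration, not the treestats code, and `treeness_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of treeness, not treestats code.
treeness_sketch <- function(edge, edge_length) {
  n_tips <- min(edge[, 1]) - 1        # ape convention: tips are numbered 1..n
  internal <- edge[, 2] > n_tips      # edge leads to an internal node, not a tip
  sum(edge_length[internal]) / sum(edge_length)
}

balanced <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
lengths <- c(1, 2, 2, 1, 2, 2)        # two internal edges of length 1
treeness_sketch(balanced, lengths)    # 2 / 10 = 0.2
```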
Variance of branch lengths of a tree, including extinct branches.
var_branch_length(phy)
phy | phylo object or Ltable |
variance of branch lengths
Variance of the external branch lengths of a tree, i.e. of branches leading to a tip.
var_branch_length_ext(phy)
phy | phylo object or Ltable |
variance of external branch lengths
Variance of the internal branch lengths of a tree, i.e. of branches not leading to a tip.
var_branch_length_int(phy)
phy | phylo object or Ltable |
variance of internal branch lengths
The variance of leaf depth statistic returns the variance of depths across all tips.
var_leaf_depth(phy, normalization = "none")
phy | phylo object or ltable |
normalization | "none" or "yule"; when "yule" is chosen, the statistic is divided by the Yule expectation |
Variance of leaf depths
T. M. Coronado, A. Mir, F. Rosselló, and L. Rotger. On Sackin's original proposal: the variance of the leaves' depths as a phylogenetic balance index. BMC Bioinformatics, 21(1), 2020. doi: 10.1186/s12859-020-3405-1.
After calculating all pairwise distances between all tips, this function takes the variance across these values.
var_pair_dist(phy)
phy | phylo object or ltable |
Variance in pairwise distance
Webb, C., D. Ackerly, M. McPeek, and M. Donoghue. 2002. Phylogenies and community ecology. Annual Review of Ecology and Systematics 33:475-505.
The Wiener index is defined as the sum of all shortest path lengths between pairs of nodes in a tree.
wiener(phy, normalization = FALSE, weight = TRUE)
phy | phylo object or ltable |
normalization | if TRUE, the Wiener index is normalized by the number of node pairs, i.e. divided by choose(n, 2), where n is the number of nodes. |
weight | if TRUE, branch lengths are used. |
Wiener index
Chindelevitch, Leonid, et al. "Network science inspires novel tree shape statistics." PLoS ONE 16.12 (2021): e0259877.
Mohar, B., Pisanski, T. How to compute the Wiener index of a graph. J Math Chem 2, 267–277 (1988).
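The Wiener index definition, including the weight and normalization options described above, can be illustrated with a small base-R sketch that computes all shortest paths with a Floyd-Warshall loop. It is not the treestats code: it assumes an ape-style edge matrix, counts every vertex (tips and internal nodes) as a "node", and uses unit edge lengths to mimic weight = FALSE. `wiener_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of the Wiener index, not treestats code.
wiener_sketch <- function(edge, edge_length = rep(1, nrow(edge)),
                          normalization = FALSE) {
  n <- max(edge)                               # all vertices (tips + nodes)
  d <- matrix(Inf, nrow = n, ncol = n)
  diag(d) <- 0
  d[edge] <- edge_length                       # parent -> child
  d[edge[, 2:1, drop = FALSE]] <- edge_length  # child -> parent
  for (k in seq_len(n)) {                      # Floyd-Warshall all-pairs paths
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  w <- sum(d[upper.tri(d)])                    # each unordered pair once
  if (normalization) w / choose(n, 2) else w
}

star <- rbind(c(4, 1), c(4, 2), c(4, 3))       # root 4 with three tips
wiener_sketch(star)                            # 3 * 1 + 3 * 2 = 9
wiener_sketch(star, normalization = TRUE)      # 9 / choose(4, 2) = 1.5
wiener_sketch(star, edge_length = c(1, 2, 3))  # 6 + (3 + 4 + 5) = 18
```

The cubic all-pairs loop is only suitable for small trees; the Mohar and Pisanski reference above discusses more efficient ways to compute the index.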