---
title: "How to visualize nLTT values distributions"
author: "Richel Bilderbeek"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to visualize nLTT values distributions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(nLTT) #nolint
require(ggplot2)
require(knitr)
# temporary fix to keep R-devel happy.
# should be updated upon release of version 3.6
suppressWarnings(RNGversion("3.5.0"))
```

Calculating the average nLTT plot of multiple phylogenies is not a trivial task.

The function `get_nltt_values` collects the nLTT values of a collection of
phylogenies as tidy data.

This allows for a good interplay with ggplot2.

### Example: Easy trees

Create two easy trees:

```{r}
newick1 <- "((A:1,B:1):2,C:3);"
newick2 <- "((A:2,B:2):1,C:3);"
phylogeny1 <- ape::read.tree(text = newick1)
phylogeny2 <- ape::read.tree(text = newick2)
phylogenies <- c(phylogeny1, phylogeny2)
```

They are very similar. `phylogeny1` has short tips:

```{r}
ape::plot.phylo(phylogeny1)
ape::add.scale.bar() #nolint
```

This can be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny1, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny1)
knitr::kable(t)
```

Plotting those timepoints:

```{r}
df <- as.data.frame(nLTT::get_phylogeny_nltt_matrix(phylogeny1))
ggplot2::qplot(
  time, N, data = df, geom = "step", ylim = c(0, 1), direction = "vh",
  main = "NLTT plot of phylogeny 1"
)
```

`phylogeny2` has longer tips:

```{r}
ape::plot.phylo(phylogeny2)
ape::add.scale.bar() #nolint
```

This can also be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny2, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny2)
knitr::kable(t)
```

Plotting those timepoints:

```{r}
df <- as.data.frame(nLTT::get_phylogeny_nltt_matrix(phylogeny2))
ggplot2::qplot(
  time, N, data = df, geom = "step", ylim = c(0, 1), direction = "vh",
  main = "NLTT plot of phylogeny 2"
)
```

The average nLTT plot should be somewhere in the middle.

It is constructed from stretched nLTT matrices.

Here is the stretched nLTT matrix of the first phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny1), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the stretched nLTT matrix of the second phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny2), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the average nLTT matrix of both phylogenies:

```{r}
t <- nLTT::get_average_nltt_matrix(phylogenies, dt = 0.20)
knitr::kable(t)
```

Observe how the numbers get averaged.

The same, now shown as a plot:

```{r}
nLTT::nltts_plot(phylogenies, dt = 0.20, plot_nltts = TRUE)
```

Here is a demo of how `get_nltt_values` works:

```{r}
t <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.2)
knitr::kable(t)
```

To explore the plotting options, first create a data frame:

```{r}
df <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.01)
```

Here we see an averaged nLTT plot, where the original nLTT values
are still visible:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(
  t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

Here we see an averaged nLTT plot, with the original nLTT values omitted:

```{r}
ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

### Example: Harder trees

Create two harder trees:

```{r}
newick1 <- "((A:1,B:1):1,(C:1,D:1):1);"
newick2 <- paste0("((((XD:1,ZD:1):1,CE:2):1,(FE:2,EE:2):1):4,((AE:1,BE:1):1,",
  "(WD:1,YD:1):1):5);"
)
phylogeny1 <- ape::read.tree(text = newick1)
phylogeny2 <- ape::read.tree(text = newick2)
phylogenies <- c(phylogeny1, phylogeny2)
```

They are different. `phylogeny1` is relatively simple, with two branching
events happening at the same time:

```{r}
ape::plot.phylo(phylogeny1)
ape::add.scale.bar() #nolint
```

This can be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny1, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
# Use phylogeny1 here, to match the tree plotted above
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny1)
knitr::kable(t)
```

`phylogeny2` is more elaborate:

```{r}
ape::plot.phylo(phylogeny2)
ape::add.scale.bar() #nolint
```

This can also be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny2, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny2)
knitr::kable(t)
```

The average nLTT plot should be somewhere in the middle.

It is constructed from stretched nLTT matrices.

Here is the stretched nLTT matrix of the first phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny1), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the stretched nLTT matrix of the second phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny2), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the average nLTT matrix of both phylogenies:

```{r}
t <- nLTT::get_average_nltt_matrix(phylogenies, dt = 0.20)
knitr::kable(t)
```

Observe how the numbers get averaged.
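
The averaging step can be pictured as resampling each step-shaped curve on a shared time grid and then taking the mean at every grid point. The sketch below uses base R only. `resample_step` is a hypothetical helper and the two curves are made-up toy values, not output of any nLTT function. How the package itself treats the value exactly at a breakpoint (its `step_type` argument) may differ from the convention assumed here.

```{r}
# Hypothetical helper, not part of nLTT: evaluate a step function on a
# regular grid from 0 to 1. Assumption: the curve holds each value from
# its breakpoint until the next breakpoint.
resample_step <- function(time, value, dt) {
  grid <- seq(0, 1, by = dt)
  # For each grid point, the index of the last breakpoint at or before it
  idx <- findInterval(grid, time)
  data.frame(t = grid, value = value[idx])
}

# Toy curves (illustrative values only)
curve_a <- resample_step(time = c(0, 0.5), value = c(0.5, 1), dt = 0.25)
curve_b <- resample_step(time = c(0, 0.75), value = c(0.5, 1), dt = 0.25)

# Pointwise mean on the shared grid
average_curve <- data.frame(
  t = curve_a$t,
  value = (curve_a$value + curve_b$value) / 2
)
average_curve
```

At t = 0.5 the first toy curve has already stepped up and the second has not, so the average takes an intermediate value there. Everywhere else the two curves agree and the average equals both.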

Here is a demo of how `get_nltt_values` works:

```{r}
t <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.2)
knitr::kable(t)
```

To explore the plotting options, first create a data frame:

```{r}
df <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.01)
```

Here we see an averaged nLTT plot, where the original nLTT values
are still visible:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(
  t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

Here we see an averaged nLTT plot, with the original nLTT values omitted:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

### Example: Seven random trees

Create seven random trees:

```{r}
set.seed(42)
phylogeny1 <- ape::rcoal(10)
phylogeny2 <- ape::rcoal(20)
phylogeny3 <- ape::rcoal(30)
phylogeny4 <- ape::rcoal(40)
phylogeny5 <- ape::rcoal(50)
phylogeny6 <- ape::rcoal(60)
phylogeny7 <- ape::rcoal(70)
phylogenies <- c(
  phylogeny1, phylogeny2, phylogeny3,
  phylogeny4, phylogeny5, phylogeny6, phylogeny7
)
```

Here is a demo of how `get_nltt_values` works:

```{r}
t <- nLTT::get_nltt_values(phylogenies, dt = 0.2)
knitr::kable(t)
```

Here we see an averaged nLTT plot, where the original nLTT values
are still visible:

```{r fig.width = 7, fig.height = 7}
# Recompute the data frame for the random trees; without this line,
# `df` would still hold the values of the previous example
df <- nLTT::get_nltt_values(phylogenies, dt = 0.01)
ggplot2::qplot(t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

Here we see an averaged nLTT plot, with the original nLTT values omitted:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```
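
Because the nLTT values are stored as tidy data, the mean per time point can also be computed without ggplot2. The following is a minimal sketch in base R, assuming a data frame with the columns `id`, `t` and `nltt` that the plotting calls in this vignette use. The data frame `toy` and its values are made up for illustration and are not output of `get_nltt_values`.

```{r}
# Toy tidy data (illustrative values only): two phylogenies,
# three time points each
toy <- data.frame(
  id = rep(c(1, 2), each = 3),
  t = rep(c(0, 0.5, 1), times = 2),
  nltt = c(0.5, 1, 1, 0.5, 0.5, 1)
)

# Mean nLTT value per time point, averaged over all phylogenies
mean_nltt <- aggregate(nltt ~ t, data = toy, FUN = mean)
mean_nltt
```

The result has one row per time point, which is the same kind of per-`t` summary that the red line in the plots represents. The confidence band drawn with `mean_cl_boot` is not reproduced in this sketch.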