---
title: "How to visualize nLTT values distributions"
author: "Richel Bilderbeek"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to visualize nLTT values distributions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(nLTT) #nolint
require(ggplot2)
require(knitr)
# temporary fix to keep R-devel happy.
  # should be updated upon release of version 3.6

suppressWarnings(RNGversion("3.5.0"))

```

Calculating the average nLTT plot of multiple phylogenies
is not a trivial tasks. 

The function `get_nltt_values`
collects the nLTT values of a collection of
phylogenies as tidy data.

This allows for a good interplay with ggplot2.

### Example: Easy trees

Create two easy trees:

```{r}
newick1 <- "((A:1,B:1):2,C:3);"
newick2 <- "((A:2,B:2):1,C:3);"
phylogeny1 <- ape::read.tree(text = newick1)
phylogeny2 <- ape::read.tree(text = newick2)
phylogenies <- c(phylogeny1, phylogeny2)
```

There are very similar. `phylogeny1` has short tips:

```{r}
ape::plot.phylo(phylogeny1)
ape::add.scale.bar() #nolint
```

This can be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny1, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny1)
knitr::kable(t)
```

Plotting those timepoints:

```{r}
df <- as.data.frame(nLTT::get_phylogeny_nltt_matrix(phylogeny1))
ggplot2::qplot(
  time, N, data = df, geom = "step", ylim = c(0, 1), direction = "vh",
  main = "NLTT plot of phylogeny 1"
)
```


`phylogeny2` has longer tips:

```{r}
ape::plot.phylo(phylogeny2)
ape::add.scale.bar() #nolint
```

Also this can be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny2, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny2)
knitr::kable(t)
```

Plotting those timepoints:

```{r}
df <- as.data.frame(nLTT::get_phylogeny_nltt_matrix(phylogeny2))
ggplot2::qplot(
  time, N, data = df, geom = "step", ylim = c(0, 1), direction = "vh",
  main = "NLTT plot of phylogeny 2"
)
```


The average nLTT plot should be somewhere in the middle.

It is constructed from stretched nLTT matrices.

Here is the nLTT matrix of the first phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny1), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the nLTT matrix of the second phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny2), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the average nLTT matrix of both phylogenies:

```{r}
t <- nLTT::get_average_nltt_matrix(phylogenies, dt = 0.20)
knitr::kable(t)
```

Observe how the numbers get averaged.

The same, now shown as a plot:

```{r}
nLTT::nltts_plot(phylogenies, dt = 0.20, plot_nltts = TRUE)
```

Here a demo how the new function works:

```{r}
t <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.2)
knitr::kable(t)
```

Plotting options, first create a data frame:

```{r}
df <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.01)
```

Here we see an averaged nLTT plot, where the original nLTT values are still visible:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(
  t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

Here we see an averaged nLTT plot, with the original nLTT values omitted:

```{r}
ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```


### Example: Harder trees

Create two harder trees:

```{r}
newick1 <- "((A:1,B:1):1,(C:1,D:1):1);"
newick2 <- paste0("((((XD:1,ZD:1):1,CE:2):1,(FE:2,EE:2):1):4,((AE:1,BE:1):1,",
  "(WD:1,YD:1):1):5);"
)
phylogeny1 <- ape::read.tree(text = newick1)
phylogeny2 <- ape::read.tree(text = newick2)
phylogenies <- c(phylogeny1, phylogeny2)
```

There are different. `phylogeny1` is relatively simple, with two branching events happening at the same time:

```{r}
ape::plot.phylo(phylogeny1)
ape::add.scale.bar() #nolint
```

This can be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny1, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny2)
knitr::kable(t)
```

`phylogeny2` is more elaborate:

```{r}
ape::plot.phylo(phylogeny2)
ape::add.scale.bar() #nolint
```

Also this can be observed in the nLTT plot:

```{r}
nLTT::nltt_plot(phylogeny2, ylim = c(0, 1))
```

As a collection of timepoints:

```{r}
t <- nLTT::get_phylogeny_nltt_matrix(phylogeny2)
knitr::kable(t)
```


The average nLTT plot should be somewhere in the middle.

It is constructed from stretched nLTT matrices.

Here is the nLTT matrix of the first phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny1), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the nLTT matrix of the second phylogeny:

```{r}
t <- nLTT::stretch_nltt_matrix(
  nLTT::get_phylogeny_nltt_matrix(phylogeny2), dt = 0.20, step_type = "upper"
)
knitr::kable(t)
```

Here is the average nLTT matrix of both phylogenies:

```{r}
t <- nLTT::get_average_nltt_matrix(phylogenies, dt = 0.20)
knitr::kable(t)
```

Observe how the numbers get averaged.

Here a demo how the new function works:

```{r}
t <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.2)
knitr::kable(t)
```

Plotting options, first create a data frame:

```{r}
df <- nLTT::get_nltt_values(c(phylogeny1, phylogeny2), dt = 0.01)
```

Here we see an averaged nLTT plot, where the original nLTT values are still visible:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(
  t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

Here we see an averaged nLTT plot, with the original nLTT values omitted:


```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

### Example: Five random trees

Create three random trees:

```{r}
set.seed(42)
phylogeny1 <- ape::rcoal(10)
phylogeny2 <- ape::rcoal(20)
phylogeny3 <- ape::rcoal(30)
phylogeny4 <- ape::rcoal(40)
phylogeny5 <- ape::rcoal(50)
phylogeny6 <- ape::rcoal(60)
phylogeny7 <- ape::rcoal(70)
phylogenies <- c(
  phylogeny1, phylogeny2, phylogeny3,
  phylogeny4, phylogeny5, phylogeny6, phylogeny7
)
```

Here a demo how the new function works:

```{r}
t <- nLTT::get_nltt_values(phylogenies, dt = 0.2)
knitr::kable(t)
```


Here we see an averaged nLTT plot, where the original nLTT values are still visible:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(t, nltt, data = df, geom = "point", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies", color = id, size = I(0.1)
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```

Here we see an averaged nLTT plot, with the original nLTT values omitted:

```{r fig.width = 7, fig.height = 7}
ggplot2::qplot(t, nltt, data = df, geom = "blank", ylim = c(0, 1),
  main = "Average nLTT plot of phylogenies"
) + ggplot2::stat_summary(
  fun.data = "mean_cl_boot", color = "red", geom = "smooth"
)
```