Rescale tree
Most of the phylogenetic trees are scaled by evolutionary distance (substitution/site). In ggtree
, users can re-scale a phylogenetic tree by any numerical variable inferred by evolutionary analysis (e.g. dN/dS).
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)
beast_tree
## 'beast' S4 object that stored information of
## '/tmp/Rtmpiusgd4/Rinst385a1fbf845e/ggtree/examples/MCC_FluA_H3.tree'.
##
## ...@ tree:
## Phylogenetic tree with 76 tips and 75 internal nodes.
##
## Tip labels:
## A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
## 'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range'.
p1 <- ggtree(beast_tree, mrsd='2013-01-01') + theme_tree2() +
ggtitle("Divergence time")
p2 <- ggtree(beast_tree, branch.length = 'rate') + theme_tree2() +
ggtitle("Substitution rate")
multiplot(p1, p2, ncol=2)
mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="treeio")
mlc_tree <- read.codeml_mlc(mlcfile)
p1 <- ggtree(mlc_tree) + theme_tree2() +
ggtitle("nucleotide substitutions per codon")
p2 <- ggtree(mlc_tree, branch.length='dN_vs_dS') + theme_tree2() +
ggtitle("dN/dS tree")
multiplot(p1, p2, ncol=2)
In addition to specify branch.length
in tree visualization, users can change branch length stored in tree object by using rescale_tree
function.
beast_tree2 <- rescale_tree(beast_tree, branch.length = 'rate')
ggtree(beast_tree2) + theme_tree2()
Zoom on a portion of tree
ggtree
provides gzoom
function that similar to zoom
function provided in ape
. This function plots simultaneously a whole phylogenetic tree and a portion of it. It aims at exploring very large trees.
library("ape")
data(chiroptera)
library("ggtree")
gzoom(chiroptera, grep("Plecotus", chiroptera$tip.label))
Zoom in selected clade of a tree that was already annotated with ggtree
is also supported.
groupInfo <- split(chiroptera$tip.label, gsub("_\\w+", "", chiroptera$tip.label))
chiroptera <- groupOTU(chiroptera, groupInfo)
p <- ggtree(chiroptera, aes(color=group)) + geom_tiplab() + xlim(NA, 23)
gzoom(p, grep("Plecotus", chiroptera$tip.label), xmax_adjust=2)
Color tree
In ggtree
, coloring phylogenetic tree is easy, by using aes(color=VAR)
to map the color of tree based on a specific variable (numeric and category are both supported).
ggtree(beast_tree, aes(color=rate)) +
scale_color_continuous(low='darkgreen', high='red') +
theme(legend.position="right")
User can use any feature (if available), including clade posterior and dN/dS etc., to scale the color of the tree.
Annotate clades
ggtree
implements geom_cladelabel
layer to annotate a selected clade with a bar indicating the clade with a corresponding label.
The geom_cladelabel
layer accepts a selected internal node number. To get the internal node number, please refer to Tree Manipulation vignette.
set.seed(2015-12-21)
tree = rtree(30)
p <- ggtree(tree) + xlim(NA, 6)
p+geom_cladelabel(node=45, label="test label") +
geom_cladelabel(node=34, label="another clade")
Users can set the parameter, align = TRUE
, to align the clade label, and use the parameter, offset
, to adjust the position.
p+geom_cladelabel(node=45, label="test label", align=TRUE, offset=.5) +
geom_cladelabel(node=34, label="another clade", align=TRUE, offset=.5)
Users can change the color of the clade label via the parameter color
.
p+geom_cladelabel(node=45, label="test label", align=T, color='red') +
geom_cladelabel(node=34, label="another clade", align=T, color='blue')
Users can change the angle
of the clade label text and relative position from text to bar via the parameter offset.text
.
p+geom_cladelabel(node=45, label="test label", align=T, angle=270, hjust='center', offset.text=.5) +
geom_cladelabel(node=34, label="another clade", align=T, angle=45)
The size of the bar and text can be changed via the parameters barsize
and fontsize
respectively.
p+geom_cladelabel(node=45, label="test label", align=T, angle=270, hjust='center', offset.text=.5, barsize=1.5) +
geom_cladelabel(node=34, label="another clade", align=T, angle=45, fontsize=8)
Users can also use geom_label
to label the text.
p+ geom_cladelabel(node=34, label="another clade", align=T, geom='label', fill='lightblue')
Highlight clades
ggtree
implements geom_hilight
layer, that accepts an internal node number and add a layer of rectangle to highlight the selected clade.
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=17, fill="darkgreen", alpha=.6)
ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
geom_hilight(node=23, fill="darkgreen", alpha=.6)
Another way to highlight selected clades is setting the clades with different colors and/or line types as demonstrated in Tree Manipulation vignette.
Highlight balances
In addition to geom_hilight
, ggtree
also implements geom_balance
which is designed to highlight neighboring subclades of a given internal node.
ggtree(tree) +
geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)
labelling associated taxa (Monophyletic, Polyphyletic or Paraphyletic)
geom_cladelabel
is designed for labelling Monophyletic (Clade) while there are related taxa that are not form a clade. ggtree
provides geom_strip
to add a strip/bar to indicate the association with optional label (see the issue).
ggtree(tree) + geom_tiplab() + geom_strip(5, 7, barsize=2, color='red') + geom_strip(6, 12, barsize=2, color='blue')
taxa connection
Some evolutionary events (e.g. reassortment, horizontal gene transfer) can be modeled by a simple tree. ggtree
provides geom_taxalink
layer that allows drawing straight or curved lines between any of two nodes in the tree, allow it to represent evolutionary events by connecting taxa.
ggtree(tree) + geom_tiplab() + geom_taxalink('A', 'E') + geom_taxalink('F', 'K', color='red', arrow=grid::arrow(length = grid::unit(0.02, "npc")))
Tree annotation with analysis of R packages
annotating tree with ape bootstraping analysis
library(ape)
data(woodmouse)
d <- dist.dna(woodmouse)
tr <- nj(d)
bp <- boot.phylo(tr, woodmouse, function(xx) nj(dist.dna(xx)))
tree <- apeBoot(tr, bp)
ggtree(tree) + geom_label(aes(label=bootstrap)) + geom_tiplab()
annotating tree with phangorn output
library(phangorn)
treefile <- system.file("extdata", "pa.nwk", package="treeio")
tre <- read.tree(treefile)
tipseqfile <- system.file("extdata", "pa.fas", package="treeio")
tipseq <- read.phyDat(tipseqfile,format="fasta")
fit <- pml(tre, tipseq, k=4)
fit <- optim.pml(fit, optNni=FALSE, optBf=T, optQ=T,
optInv=T, optGamma=T, optEdge=TRUE,
optRooted=FALSE, model = "GTR")
phangorn <- phyPML(fit, type="ml")
ggtree(phangorn) + geom_text(aes(x=branch, label=AA_subs, vjust=-.5))
Tree annotation with output from evolution software
In ggtree
, we implemented several parser functions to parse output from commonly used software package in evolutionary biology, including:
Evolutionary evidences inferred by these software packages can be used for further analysis in R
and annotating phylogenetic tree directly in ggtree
. For more details, please refer to the Tree Data Import vignette.
Tree annotation with user specified annotation
the %<+%
operator
In addition to parse commonly used software output, ggtree
also supports annotating a phylogenetic tree using user’s own data.
Suppose we have the following data that associate with the tree and would like to attach the data in the tree.
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
p <- ggtree(tree)
dd <- data.frame(taxa = LETTERS[1:13],
place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
value = round(abs(rnorm(13, mean=70, sd=10)), digits=1))
## you don't need to order the data
## data was reshuffled just for demonstration
dd <- dd[sample(1:13, 13), ]
row.names(dd) <- NULL
print(dd)
taxa | place | value |
---|---|---|
M | NA | 74.6 |
D | GZ | 55.9 |
A | GZ | 61.5 |
K | CZ | 60.4 |
C | GZ | 76.1 |
H | HK | 73.4 |
L | CZ | 62.2 |
J | CZ | 54.9 |
F | HK | 89.3 |
I | CZ | 71.0 |
G | HK | 72.6 |
E | GZ | 84.9 |
B | GZ | 77.6 |
We can imaging that the place
column stores the location that we isolated the species and value
column stores numerical values (e.g. bootstrap values).
We have demonstrated using the operator, %<%
, to update a tree view with a new tree. Here, we will introduce another operator, %<+%
, that attaches annotation data to a tree view. The only requirement of the input data is that its first column should be matched with the node/tip labels of the tree.
After attaching the annotation data to the tree by %<+%
, all the columns in the data are visible to ggtree
. As an example, here we attach the above annotation data to the tree view, p
, and add a layer that showing the tip labels and colored them by the isolation site stored in place
column.
p <- p %<+% dd + geom_tiplab(aes(color=place)) +
geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
p+theme(legend.position="right")
Once the data was attached, it is always attached. So that we can add other layers to display these information easily.
p + geom_text(aes(color=place, label=place), hjust=1, vjust=-0.4, size=3) +
geom_text(aes(color=place, label=value), hjust=1, vjust=1.4, size=3)
phylo4d
phylo4d
was defined in the phylobase
package, which can be employed to integrate user’s data with phylogenetic tree. phylo4d
was supported in ggtree
and the data stored in the object can be used directly to annotate the tree.
dd2 <- dd[, -1]
rownames(dd2) <- dd[,1]
require(phylobase)
tr2 <- phylo4d(tree, dd2)
ggtree(tr2) + geom_tiplab(aes(color=place)) +
geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
jplace file format
ggtree
provides write.jplace()
function to store user’s own data and associated newick tree to a single jplace
file, which can be parsed directly in ggtree
and user’s data can be used to annotate the tree directly. For more detail, please refer to the Tree Data Import vignette.
Advance tree annotation
Advance tree annotation including visualizing phylogenetic tree with associated matrix and multiple sequence alignment; annotating tree with subplots and images (especially PhyloPic). For details and examples, please refer to the Advance Tree Annotation vignette.
Interactive tree annotation
Interactive tree annotation is also possible, please refer to https://guangchuangyu.github.io/2016/06/identify-method-for-ggtree.
References
1. Bouckaert, R. et al. BEAST 2: A software platform for bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537 (2014).
2. Berger, S. A., Krompass, D. & Stamatakis, A. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology 60, 291–302 (2011).
3. Pond, S. L. K., Frost, S. D. W. & Muse, S. V. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).
4. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).
5. Boussau, B. et al. Genome-scale coestimation of species and gene trees. Genome Res. 23, 323–330 (2013).
6. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. Pplacer: Linear time maximum-likelihood and bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
7. Marazzi, B. et al. Locating evolutionary precursors on a phylogenetic tree. Evolution 66, 3918–3930 (2012).
8. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics btu033 (2014). doi:10.1093/bioinformatics/btu033
9. Höhna, S. et al. Probabilistic graphical model representation in phylogenetics. Syst Biol 63, 753–771 (2014).