You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution. — Richard Dawkins
Citation
If you use ggtree in published research, please cite:
G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi:[10.1111/2041-210X.12628](http://dx.doi.org/10.1111/2041-210X.12628)
Introduction
This project arose from our need to annotate nucleotide substitutions on a phylogenetic tree; we found that no existing tree visualization software could do this easily. Existing tree viewers are designed for displaying phylogenetic trees, not for annotating them. Although some tree viewers can display bootstrap values on the tree, it is hard or impossible to display other information. Our first solution for displaying nucleotide substitutions was to add this information to the node/tip names and use a traditional tree viewer to show it. We displayed the information on the tree successfully, but we believe this indirect approach is inefficient.
Previously, phylogenetic trees were much smaller, and annotating them was less necessary than it is now that much more data is becoming available. We want to associate our experimental data, for instance antigenic change, with evolutionary relationships. Visualizing these associations on a phylogenetic tree can help us identify evolutionary patterns. We believe we need a next-generation tree viewer that is programmable and extensible: one that can view a phylogenetic tree as easily as classical software does, and that supports adding annotation data in layers above the tree. This is the objective of developing ggtree. Common annotation tasks should be easy, and complicated tasks should be achievable by adding multiple layers of annotation.
The ggtree package is designed by extending the ggplot2 [1] package. It is based on the grammar of graphics and takes all the good parts of ggplot2. Other R packages implement tree viewers using ggplot2, including OutbreakTools, phyloseq [2] and ggphylo; they mostly create complex tree-view functions for their specific needs. Internally, these packages interpret a phylogenetic tree as a collection of lines, which makes it hard to annotate the tree with diverse user input related to nodes (taxa). ggtree differs by interpreting a tree as a collection of taxa, which allows general flexibility in annotating phylogenetic trees with diverse types of user input.
Getting data into R
Most tree viewer software (including R packages) focuses on the Newick and Nexus file formats, while other evolutionary analysis software produces file formats that contain supporting evidence ready for annotating a phylogenetic tree. In addition to Newick and Nexus, ggtree supports the NHX, jplace and Phylip file formats. ggtree also supports output from BEAST [3], EPA [4], HYPHY [5], PAML [6], PHYLDOG [7], pplacer [8], r8s [9], RAxML [10] and RevBayes [11].
Parsing data from a range of molecular evolution software is not only for visualization in ggtree; it also brings these data to R users for further analysis (e.g. summarization, visualization, comparison, testing).
For more details, please refer to the Tree Data Import vignette.
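As a minimal sketch of the simplest import path, the snippet below parses a Newick string with read.tree from the ape package (listed in the session info at the end of this document). The Newick string is made-up example data, and the variable names are illustrative; the richer formats mentioned above go through the parsers described in the Tree Data Import vignette and are not shown here.

```r
library(ape)

# A small made-up Newick string (hypothetical data, for illustration only)
newick <- "((A:1,B:2):1,(C:1,D:3):2);"

# read.tree() accepts Newick text directly through its `text` argument
tree <- read.tree(text = newick)

# Basic facts about the parsed tree, available for further analysis in R
n_tips     <- Ntip(tree)       # number of tips (taxa)
n_internal <- Nnode(tree)      # number of internal nodes
tip_names  <- tree$tip.label   # tip labels in the order they were read
```

The resulting phylo object can be passed straight to ggtree for plotting.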
Tree Visualization and Annotation
Tree visualization in ggtree is easy, taking one line of code: ggtree(tree_object). It supports several layouts, including rectangular, slanted and circular for phylograms and cladograms, an unrooted layout, and time-scaled and two-dimensional phylogenies. The Tree Visualization vignette describes these features in detail.
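The layouts above might be selected as in the following sketch. It assumes a random tree generated with ape's rtree as stand-in data; the object names are illustrative, and the cladogram is drawn by asking ggtree to ignore branch lengths. Argument names may differ slightly between ggtree versions.

```r
library(ape)
library(ggtree)

# Random 10-tip tree used as stand-in data (not from any real analysis)
set.seed(42)
tree <- rtree(10)

# Phylogram in the default rectangular layout
p_rect <- ggtree(tree)

# Alternative layouts are chosen with the `layout` argument
p_slanted  <- ggtree(tree, layout = "slanted")
p_circular <- ggtree(tree, layout = "circular")

# Cladogram: draw the topology only, ignoring branch lengths
p_clado <- ggtree(tree, branch.length = "none")
```

Each object is a ggplot object, so printing it (for example `print(p_circular)`) renders the tree.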
We implement several functions to manipulate a phylogenetic tree.
- taxa can be clustered together using the groupClade or groupOTU functions
- clades can be collapsed via the collapse function
- a collapsed clade can be expanded using the expand function
- a clade can be re-scaled to zoom in or out using the scaleClade function
- a selected clade can be rotated by 180 degrees using the rotate function
- the positions of two selected clades (which should share the same parent) can be exchanged using the flip function
Details and examples can be found in the Tree Manipulation vignette.
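The manipulation functions listed above could be combined roughly as follows. This is a sketch on a made-up six-tip tree; node numbers are looked up with ape's getMRCA rather than hard-coded, and the ggtree:: prefix is used only to avoid name clashes with other packages that export collapse, expand or rotate. Exact argument names may vary between ggtree versions.

```r
library(ape)
library(ggtree)

# Made-up tree (hypothetical data): clades (A,B) and (C,D) share a parent
tree <- read.tree(text = "(((A:1,B:1):1,(C:1,D:1):1):1,(E:1,F:1):2);")

# Internal node numbers of the clades we want to manipulate
ab <- getMRCA(tree, c("A", "B"))
cd <- getMRCA(tree, c("C", "D"))
ef <- getMRCA(tree, c("E", "F"))

p <- ggtree(tree) + geom_tiplab()

# Collapse the (E,F) clade, then expand it again
p_collapsed <- ggtree::collapse(p, node = ef)
p_expanded  <- ggtree::expand(p_collapsed, node = ef)

# Shrink the (A,B) clade to half its size
p_scaled <- scaleClade(p, node = ab, scale = 0.5)

# Rotate the (A,B) clade by 180 degrees
p_rotated <- ggtree::rotate(p, node = ab)

# Exchange the positions of the sibling clades (A,B) and (C,D)
p_flipped <- flip(p, ab, cd)

# Cluster the taxa of clade (A,B) and color the tree by the resulting group
tree_grouped <- groupClade(tree, ab)
p_grouped <- ggtree(tree_grouped, aes(color = group))
```

Each call returns a new plot object, so the original view `p` stays unchanged and the variants can be compared side by side.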
Most phylogenetic trees are scaled by evolutionary distance (substitutions/site). In ggtree, a phylogenetic tree can be re-scaled by any numerical variable inferred by evolutionary analysis (e.g. species divergence time, dN/dS). Numerical and categorical variables can be used to color a phylogenetic tree.
The ggtree package provides several layers to annotate a phylogenetic tree, including:
- geom_cladelabel for labelling selected clades
- geom_hilight for highlighting selected clades
- geom_range to indicate uncertainty of branch lengths
- geom_strip for adding a strip/bar to label associated taxa (with optional label)
- geom_taxalink for connecting related taxa
- geom_tiplab for adding tip labels
- geom_treescale for adding a legend of tree scale
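A few of these layers can be stacked with the + operator, as in the sketch below. The tree is made-up example data, the clade label text and colors are arbitrary choices, and the node numbers are looked up with ape's getMRCA; only a subset of the layers listed above is shown.

```r
library(ape)
library(ggtree)

# Made-up tree (hypothetical data, for illustration only)
tree <- read.tree(text = "(((A:1,B:1):1,(C:1,D:1):1):1,(E:1,F:1):2);")

# Internal nodes that define the clades to annotate
ab <- getMRCA(tree, c("A", "B"))
cd <- getMRCA(tree, c("C", "D"))

p_base <- ggtree(tree)

# Each annotation is an ordinary layer added on top of the tree
p_annotated <- p_base +
  geom_tiplab() +                                             # tip labels
  geom_treescale() +                                          # tree scale legend
  geom_hilight(node = ab, fill = "steelblue", alpha = 0.3) +  # highlight clade (A,B)
  geom_cladelabel(node = cd, label = "clade CD", offset = 0.3) # label clade (C,D)
```

Because every annotation is a separate layer, layers can be added or removed independently without touching the tree itself.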
It supports annotating phylogenetic trees with analyses obtained from R packages and other commonly used evolutionary software. User-specific annotation (e.g. experimental data) can also be integrated to annotate phylogenetic trees. ggtree provides the write.jplace function to combine a Newick tree file and the user's own data into a single jplace file, which can be parsed so that the data can be used to annotate the tree directly in ggtree.
ggtree integrates the phylopic database, so silhouette images of organisms can be downloaded and used to annotate a phylogenetic tree directly. ggtree also supports using local images to annotate a phylogenetic tree.
Visualizing an annotated phylogenetic tree alongside a numerical matrix (e.g. genotype table), a multiple sequence alignment or subplots is also supported in ggtree. Examples of annotating phylogenetic trees can be found in the Tree Annotation and Advance Tree Annotation vignettes.
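As one illustration of pairing a tree with a matrix, the sketch below uses ggtree's gheatmap function with a small made-up genotype table. The table contents, column names, and the offset and width values are assumptions chosen for the example; the only real requirement shown is that the row names of the table match the tip labels of the tree.

```r
library(ape)
library(ggtree)

# Made-up tree (hypothetical data, for illustration only)
tree <- read.tree(text = "(((A:1,B:1):1,(C:1,D:1):1):1,(E:1,F:1):2);")

# Made-up genotype table: one row per tip, row names must match tip labels
genotype <- data.frame(
  locus1 = c("A", "A", "G", "G", "A", "G"),
  locus2 = c("T", "C", "T", "C", "T", "C"),
  row.names = tree$tip.label,
  stringsAsFactors = FALSE
)

p <- ggtree(tree) + geom_tiplab()

# Draw the table as a heatmap aligned to the tips, to the right of the tree
p_heat <- gheatmap(p, genotype, offset = 0.2, width = 0.4)
```

The offset moves the heatmap away from the tip labels and the width sets its size relative to the tree; both usually need tuning for a given figure.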
Vignette Entry
- Tree Data Import
- Tree Visualization
- Tree Manipulation
- Tree Annotation
- Advance Tree Annotation
- ggtree utilities
More documentation can be found at https://guangchuangyu.github.io/ggtree.
Feedback
- For bugs or feature requests, please post to the GitHub issue tracker.
- For user questions, please post to the Google group, the Bioconductor support site or Biostars. We follow every post tagged with ggtree.
Session info
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] colorspace_1.3-2 ggtree_1.8.2 treeio_1.0.2
## [4] ggplot2_2.2.1 Biostrings_2.44.2 XVector_0.16.0
## [7] IRanges_2.10.2 S4Vectors_0.14.3 BiocGenerics_0.22.0
## [10] ape_4.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.12 bindr_0.1 compiler_3.4.1 plyr_1.8.4
## [5] prettydoc_0.2.0 tools_3.4.1 zlibbioc_1.22.0 digest_0.6.12
## [9] jsonlite_1.5 evaluate_0.10.1 tibble_1.3.3 nlme_3.1-131
## [13] gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.2
## [17] rvcheck_0.0.9 yaml_2.1.14 bindrcpp_0.2 dplyr_0.7.2
## [21] stringr_1.2.0 knitr_1.17 rprojroot_1.2 grid_3.4.1
## [25] tidyselect_0.1.1 glue_1.1.1 R6_2.2.2 rmarkdown_1.6
## [29] reshape2_1.4.2 tidyr_0.7.0 purrr_0.2.3 magrittr_1.5
## [33] backports_1.1.0 scales_0.4.1 htmltools_0.3.6 assertthat_0.2.0
## [37] labeling_0.3 stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3
References
1. Wickham, H. ggplot2: Elegant graphics for data analysis. (Springer, 2009).
2. McMurdie, P. J. & Holmes, S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8, e61217 (2013).
3. Bouckaert, R. et al. BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537 (2014).
4. Berger, S. A., Krompass, D. & Stamatakis, A. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology 60, 291–302 (2011).
5. Pond, S. L. K., Frost, S. D. W. & Muse, S. V. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).
6. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).
7. Boussau, B. et al. Genome-scale coestimation of species and gene trees. Genome Res. 23, 323–330 (2013).
8. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: Linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
9. Marazzi, B. et al. Locating evolutionary precursors on a phylogenetic tree. Evolution 66, 3918–3930 (2012).
10. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics btu033 (2014). doi:10.1093/bioinformatics/btu033
11. Höhna, S. et al. Probabilistic graphical model representation in phylogenetics. Syst Biol 63, 753–771 (2014).