sitePath 1.0.3
The sitePath
package does hierarchical search for fixation events given multiple sequence alignment and phylogenetic tree. These fixation events can be specific to a phylogenetic lineages or shared by multiple lineages. This is achieved by three major steps:
There’re various R packages for parsing phylogenetic tree and multiple sequence alignment files. For now, sitepath
accepts phylo
object and alignment
object. Functions from ggtree
and seqinr
are able to handle most file formats.
The S3 phylo class is a common data structure for phylogenetic analysis in R. The CRAN package ape provides basic parsing function for reading tree files. The Bioconductor package ggtree provides more comprehensive parsing utilities.
library(ape)
tree <- read.tree(system.file("extdata", "ZIKV.newick", package = "sitePath"))
It is highly recommended that the file stores a rooted tree as R would consider the tree is rooted by default and re-rooting the tree in R is difficult.
Most multiple sequence alignment format can be parsed by seqinr. sitePath has a wrapper function for parsing and adding the sequence alignment.
library(sitePath)
alignment_file <- system.file("extdata", "ZIKV.fasta", package = "sitePath")
tree <- addMSA(tree, alignment_file, "fasta")
The names in tree and aligment must be matched. We use a tip-to-root algorithm to trim tree leaves/tips to expose the major branches. Before finding putative phylogenetic lineages, there involves a few more steps to evalute the impact of threshold on result.
In the current version, the resolving function only takes sequence similarity as one single threshold. The impact of threshold depends on the tree topology hence there is no universal choice. The function sneakPeak
samples thresholds and calculates the resulting number of paths. The use of this function can be of great help in choosing the threshold.
preassessment <- sneakPeek(tree)
plot(preassessment$similarity, preassessment$pathNum)
Use the function lineagePath
to get a S3 sitePath class object1 The S3 sitePath object contains the nodes to represent phylogenetic lineages for downstream analysis. The choice of the threshold really depends. You can use the result from sneakPeak
as a reference for threhold choosing. Here we choose 0.996 as an example.
paths <- lineagePath(tree, 0.996)
paths
#> 6 paths
You can visualize the result.
plot(paths, no.margin = TRUE)
Now you’re ready to find fixation events.
The hierarchical search is done by fixationSites
function. The function outputs a list of mutations with the sequence names involved before and after the fixation.
fixations <- fixationSites(paths)
fixations
#> Result for 6 paths:
#>
#> 123 763 777 1902 2367 3162 139 2086 2634 894 2074 3045 1143 2842 3398
#> No reference sequence specified. Using alignment numbering
If you want to retrieve the tip names involved in the fixation of a site, you can pass the result of fixationSites
and the site index to extractTips
function. The output is a sitePath
object which stores the tip names.
sp <- extractTips(fixations, 139)
sp
#> [[1]]
#> [1] "AMD61711" "AQS26698" "APG56458" "APH11505" "APH11565" "APH11524"
#> [7] "APH11593" "AWK27349" "APH11570" "APH11525" "APH11569" "APH11541"
#> [13] "APH11579" "APH11557" "APH11594" "APH11559" "APH11563" "APH11592"
#> [19] "APH11590" "APH11530" "APH11580" "APH11538" "APH11573" "APH11536"
#> [25] "APH11493" "APH11543" "APH11529" "APH11528" "APH11566" "APH11560"
#> [31] "APH11551" "APH11571" "APH11562" "APH11595" "APH11502" "APH11497"
#> [37] "APH11503" "APH11506" "APH11578" "APH11498" "APH11500" "APH11499"
#> [43] "APH11507" "APH11546" "APH11540" "APH11532" "APH11537" "APH11553"
#> [49] "APH11511" "AOO19565" "APH11539" "APH11534" "APH11518" "APH11509"
#> [55] "APH11514" "APH11512" "APH11508" "APH11581" "APH11516" "APH11515"
#> [61] "APH11547" "APH11586" "APH11544" "APH11504" "APH11550" "APH11501"
#> [67] "APH11535" "APH11587" "APH11527" "APH11576" "APH11542" "APH11531"
#> [73] "APH11517" "APH11585" "APH11582" "APH11521" "APH11583" "APH11533"
#> [79] "APH11558" "APH11564" "APH11555" "APH11567" "APH11577" "APH11552"
#> [85] "AOL02459" "APH11568" "APH11554" "APH11574" "APH11575" "APH11572"
#> [91] "APH11545" "APH11548" "APH11561" "APH11584" "APH11523" "APH11495"
#> [97] "APH11549" "AVG19202" "APG56499" "ASW34302" "AVV62004" "BAX00477"
#> [103] "BBC70847" "AUF35022" "ATL14618" "AUF35021" "ANK57896"
#> attr(,"AA")
#> [1] "S"
#>
#> [[2]]
#> [1] "BAV89190" "AOI20067" "ANO46307" "AMM43325"
#> [5] "AVG19275" "ANN44857" "AMM43326" "ANO46306"
#> [9] "ANO46309" "ANO46305" "ANO46303" "ANO46308"
#> [13] "ARB08102" "ANO46302" "AUI42329" "AUI42330"
#> [17] "AHZ13508" "ANC90425" "AMT75536" "AOG18296"
#> [21] "AMK49165" "APY24199" "ANO46304" "ANO46301"
#> [25] "ANF16414" "AMR68932" "AOO19564" "AUI42194"
#> [29] "APC60215" "AMQ48986" "ARB07976" "AMA12086"
#> [33] "ANA12599" "AMM39806" "AMR39830" "APB03018"
#> [37] "AMH87239" "AMV49165" "AMO03410" "ATG29307"
#> [41] "ART29828" "AWF93617" "ATG29284" "ATG29287"
#> [45] "ATG29303" "AWF93619" "AWF93618" "AQM74762"
#> [49] "AUD54964" "AMC13911" "APB03019" "AMA12085"
#> [53] "AQM74761" "ATG29306" "ASL68974" "ATG29267"
#> [57] "ASL68978" "ASU55416" "ANK57897" "AWH65849"
#> [61] "AQX32985" "ATG29315" "AQZ41956" "ARI68105"
#> [65] "ASU55505" "AMZ03556" "AMU04506" "APY24198"
#> [69] "APO36913" "ALX35659" "AQZ41949" "ASL68979"
#> [73] "ATG29299" "ATI21641" "ATG29270" "ATG29291"
#> [77] "AOY08536" "ASU55417" "ANW07476" "AMA12087"
#> [81] "AMA12084" "AOG18295" "ANQ92019" "AML81028"
#> [85] "APY24200" "AMD16557" "ARU07074" "ANO46297"
#> [89] "ANO46298" "AQZ41950" "ARU07076" "ARB07967"
#> [93] "AQU12485" "AMS00611" "AOX49264" "AOX49265"
#> [97] "AOY08518" "ARB07962" "AMX81919" "AQZ41951"
#> [101] "ANF04752" "AMQ48981" "AOY08538" "AMM39805"
#> [105] "ARX97119" "AMB37295" "ARU07183" "ANG09399"
#> [109] "AQZ41954" "AOY08533" "AQZ41947" "AQZ41948"
#> [113] "AMK49164" "APG56457" "AOE22997" "APH11492"
#> [117] "AOY08517" "AOY08541" "AMK79468" "AML82110"
#> [121] "AMR39831" "AMX81918" "ANC90426" "ATG29292"
#> [125] "ATG29295" "AOW32303" "AVZ25033" "AOR82892"
#> [129] "APQ41782" "AOO54270" "AND01116" "ALU33341"
#> [133] "ASB32509" "AOC50654" "AQZ41953" "ATG29301"
#> [137] "ATG29276" "APO08503" "AMQ48982" "ARU07075"
#> [141] "ATB53752" "AMC13913" "AMC13912" "APO39243"
#> [145] "APO39229" "AQZ41952" "AQZ41955" "ART29823"
#> [149] "AMB18850" "YP_009428568" "ANH10698" "AOR82893"
#> [153] "APQ41786" "ASU55393" "APW84876" "ASK51714"
#> [157] "ARB07953" "ASU55404" "ASU55423" "ANB66182"
#> [161] "ASU55425" "APW84872" "ASU55420" "AQX32986"
#> [165] "ASU55422" "APQ41784" "ANC90428" "ASU55415"
#> [169] "ASU55418" "ARM59239" "ASU55408" "ASU55424"
#> [173] "ASU55390" "ASU55419" "ASU55391" "AOY08525"
#> [177] "APW84873" "AOY08535" "AVZ25035" "AMM39804"
#> [181] "ASU55411" "ANB66183" "ASU55421" "AMZ03557"
#> [185] "ARB07932" "AOY08523" "AOY08542" "ASW34087"
#> [189] "AOY08537" "ASU55392" "AQX32987" "ASU55403"
#> [193] "ASU55399" "APQ41783" "ANS60026" "ANB66184"
#> [197] "APB03020" "ART29826" "ART29825" "ASU55426"
#> [201] "ASU55412" "ASU55413" "ASU55410" "ASU55397"
#> [205] "ASU55400" "ASU55409" "APB03017" "ASU55395"
#> [209] "ASU55396" "AOS90220" "AMN14620" "APW84874"
#> [213] "APW84875" "AOY08524" "ASU55394" "ASU55414"
#> [217] "ASU55405" "AMC33116" "BAV82373" "ASU55406"
#> [221] "ASU55398" "ASU55407" "AMQ34003" "AMQ34004"
#> [225] "AOS90221" "AOS90224" "ASU55401" "ASU55402"
#> [229] "APB03021" "APO39232" "AOS90223" "APO39237"
#> [233] "ANH22038" "APW84877" "APO39236" "AOY08546"
#> [237] "AOY08516" "APO39233" "AOS90222" "AOO53981"
#> [241] "AOY08521" "AOO85388" "APO39228"
#> attr(,"AA")
#> [1] "N"
Use plot
to visualize each fixation mutation. You’ll have to specify which mutation to be visualized. Because the names(fixationSites)
are the mutation names, you can also use one of them as function input
plotSingleSite(fixations, 139)
This part is extra and experimental but might be useful when pre-assessing your data. We’ll use an example to demonstrate.
The plotSingleSite
function will color the tree according to amino acids if you use the output of lineagePath
function.
plotSingleSite(paths, 139)
plotSingleSite(paths, 763)
An SNP site could potentially undergo fixation event. The SNPsites
function predicts possible SNP sites and the result could be what you’ll expect to be fixation mutation.
snps <- SNPsites(tree)
plotSingleSite(paths, snps[4])
plotSingleSite(paths, snps[5])
The function groupTips
does the same on the tree as the function lineagePath
does but it outputs clusters of tree tips instead of paths. This will give you a feel of the behavior of tip-to-root trimming algorithm. The names
of the output is the internal node number in phylo
object.
grouping <- groupTips(tree)
grouping
#> $`7`
#> [1] "AMK49492"
#>
#> $`8`
#> [1] "AMX81915"
#>
#> $`186`
#> [1] "AMQ48986"
#>
#> $`198`
#> [1] "AMA12086"
#>
#> $`199`
#> [1] "AMH87239"
#>
#> $`200`
#> [1] "AMA12085"
#>
#> $`204`
#> [1] "ARU07076"
#>
#> $`205`
#> [1] "AMQ48982"
#>
#> $`206`
#> [1] "ART29823"
#>
#> $`344`
#> [1] "APY24199"
#>
#> $`345`
#> [1] "ANO46308"
#>
#> $`361`
#> [1] "AUI42289"
#>
#> $`362`
#> [1] "ANK57896"
#>
#> $`365`
#> [1] "AMD61711" "AQS26698" "APG56458"
#>
#> $`371`
#> [1] "AMD61710" "AOC50652" "APH11611"
#>
#> $`376`
#> [1] "APH11505" "APH11565" "APH11524" "APH11593" "AWK27349" "APH11570"
#> [7] "APH11525" "APH11569" "APH11541" "APH11579" "APH11557" "APH11594"
#> [13] "APH11559" "APH11563" "APH11592" "APH11590" "APH11530" "APH11580"
#> [19] "APH11538" "APH11573" "APH11536" "APH11493" "APH11543" "APH11529"
#> [25] "APH11528" "APH11566" "APH11560" "APH11551" "APH11571" "APH11562"
#> [31] "APH11595" "APH11502" "APH11497" "APH11503" "APH11506" "APH11578"
#> [37] "APH11498" "APH11500" "APH11499" "APH11507" "APH11546" "APH11540"
#> [43] "APH11532" "APH11537" "APH11553" "APH11511" "AOO19565" "APH11539"
#> [49] "APH11534" "APH11518" "APH11509" "APH11514" "APH11512" "APH11508"
#> [55] "APH11581" "APH11516" "APH11515" "APH11547" "APH11586" "APH11544"
#> [61] "APH11504" "APH11550" "APH11501" "APH11535" "APH11587" "APH11527"
#> [67] "APH11576" "APH11542" "APH11531" "APH11517" "APH11585" "APH11582"
#> [73] "APH11521" "APH11583" "APH11533" "APH11558" "APH11564" "APH11555"
#> [79] "APH11567" "APH11577" "APH11552" "AOL02459" "APH11568" "APH11554"
#> [85] "APH11574" "APH11575" "APH11572" "APH11545" "APH11548" "APH11561"
#> [91] "APH11584" "APH11523" "APH11495" "APH11549" "AVG19202" "APG56499"
#> [97] "ASW34302"
#>
#> $`473`
#> [1] "AVV62004" "BAX00477"
#>
#> $`475`
#> [1] "BAV89190" "AOI20067" "AMM43325" "ANC90425" "AMV49165" "AMO03410"
#> [7] "ANA12599" "ANF16414" "AMT75536" "AMM39806" "AMR39830" "AMR68932"
#> [13] "AUI42329" "AUI42330" "AMM43326"
#>
#> $`490`
#> [1] "AVG19275" "ANN44857" "ANO46307"
#>
#> $`493`
#> [1] "ANO46306" "ANO46309" "ANO46305" "ANO46303"
#>
#> $`500`
#> [1] "AOO19564" "AUI42194" "AOG18296"
#>
#> $`503`
#> [1] "APC60215" "ART29828" "AWF93617" "ATG29284" "AQM74761" "ATG29306"
#> [7] "ATG29287" "ATG29303" "AWF93619" "AWF93618"
#>
#> $`514`
#> [1] "AQX32985" "AQZ41949" "ANO46297" "ANO46298" "ATG29315" "ASL68979"
#> [7] "ATG29299" "AQZ41956" "ATI21641" "ARU07183" "ANG09399" "AQZ41954"
#> [13] "ATG29292" "ATG29295" "AOC50654" "AQZ41953" "AMC13913" "AMC13912"
#> [19] "APO39243" "APO39229" "AQZ41952" "AQZ41955" "ATG29301" "AOW32303"
#> [25] "AOY08533" "AVZ25033" "ATG29276" "APO08503" "AQZ41951" "AQZ41947"
#> [31] "AQZ41948" "AQZ41950" "ATG29270" "ATG29291" "AOY08536" "ASL68974"
#> [37] "ARI68105" "ASU55505" "ATG29267" "ASL68978"
#>
#> $`553`
#> [1] "ATG29307" "AQM74762" "AUD54964"
#>
#> $`556`
#> [1] "AMK49165" "ARB07976" "ASU55416" "AMZ03556" "ASU55417" "ANW07476"
#> [7] "ANK57897" "AWH65849" "AMC13911" "APB03019" "APB03018"
#>
#> $`572`
#> [1] "AMA12087" "AMA12084" "AMU04506"
#>
#> $`582`
#> [1] "APW84876" "ARB07932" "AOS90220" "AOS90221" "APO39233" "AOO53981"
#> [7] "AOY08521" "AOO85388" "APO39228" "AOS90222" "APO39236" "APB03021"
#> [13] "AOY08546" "AOY08516" "APO39232" "AOS90223" "APO39237" "APB03020"
#> [19] "AOY08523" "AOY08542" "ASW34087" "ART29826" "ART29825" "AOY08525"
#> [25] "APW84873" "AMN14620" "AOS90224" "ANH22038" "APW84877" "BAV82373"
#> [31] "APW84874" "APW84875" "AOY08537" "AOY08535" "AVZ25035" "APW84872"
#> [37] "ASK51714" "ARB07953"
#>
#> $`619`
#> [1] "ARU07075" "AMB18850" "YP_009428568"
#>
#> $`621`
#> [1] "AOR82892" "ANH10698" "AOR82893" "ATB53752"
#>
#> $`624`
#> [1] "AMK49164" "APG56457"
#>
#> $`625`
#> [1] "ANF04752" "AOE22997" "ASU55420" "ASU55426" "ASU55412" "ASU55406"
#> [7] "ASU55398" "ASU55401" "ASU55402" "ASU55407" "AOY08524" "ASU55394"
#> [13] "AMM39804" "ASU55424" "ASU55422" "AQX32986" "APQ41784" "ANC90428"
#> [19] "ASU55415" "ASU55392" "AQX32987" "ASU55403" "ASU55413" "ASU55414"
#> [25] "ASU55405" "ASU55411" "ANB66183" "ASU55421" "AMZ03557" "ASU55390"
#> [31] "ASU55418" "ASU55399" "APQ41783" "ASU55410" "ASU55397" "ANS60026"
#> [37] "ASU55400" "ASU55409" "ANB66184" "APB03017" "AMC33116" "AMQ34003"
#> [43] "AMQ34004" "ASU55395" "ASU55396" "ASU55419" "ASU55391" "ARM59239"
#> [49] "ASU55408" "APQ41786" "ASU55393" "ASU55404" "ASU55423" "ANB66182"
#> [55] "ASU55425" "APQ41782" "ARB07967"
#>
#> $`681`
#> [1] "AQU12485" "AMQ48981" "APH11492" "AOO54270" "AND01116" "AMS00611"
#> [7] "AOY08538" "AOY08517" "AOY08541"
#>
#> $`690`
#> [1] "AOG18295" "AMM39805" "AMK79468" "AML82110" "AMR39831" "AMX81918"
#> [7] "ARX97119" "ANQ92019" "AOX49264" "AOX49265" "APY24198" "APO36913"
#>
#> $`701`
#> [1] "AML81028" "APY24200" "AOY08518" "ARB07962" "AMD16557" "ALX35659"
#> [7] "AMX81919" "ANC90426" "ALU33341" "ASB32509" "AMB37295" "ARU07074"
#>
#> $`712`
#> [1] "ARB08102" "ANO46304" "ANO46301" "AHZ13508" "ANO46302"
#>
#> $`716`
#> [1] "BBC70847" "AUF35022" "ATL14618" "AUF35021"
#>
#> $`720`
#> [1] "APO08504" "AMX81917"
#>
#> $`721`
#> [1] "AVZ47169" "AMX81916"
#>
#> $`723`
#> [1] "AMR39834" "AWH65848"
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sitePath_1.0.3 ape_5.3 BiocStyle_2.12.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.2 bookdown_0.12 lattice_0.20-38
#> [4] digest_0.6.20 MASS_7.3-51.4 grid_3.6.1
#> [7] nlme_3.1-140 magrittr_1.5 evaluate_0.14
#> [10] stringi_1.4.3 seqinr_3.4-5 rmarkdown_1.14
#> [13] tools_3.6.1 ade4_1.7-13 stringr_1.4.0
#> [16] parallel_3.6.1 xfun_0.8 yaml_2.2.0
#> [19] compiler_3.6.1 BiocManager_1.30.4 htmltools_0.3.6
#> [22] knitr_1.23