crisprBowtie can be installed from Bioconductor using the following
commands in a fresh R session:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("crisprBowtie")
crisprBowtie provides two main functions to align short DNA sequences to
a reference genome using the short read aligner bowtie (Langmead et al. 2009)
and return the alignments as R objects: runBowtie and runCrisprBowtie.
It utilizes the Bioconductor package Rbowtie to access the bowtie program
in a platform-independent manner. This means that users do not need to install
bowtie prior to using crisprBowtie.
The latter function (runCrisprBowtie) is specifically designed
to map and annotate CRISPR guide RNA (gRNA) spacer sequences using
CRISPR nuclease objects and CRISPR genomic arithmetics defined in
the Bioconductor crisprBase package. This enables a fast and accurate
on-target and off-target search of gRNA spacer sequences for virtually any
type of CRISPR nuclease.
To use runBowtie or runCrisprBowtie, users first need to build a bowtie
genome index. For a given genome, this step has to be done only once.
The Rbowtie package conveniently provides the function bowtie_build
to build a bowtie index for any custom genome from a FASTA file.
As an example, we build a bowtie index for a small portion of human
chromosome 1 (chr1.fa file provided in the crisprBowtie package) and
save the index files under the prefix myIndex in a temporary directory:
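library(Rbowtie)
# Locate the example FASTA file shipped with crisprBowtie
fasta <- file.path(find.package("crisprBowtie"), "example/chr1.fa")
tempDir <- tempdir()
# Build the bowtie index files (prefix "myIndex") in the temporary directory
Rbowtie::bowtie_build(fasta,
                      outdir=tempDir,
                      force=TRUE,
                      prefix="myIndex")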
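Because the index only needs to be built once per genome, it can be useful to check whether it already exists before rebuilding. The sketch below is a hypothetical base R helper (bowtieIndexExists is not part of Rbowtie or crisprBowtie). It assumes a small bowtie index made of six files named after the prefix with the suffixes .1.ebwt to .4.ebwt, .rev.1.ebwt and .rev.2.ebwt, stored directly in the output directory; large indexes use a different extension and would need a different check.

```r
# Hypothetical helper (not part of Rbowtie or crisprBowtie).
# Assumes a small bowtie index: six files named <prefix><suffix> in `dir`.
bowtieIndexExists <- function(dir, prefix) {
    suffixes <- c(".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                  ".rev.1.ebwt", ".rev.2.ebwt")
    # TRUE only if every expected index file is present
    all(file.exists(file.path(dir, paste0(prefix, suffixes))))
}
```

With such a helper, the call to bowtie_build could be wrapped in if (!bowtieIndexExists(tempDir, "myIndex")) so that an existing index is reused rather than rebuilt.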
runCrisprBowtie
As an example, we align 6 spacer sequences (each 20 bp long) to the custom genome built above, allowing a maximum of 3 mismatches between the spacer and protospacer sequences.
We specify that the search is for the wildtype Cas9 (SpCas9) nuclease
by providing the CrisprNuclease object SpCas9 available through the
crisprBase package. The argument canonical=FALSE specifies that
non-canonical PAM sequences are also considered (NAG and NGA for SpCas9).
The function getAvailableCrisprNucleases in crisprBase returns a character
vector of available CrisprNuclease objects found in crisprBase.
library(crisprBowtie)
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
spacers <- c("TCCGCGGGCGACAATGGCAT",
             "TGATCCCGCGCTCCCCGATG",
             "CCGGGAGCCGGGGCTGGACG",
             "CCACCCTCAGGTGTGCGGCC",
             "CGGAGGGCTGCAGAAAGCCT",
             "GGTGATGGCGCGGGCCGGGC")
runCrisprBowtie(spacers,
                crisprNuclease=crisprNuclease,
                n_mismatches=3,
                canonical=FALSE,
                bowtie_index=file.path(tempDir, "myIndex"))
## [runCrisprBowtie] Searching for SpCas9 protospacers
## spacer protospacer pam chr pam_site strand
## 1 CCACCCTCAGGTGTGCGGCC CCACCCTCAGGTGTGCGGCC TGG chr1 679 +
## 2 CCGGGAGCCGGGGCTGGACG CCGGGAGCCGGGGCTGGACG GAG chr1 466 +
## 3 CGGAGGGCTGCAGAAAGCCT CGGAGGGCTGCAGAAAGCCT TGG chr1 706 +
## 4 GGTGATGGCGCGGGCCGGGC GGTGATGGCGCGGGCCGGGC CGG chr1 831 +
## 5 TGATCCCGCGCTCCCCGATG TGATCCCGCGCTCCCCGATG CAG chr1 341 +
## n_mismatches canonical
## 1 0 TRUE
## 2 0 FALSE
## 3 0 TRUE
## 4 0 TRUE
## 5 0 FALSE
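To make the canonical column in the output above easier to read, the following base R sketch classifies SpCas9 PAM sequences by pattern matching. It is only an illustration: classifyPam is a hypothetical helper, not a crisprBase or crisprBowtie function, and it hard-codes the PAM definitions stated in the text (NGG as canonical, NAG and NGA as non-canonical) rather than reading them from the SpCas9 object.

```r
# Hypothetical helper (not part of crisprBase or crisprBowtie).
# Assumes SpCas9 PAMs: NGG is canonical; NAG and NGA are non-canonical.
classifyPam <- function(pam) {
    pam <- toupper(pam)
    ifelse(grepl("^[ACGT]GG$", pam), "canonical",
           ifelse(grepl("^[ACGT](AG|GA)$", pam), "non-canonical", "none"))
}

# PAMs taken from the pam column of the output above
pams <- c("TGG", "GAG", "TGG", "CGG", "CAG")
classifyPam(pams)
# "canonical" "non-canonical" "canonical" "canonical" "non-canonical"
```

The two non-canonical calls (GAG and CAG, both of the form NAG) correspond to the rows reported with canonical equal to FALSE.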
The function runBowtie is similar to runCrisprBowtie,
but does not impose constraints on PAM sequences.
It can be used to search for any short read sequence in a genome.
Seed-related off-targets caused by mismatch tolerance outside of the
seed region are a well-studied and characterized problem observed in RNA
interference (RNAi) experiments. runBowtie can be used to map shRNA/siRNA seed
sequences to reference genomes to predict putative off-targets:
seeds <- c("GTAAAGGT", "AAGGATTG")
runBowtie(seeds,
          n_mismatches=2,
          bowtie_index=file.path(tempDir, "myIndex"))
## query target chr pos strand n_mismatches
## 1 AAGGATTG AAAGAATG chr1 163 - 2
## 2 AAGGATTG AAGCCTTG chr1 700 + 2
## 3 AAGGATTG AAGGCTTT chr1 699 - 2
## 4 AAGGATTG CAGGCTTG chr1 905 - 2
## 5 GTAAAGGT GGGAAGGT chr1 724 + 2
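The n_mismatches column can be checked by hand. For the rows shown above, comparing the query and target columns position by position reproduces the reported count, including for hits on the minus strand. The base R sketch below does this comparison; countMismatches is a hypothetical helper written for illustration and is not part of crisprBowtie.

```r
# Hypothetical helper (not part of crisprBowtie): number of positions at
# which two equal-length sequences differ (Hamming distance).
countMismatches <- function(query, target) {
    stopifnot(nchar(query) == nchar(target))
    q  <- strsplit(query, "")[[1]]
    tg <- strsplit(target, "")[[1]]
    sum(q != tg)
}

# Rows 2 and 5 of the output above
countMismatches("AAGGATTG", "AAGCCTTG")  # 2
countMismatches("GTAAAGGT", "GGGAAGGT")  # 2
```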
sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] crisprBowtie_1.0.0 Rbowtie_1.36.0 BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] lattice_0.20-45 Rsamtools_2.12.0
## [3] Biostrings_2.64.0 digest_0.6.29
## [5] utf8_1.2.2 R6_2.5.1
## [7] GenomeInfoDb_1.32.0 stats4_4.2.0
## [9] evaluate_0.15 pillar_1.7.0
## [11] zlibbioc_1.42.0 rlang_1.0.2
## [13] jquerylib_0.1.4 S4Vectors_0.34.0
## [15] Matrix_1.4-1 rmarkdown_2.14
## [17] BiocParallel_1.30.0 readr_2.1.2
## [19] stringr_1.4.0 RCurl_1.98-1.6
## [21] bit_4.0.4 DelayedArray_0.22.0
## [23] compiler_4.2.0 rtracklayer_1.56.0
## [25] xfun_0.30 pkgconfig_2.0.3
## [27] BiocGenerics_0.42.0 htmltools_0.5.2
## [29] tidyselect_1.1.2 SummarizedExperiment_1.26.0
## [31] tibble_3.1.6 GenomeInfoDbData_1.2.8
## [33] bookdown_0.26 IRanges_2.30.0
## [35] matrixStats_0.62.0 XML_3.99-0.9
## [37] fansi_1.0.3 crayon_1.5.1
## [39] tzdb_0.3.0 GenomicAlignments_1.32.0
## [41] bitops_1.0-7 grid_4.2.0
## [43] jsonlite_1.8.0 lifecycle_1.0.1
## [45] magrittr_2.0.3 cli_3.3.0
## [47] stringi_1.7.6 vroom_1.5.7
## [49] XVector_0.36.0 crisprBase_1.0.0
## [51] bslib_0.3.1 ellipsis_0.3.2
## [53] vctrs_0.4.1 rjson_0.2.21
## [55] restfulr_0.0.13 tools_4.2.0
## [57] bit64_4.0.5 BSgenome_1.64.0
## [59] Biobase_2.56.0 glue_1.6.2
## [61] purrr_0.3.4 hms_1.1.1
## [63] MatrixGenerics_1.8.0 parallel_4.2.0
## [65] fastmap_1.1.0 yaml_2.3.5
## [67] BiocManager_1.30.17 GenomicRanges_1.48.0
## [69] knitr_1.38 sass_0.4.1
## [71] BiocIO_1.6.0
Langmead, Ben, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. 2009. “Ultrafast and Memory-Efficient Alignment of Short DNA Sequences to the Human Genome.” Genome Biology 10 (3): R25. https://doi.org/10.1186/gb-2009-10-3-r25.