Abstract
Intron Retention (IR) is a form of alternative splicing whereby an intron is retained (i.e. not spliced out) in the final messenger RNA. Although many bioinformatics tools are available to quantitate other forms of alternative splicing, dedicated tools to quantify IR are limited. Quantifying IR requires not only measurement of spliced transcripts (often using mapped splice junction reads), but also measurement of the coverage of the putative retained intron. The latter requires adjustment for the fact that many introns contain repetitive regions as well as other RNA-expressing elements. IRFinder corrects for many of these complexities; however, its dependence on Linux and STAR limits its wider usage. Also, IRFinder does not quantify forms of alternative splicing other than IR. Finally, IRFinder produces text-based output, so interpreting its results requires an established understanding of the data produced.
NxtIRF overcomes the above limitations. Firstly, NxtIRF incorporates the IRFinder C++ routines, allowing users to run the IRFinder algorithm in the R/Bioconductor environment on multiple platforms. NxtIRF is a full pipeline that quantifies IR (and other alternative splicing) events, organises the data and produces relevant visualisation. Additionally, NxtIRF offers an interactive graphical interface that allows users to explore the data.
NxtIRFdata is a data package containing ready-made BED files of Mappability exclusion genomic regions. It also contains a fully functioning example data set with a “mock” genome and genome annotation to demonstrate the functionalities of NxtIRF.
To install this package, start R (version “4.1”) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("NxtIRFdata")
Start using NxtIRFdata:
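A minimal sketch of this step, assuming the package has already been installed with the commands above: attaching the package makes its accessor functions available in the session.

```r
# Attach NxtIRFdata so that its exported functions can be called directly
library(NxtIRFdata)
```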
Examples in NxtIRF are demonstrated using an artificial genome and gene annotation. A synthetic reference, comprising genome sequence (FASTA) and gene annotation (GTF) files, is provided, based on the genes SRSF1, SRSF2, SRSF3, TRA2A, TRA2B, TP53 and NSUN5. These genes, each with an additional 100 flanking nucleotides, were used to construct an artificial “chromosome Z” (chrZ). Gene annotations, based on release-94 of Ensembl GRCh38 (hg38), were modified with genome coordinates corresponding to this artificial chromosome.
These files can be accessed as follows:
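A minimal sketch, assuming the accessor functions `chrZ_genome()` and `chrZ_gtf()` exported by NxtIRFdata, each taken here to return the local path of the corresponding file. The variable names are illustrative only.

```r
library(NxtIRFdata)

# Path to the FASTA file of the artificial chromosome Z
genome_path = chrZ_genome()

# Path to the matching GTF gene annotation
gtf_path = chrZ_gtf()

# Confirm that both files are present on disk
file.exists(c(genome_path, gtf_path))
```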
The set of 6 BAM files used in the NxtIRF vignette / example code can be downloaded to a path of the user’s choice using the following function:
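A minimal sketch, assuming the NxtIRFdata function `example_bams()` and its `path` argument. `tempdir()` is only a placeholder destination; any writable directory can be substituted. The call needs internet access the first time the files are fetched.

```r
library(NxtIRFdata)

# Download the example BAM files into the chosen directory
# (tempdir() is a placeholder; replace it with a persistent path if desired)
bam_paths = example_bams(path = tempdir())

# Inspect the paths of the downloaded files
bam_paths
```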
Note that this downloads the BAM files but not their respective BAI files (BAM file indices). This is because NxtIRF reads BAM files natively and does not require Rsamtools. BAI files are provided alongside the BAM files in their respective ExperimentHub entries for users wishing to view these files using Rsamtools.
NxtIRFdata retrieves the relevant records from AnnotationHub and makes a local copy of the BED file. This BED file is used to produce a genome reference for NxtIRF.
Note that this function is intended to be called internally by NxtIRF. Users interested in the format or nature of the Mappability BED file can call this function directly to examine its contents:
# To get the MappabilityExclusion for hg38 as a GRanges object
gr = get_mappability_exclusion(genome_type = "hg38", as_type = "GRanges")
# To get the MappabilityExclusion for hg38 as a locally-copied gzipped BED file
bed_path = get_mappability_exclusion(genome_type = "hg38", as_type = "bed.gz",
    path = tempdir())
# Other `genome_type` values include "hg19", "mm10", and "mm9"
The data deposited in ExperimentHub can be accessed as follows:
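A minimal sketch using the standard ExperimentHub interface. The search term `"NxtIRF"` is an assumption about how the records are labelled, and fetching the first record is an arbitrary choice for illustration. Both steps need internet access.

```r
library(ExperimentHub)

# Connect to ExperimentHub (metadata is cached locally after first use)
eh = ExperimentHub()

# Search the hub metadata for records matching "NxtIRF" (assumed search term)
NxtIRF_hub = query(eh, "NxtIRF")
NxtIRF_hub

# Fetch one record by its hub identifier; the first match is chosen arbitrarily
first_id = names(NxtIRF_hub)[1]
resource = eh[[first_id]]
resource
```

The printed `NxtIRF_hub` object lists the identifier and title of each record, which can be used to choose a specific entry in place of `first_id`.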
For more information about the example BAM files, refer to the NxtIRFdata package documentation:
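A minimal sketch using base R help utilities. It assumes that `example_bams` has its own help page in NxtIRFdata.

```r
library(NxtIRFdata)

# Index of all help topics provided by the package
help(package = "NxtIRFdata")

# Help page for the function that downloads the example BAM files
?example_bams
```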
SessionInfo
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ExperimentHub_2.4.0 AnnotationHub_3.4.0 BiocFileCache_2.4.0
## [4] dbplyr_2.1.1 BiocGenerics_0.42.0 NxtIRFdata_1.2.0
##
## loaded via a namespace (and not attached):
## [1] MatrixGenerics_1.8.0 Biobase_2.56.0
## [3] httr_1.4.2 sass_0.4.1
## [5] bit64_4.0.5 jsonlite_1.8.0
## [7] R.utils_2.11.0 bslib_0.3.1
## [9] shiny_1.7.1 assertthat_0.2.1
## [11] interactiveDisplayBase_1.34.0 BiocManager_1.30.17
## [13] stats4_4.2.0 blob_1.2.3
## [15] GenomeInfoDbData_1.2.8 Rsamtools_2.12.0
## [17] yaml_2.3.5 BiocVersion_3.15.2
## [19] pillar_1.7.0 RSQLite_2.2.12
## [21] lattice_0.20-45 glue_1.6.2
## [23] digest_0.6.29 promises_1.2.0.1
## [25] GenomicRanges_1.48.0 XVector_0.36.0
## [27] httpuv_1.6.5 htmltools_0.5.2
## [29] Matrix_1.4-1 R.oo_1.24.0
## [31] XML_3.99-0.9 pkgconfig_2.0.3
## [33] zlibbioc_1.42.0 xtable_1.8-4
## [35] purrr_0.3.4 later_1.3.0
## [37] BiocParallel_1.30.0 tibble_3.1.6
## [39] KEGGREST_1.36.0 generics_0.1.2
## [41] IRanges_2.30.0 ellipsis_0.3.2
## [43] cachem_1.0.6 withr_2.5.0
## [45] SummarizedExperiment_1.26.0 cli_3.3.0
## [47] mime_0.12 magrittr_2.0.3
## [49] crayon_1.5.1 memoise_2.0.1
## [51] evaluate_0.15 R.methodsS3_1.8.1
## [53] fansi_1.0.3 tools_4.2.0
## [55] BiocIO_1.6.0 lifecycle_1.0.1
## [57] matrixStats_0.62.0 stringr_1.4.0
## [59] S4Vectors_0.34.0 DelayedArray_0.22.0
## [61] AnnotationDbi_1.58.0 Biostrings_2.64.0
## [63] compiler_4.2.0 jquerylib_0.1.4
## [65] GenomeInfoDb_1.32.0 rlang_1.0.2
## [67] grid_4.2.0 RCurl_1.98-1.6
## [69] rjson_0.2.21 rappdirs_0.3.3
## [71] bitops_1.0-7 rmarkdown_2.14
## [73] restfulr_0.0.13 DBI_1.1.2
## [75] curl_4.3.2 R6_2.5.1
## [77] GenomicAlignments_1.32.0 knitr_1.39
## [79] dplyr_1.0.8 rtracklayer_1.56.0
## [81] fastmap_1.1.0 bit_4.0.4
## [83] utf8_1.2.2 filelock_1.0.2
## [85] stringi_1.7.6 parallel_4.2.0
## [87] Rcpp_1.0.8.3 png_0.1-7
## [89] vctrs_0.4.1 tidyselect_1.1.2
## [91] xfun_0.30