SEtools 1.8.0
The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment
objects.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("SEtools")
Or, to install the latest development version:
BiocManager::install("plger/SEtools")
To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:
suppressPackageStartupMessages({
library(SummarizedExperiment)
library(SEtools)
})
data("SE", package="SEtools")
SE
## class: SummarizedExperiment
## dim: 100 20
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition
This is taken from Floriou-Servou et al., Biol Psychiatry 2018.
There are two main wrappers for plotting heatmaps from SummarizedExperiment
objects:
sehm
function uses the pheatmap enginesechm
function uses the ComplexHeatmap engineBoth functions were made to function very similarly, but the sechm
function is especially useful to combine heatmaps (for instance, from different SummarizedExperiment
objects). We’ll showcase sehm
(the main functionalities being replicable with sechm
), and will then provide examples of multiple heatmaps.
The sehm
function simplifies the generation of heatmaps from SummarizedExperiment
. It uses pheatmap, so any argument supported by it can in principle be passed:
g <- c("Egr1", "Nr4a1", "Fos", "Egr2", "Sgk1", "Arc", "Dusp1", "Fosb", "Sik1")
sehm(SE, genes=g)
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
sehm(SE, assayName="logcpm", genes=g, do.scale=TRUE)
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
When scaling data, the function will automatically center the colour scale around zero, and handle the extreme values (0.5% percentile on each side) in a non-linear fashion to retain a useful visualization.
This behavior can be manually controlled via the breaks
parameter (either setting it to FALSE, to a percentile until which the scale should be linear, of manually inputting breaks).
Annotation from the object’s rowData
and colData
can be plotted simply by specifying the column name (some will be shown by default if found):
sehm(SE, assayName="logcpm", genes=g, do.scale=TRUE, anno_rows="meanTPM")
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
These can also be used to create gaps:
sehm(SE, genes=g, do.scale=TRUE, anno_rows="meanTPM", gaps_at="Condition")
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
The specific assay to use for plotting can be specified with the assayName
argument.
By default, rows are sorted not with hierarchical clustering, but from the angle on a MDS plot, which tends to give nicer results than bottom-up hierarchical clustering. This can be disabled using sortRowsOn=NULL
or cluster_rows=TRUE
(to avoid any row reordering and use the order given, use sortRowsOn=NULL, cluster_rows=FALSE
). Column clustering is disabled by default, but this can be changed with cluster_cols=TRUE
.
It is common to cluster features into groups, and such a clustering can be used simultaneously with row sorting using the toporder
argument. For instance:
lfcs <- assays(SE)$logcpm-rowMeans(assays(SE)$logcpm[,which(SE$Condition=="Homecage")])
rowData(SE)$cluster <- as.character(kmeans(lfcs,4)$cluster)
sehm(SE, genes=g, do.scale=TRUE, anno_rows="cluster", toporder="cluster", gaps_at="Condition")
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
For some arguments (for instance colors), if they are not specified in the function call, SEtools
will try to see whether the object itself contains it, or whether the corresponding global options have been set, before using default colors. This means that if, in the context of a given project, the same colors are repeatedly being used, they can be specified a single time, and all subsequent plots will be affected.
Storing colors in the object:
metadata(SE)$hmcols <- c("purple","white","gold")
# or something fancier, like:
# metadata(SE)$hmcols <- colorspace::diverging_hcl(palette="Berlin", n=101)
ancols <- list( Condition=c( Homecage="#DB918B",
Handling="#B86FD3",
Restraint="#A9CED5",
Swim="#B5DF7C" ) )
metadata(SE)$anno_colors <- ancols
sehm(SE, g, do.scale = TRUE)
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
Using the global options:
options("SEtools_def_hmcols"=c("white","grey","black"))
options("SEtools_def_anno_colors"=ancols)
sehm(SE, g, do.scale = TRUE)
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
At the moment, the following arguments can be set as global options:
assayName
, hmcols
, anno_columns
, anno_rows
, anno_colors
, gaps_at
, breaks
.
Options must be set with the prefix SEtools_def_
, followed by the name of the argument.
To remove the predefined colors:
resetAllSEtoolsOptions()
metadata(SE)$hmcols <- NULL
metadata(SE)$anno_colors <- NULL
In order of priority, the arguments in the function call trump the object’s metadata, which trumps the global options.
The sechm
function works like the sehm
function, but the fact that it outputs a Heatmap
object from ComplexHeatmap means that these can be easily combined:
sechm(SE, g, do.scale = TRUE) + sechm(SE, g, do.scale = FALSE)
## Warning: `sechm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sechm`) instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
## Warning: `sechm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sechm`) instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
## Warning: Row names of heatmap 2 are not consistent with the main heatmap (1). It
## may lead to wrong conclusion of your data. Please double check.
However, doing so involves manual work to ensure that the labels and colors are nice and coherent, and that the rows names match. As a convenience, we provide the crossHm
function to handle these issues. crossHm
works with a list of SummarizedExperiment
objects:
# we build another SE object:
SE2 <- SE
assays(SE2)$logcpm <- jitter(assays(SE2)$logcpm, factor=1000)
crossHm(list(SE1=SE, SE2=SE2), g, do.scale = TRUE)
## Warning: `crossHm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::crossHm`) instead
## Using assay logcpm
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
## Warning: `sechm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sechm`) instead
## Warning: `sechm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sechm`) instead
A unique color scale can be enforced:
crossHm(list(SE1=SE, SE2=SE2), g, do.scale = TRUE, uniqueScale = TRUE)
## Warning: `crossHm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::crossHm`) instead
## Using assay logcpm
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
## Warning: `sechm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sechm`) instead
## Warning: `sechm` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sechm`) instead
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
## class: SummarizedExperiment
## dim: 100 20
## metadata(3): se1 se2 anno_colors
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(3): meanCPM meanTPM cluster
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
## se2.HC.Swim.5
## colData names(3): Dataset Region Condition
All assays were merged, along with rowData and colData slots.
By default, row z-scores are calculated for each object when merging. This can be prevented with:
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
If more than one assay is present, one can specify a different scaling behavior for each assay:
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
It is also possible to merge by rowData columns, which are specified through the mergeBy
argument.
In this case, one can have one-to-many and many-to-many mappings, in which case two behaviors are possible:
aggFun
, the features of each object will by aggregated by mergeBy
using this function before merging.rowData(se1)$metafeature <- sample(LETTERS,nrow(se1),replace = TRUE)
rowData(se2)$metafeature <- sample(LETTERS,nrow(se2),replace = TRUE)
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE, mergeBy="metafeature", aggFun=median)
## Aggregating the objects by metafeature
## Merging...
sehm(se3, genes=row.names(se3))
## Warning: 'sehm' is deprecated; please use sechm::sechm instead
## Using assay logcpm
## Warning: `sortRows` has been moved and will eventually be removed from this package.
## Please use the version from the 'sechm' package (`sechm::sortRows`) instead
A single SE can also be aggregated by using the aggSE
function:
se1b <- aggSE(se1, by = "metafeature")
## Aggregation methods for each assay:
## counts: sum; logcpm: expsum
se1b
## class: SummarizedExperiment
## dim: 26 10
## metadata(0):
## assays(2): counts logcpm
## rownames(26): A B ... Y Z
## rowData names(0):
## colnames(10): HC.Homecage.1 HC.Homecage.2 ... HC.Handling.4
## HC.Handling.5
## colData names(2): Region Condition
If the aggregation function(s) are not specified, aggSE
will try to guess decent aggregation functions from the assay names.
To facilitate plotting features with ggplot2, the meltSE
function combines assay values along with row/column data:
d <- meltSE(SE, genes=g[1:4])
head(d)
## feature sample Region Condition counts logcpm
## 1 Egr1 HC.Homecage.1 HC Homecage 1581.0 4.4284969
## 2 Nr4a1 HC.Homecage.1 HC Homecage 750.0 3.6958917
## 3 Fos HC.Homecage.1 HC Homecage 91.4 1.7556317
## 4 Egr2 HC.Homecage.1 HC Homecage 15.1 0.5826999
## 5 Egr1 HC.Homecage.2 HC Homecage 1423.0 4.4415828
## 6 Nr4a1 HC.Homecage.2 HC Homecage 841.0 3.9237691
suppressPackageStartupMessages(library(ggplot2))
ggplot(d, aes(Condition, counts, fill=Condition)) + geom_violin() +
facet_wrap(~feature, scale="free")
Calculate an assay of log-foldchanges to the controls:
SE <- log2FC(SE, fromAssay="logcpm", controls=SE$Condition=="Homecage")
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.3.5 SEtools_1.8.0
## [3] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [5] GenomicRanges_1.46.0 GenomeInfoDb_1.30.0
## [7] IRanges_2.28.0 S4Vectors_0.32.0
## [9] BiocGenerics_0.40.0 MatrixGenerics_1.6.0
## [11] matrixStats_0.61.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_2.0-2 rjson_0.2.20
## [4] ellipsis_0.3.2 circlize_0.4.13 XVector_0.34.0
## [7] GlobalOptions_0.1.2 clue_0.3-60 farver_2.1.0
## [10] bit64_4.0.5 AnnotationDbi_1.56.0 fansi_0.5.0
## [13] codetools_0.2-18 splines_4.1.1 doParallel_1.0.16
## [16] cachem_1.0.6 geneplotter_1.72.0 knitr_1.36
## [19] jsonlite_1.7.2 annotate_1.72.0 cluster_2.1.2
## [22] png_0.1-7 pheatmap_1.0.12 BiocManager_1.30.16
## [25] compiler_4.1.1 httr_1.4.2 assertthat_0.2.1
## [28] Matrix_1.3-4 fastmap_1.1.0 limma_3.50.0
## [31] htmltools_0.5.2 tools_4.1.1 gtable_0.3.0
## [34] glue_1.4.2 GenomeInfoDbData_1.2.7 dplyr_1.0.7
## [37] V8_3.4.2 Rcpp_1.0.7 jquerylib_0.1.4
## [40] vctrs_0.3.8 Biostrings_2.62.0 nlme_3.1-153
## [43] iterators_1.0.13 xfun_0.27 stringr_1.4.0
## [46] openxlsx_4.2.4 lifecycle_1.0.1 XML_3.99-0.8
## [49] edgeR_3.36.0 zlibbioc_1.40.0 scales_1.1.1
## [52] TSP_1.1-11 parallel_4.1.1 RColorBrewer_1.1-2
## [55] ComplexHeatmap_2.10.0 yaml_2.2.1 curl_4.3.2
## [58] memoise_2.0.0 sass_0.4.0 stringi_1.7.5
## [61] RSQLite_2.2.8 highr_0.9 randomcoloR_1.1.0.1
## [64] genefilter_1.76.0 foreach_1.5.1 seriation_1.3.1
## [67] zip_2.2.0 BiocParallel_1.28.0 shape_1.4.6
## [70] rlang_0.4.12 pkgconfig_2.0.3 bitops_1.0-7
## [73] evaluate_0.14 lattice_0.20-45 purrr_0.3.4
## [76] labeling_0.4.2 bit_4.0.4 tidyselect_1.1.1
## [79] magrittr_2.0.1 bookdown_0.24 DESeq2_1.34.0
## [82] R6_2.5.1 magick_2.7.3 generics_0.1.1
## [85] DelayedArray_0.20.0 DBI_1.1.1 withr_2.4.2
## [88] mgcv_1.8-38 pillar_1.6.4 survival_3.2-13
## [91] KEGGREST_1.34.0 RCurl_1.98-1.5 tibble_3.1.5
## [94] crayon_1.4.1 utf8_1.2.2 rmarkdown_2.11
## [97] GetoptLong_1.0.5 locfit_1.5-9.4 grid_4.1.1
## [100] sva_3.42.0 data.table_1.14.2 blob_1.2.2
## [103] digest_0.6.28 xtable_1.8-4 munsell_0.5.0
## [106] registry_0.5-1 bslib_0.3.1