In this vignette, we provide an overview of the basic functionality and usage of the scds
package, which interfaces with SingleCellExperiment
objects.
Install the scds
package using Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("scds", version = "3.9")
Or from github:
library(devtools)
devtools::install_github('kostkalab/scds')
scds
takes as input a SingleCellExperiment
object (see here SingleCellExperiment), where raw counts are stored in a counts
assay, i.e. assay(sce,"counts")
. An example dataset created by sub-sampling the cell-hashing cell-lines data set (see https://satijalab.org/seurat/hashing_vignette.html) is included with the package and accessible via data("sce")
.Note that scds
is designed to workd with larger datasets, but for the purposes of this vignette, we work with a smaller example dataset. We apply scds
to this data and compare/visualize reasults:
Get example data set provided with the package.
library(scds)
library(scater)
library(rsvd)
library(Rtsne)
library(cowplot)
set.seed(30519)
data("sce_chcl")
sce = sce_chcl #- less typing
dim(sce)
## [1] 2000 2000
We see it contains 2,000 genes and 2,000 cells, 216 of which are identified as doublets:
table(sce$hto_classification_global)
##
## Doublet Negative Singlet
## 216 83 1701
We can visualize cells/doublets after projecting into two dimensions:
logcounts(sce) = log1p(counts(sce))
vrs = apply(logcounts(sce),1,var)
pc = rpca(t(logcounts(sce)[order(vrs,decreasing=TRUE)[1:100],]))
ts = Rtsne(pc$x[,1:10],verb=FALSE)
reducedDim(sce,"tsne") = ts$Y; rm(ts,vrs,pc)
plotReducedDim(sce,"tsne",col="hto_classification_global")
We now run the scds
doublet annotation approaches. Briefly, we identify doublets in two complementary ways: cxds
is based on co-expression of gene pairs and works with absence/presence calls only, while bcds
uses the full count information and a binary classification approach using artificially generated doublets. cxds_bcds_hybrid
combines both approaches, for more details please consult (this manuscript). Each of the three methods returns a doublet score, with higher scores indicating more “doublet-like” barcodes.
#- Annotate doublet using co-expression based doublet scoring:
sce = cxds(sce,retRes = TRUE)
sce = bcds(sce,retRes = TRUE,verb=TRUE)
## [18:41:49] WARNING: amalgamation/../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
sce = cxds_bcds_hybrid(sce)
par(mfcol=c(1,3))
boxplot(sce$cxds_score ~ sce$doublet_true_labels, main="cxds")
boxplot(sce$bcds_score ~ sce$doublet_true_labels, main="bcds")
boxplot(sce$hybrid_score ~ sce$doublet_true_labels, main="hybrid")
For cxds
we can identify and visualize gene pairs driving doublet annoataions, with the expectation that the two genes in a pair might mark different types of cells (see manuscript). In the following we look at the top three pairs, each gene pair is a row in the plot below:
scds =
top3 = metadata(sce)$cxds$topPairs[1:3,]
rs = rownames(sce)
hb = rowData(sce)$cxds_hvg_bool
ho = rowData(sce)$cxds_hvg_ordr[hb]
hgs = rs[ho]
l1 = ggdraw() + draw_text("Pair 1", x = 0.5, y = 0.5)
p1 = plotReducedDim(sce,"tsne",col=hgs[top3[1,1]])
p2 = plotReducedDim(sce,"tsne",col=hgs[top3[1,2]])
l2 = ggdraw() + draw_text("Pair 2", x = 0.5, y = 0.5)
p3 = plotReducedDim(sce,"tsne",col=hgs[top3[2,1]])
p4 = plotReducedDim(sce,"tsne",col=hgs[top3[2,2]])
l3 = ggdraw() + draw_text("Pair 3", x = 0.5, y = 0.5)
p5 = plotReducedDim(sce,"tsne",col=hgs[top3[3,1]])
p6 = plotReducedDim(sce,"tsne",col=hgs[top3[3,2]])
plot_grid(l1,p1,p2,l2,p3,p4,l3,p5,p6,ncol=3, rel_widths = c(1,2,2))
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] cowplot_1.1.1 Rtsne_0.15
## [3] rsvd_1.0.5 scater_1.22.0
## [5] ggplot2_3.3.5 scuttle_1.4.0
## [7] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [9] Biobase_2.54.0 GenomicRanges_1.46.0
## [11] GenomeInfoDb_1.30.0 IRanges_2.28.0
## [13] S4Vectors_0.32.0 BiocGenerics_0.40.0
## [15] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [17] scds_1.10.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 tools_4.1.1
## [3] bslib_0.3.1 utf8_1.2.2
## [5] R6_2.5.1 irlba_2.3.3
## [7] vipor_0.4.5 DBI_1.1.1
## [9] colorspace_2.0-2 withr_2.4.2
## [11] tidyselect_1.1.1 gridExtra_2.3
## [13] compiler_4.1.1 BiocNeighbors_1.12.0
## [15] DelayedArray_0.20.0 labeling_0.4.2
## [17] bookdown_0.24 sass_0.4.0
## [19] scales_1.1.1 stringr_1.4.0
## [21] digest_0.6.28 rmarkdown_2.11
## [23] XVector_0.34.0 pkgconfig_2.0.3
## [25] htmltools_0.5.2 sparseMatrixStats_1.6.0
## [27] fastmap_1.1.0 highr_0.9
## [29] rlang_0.4.12 DelayedMatrixStats_1.16.0
## [31] jquerylib_0.1.4 generics_0.1.1
## [33] farver_2.1.0 jsonlite_1.7.2
## [35] BiocParallel_1.28.0 dplyr_1.0.7
## [37] RCurl_1.98-1.5 magrittr_2.0.1
## [39] BiocSingular_1.10.0 GenomeInfoDbData_1.2.7
## [41] Matrix_1.3-4 Rcpp_1.0.7
## [43] ggbeeswarm_0.6.0 munsell_0.5.0
## [45] fansi_0.5.0 viridis_0.6.2
## [47] lifecycle_1.0.1 stringi_1.7.5
## [49] pROC_1.18.0 yaml_2.2.1
## [51] zlibbioc_1.40.0 plyr_1.8.6
## [53] grid_4.1.1 parallel_4.1.1
## [55] ggrepel_0.9.1 crayon_1.4.1
## [57] lattice_0.20-45 beachmat_2.10.0
## [59] magick_2.7.3 knitr_1.36
## [61] pillar_1.6.4 xgboost_1.4.1.1
## [63] ScaledMatrix_1.2.0 glue_1.4.2
## [65] evaluate_0.14 data.table_1.14.2
## [67] BiocManager_1.30.16 vctrs_0.3.8
## [69] gtable_0.3.0 purrr_0.3.4
## [71] assertthat_0.2.1 xfun_0.27
## [73] viridisLite_0.4.0 tibble_3.1.5
## [75] beeswarm_0.4.0 ellipsis_0.3.2