if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("Dune")
We use a subset of the Allen Smart-Seq nuclei dataset. Run ?Dune::nuclei
for more details on pre-processing.
suppressPackageStartupMessages({
library(RColorBrewer)
library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)
library(purrr)
library(Dune)
})
data("nuclei", package = "Dune")
theme_set(theme_classic())
We have a dataset of 1744 cells, with results from three clustering algorithms: Seurat3, Monocle3 and SC3. The Allen Institute also produces hand-picked cluster and subclass labels. Finally, we included the coordinates from a t-SNE representation for visualization.
ggplot(nuclei, aes(x = x, y = y, col = subclass_label)) +
geom_point()
We can also see how the three clustering algorithms partitioned the dataset initially:
walk(c("SC3", "Seurat", "Monocle"), function(clus_algo){
df <- nuclei
df$clus_algo <- nuclei[, clus_algo]
p <- ggplot(df, aes(x = x, y = y, col = as.character(clus_algo))) +
geom_point(size = 1.5) +
# guides(color = FALSE) +
labs(title = clus_algo, col = "clusters") +
theme(legend.position = "bottom")
print(p)
})
The adjusted Rand Index (ARI) between each pair of methods can be computed and plotted.
plotARIs(nuclei %>% select(SC3, Seurat, Monocle))
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
As we can see, the ARI between the three methods is initially quite low.
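To make the pairwise comparison concrete, here is a minimal base-R sketch of the adjusted Rand Index computed from a contingency table, applied to every pair of columns of a small label table. The function name adjusted_rand_index and the toy labels are hypothetical and are not part of Dune; the package's own computation may differ.

```r
# Adjusted Rand Index between two label vectors of equal length.
# Illustrative helper only (hypothetical name, not a Dune function).
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)                       # contingency table of the two partitions
  sum_ij <- sum(choose(tab, 2))            # pairs of cells grouped together in both
  sum_a <- sum(choose(rowSums(tab), 2))    # pairs grouped together in a
  sum_b <- sum(choose(colSums(tab), 2))    # pairs grouped together in b
  expected <- sum_a * sum_b / choose(sum(tab), 2)
  max_index <- (sum_a + sum_b) / 2
  # Undefined (NaN) when both partitions put every cell in a single cluster.
  (sum_ij - expected) / (max_index - expected)
}

# Hypothetical toy labels for six cells from three methods.
labels <- data.frame(
  A = c(1, 1, 2, 2, 3, 3),
  B = c(1, 1, 1, 2, 2, 2),
  C = c(2, 2, 3, 3, 1, 1)   # same partition as A, with different label names
)

# Pairwise ARI matrix, one row and one column per method.
ari_mat <- sapply(labels, function(a) {
  sapply(labels, function(b) adjusted_rand_index(a, b))
})
print(round(ari_mat, 2))
```

Because the ARI only depends on which cells are grouped together, columns A and C reach the maximum value of 1 even though their label names differ.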
We can now try to merge clusters with the Dune function. At each step, the algorithm prints which clustering is being modified (by name, for example SC3), followed by the pair of clusters that get merged.
merger <- Dune(clusMat = nuclei %>% select(SC3, Seurat, Monocle), verbose = TRUE)
## [1] "SC3" "21" "20"
## [1] "Monocle" "20" "4"
## [1] "SC3" "11" "12"
## [1] "SC3" "30" "28"
## [1] "SC3" "11" "24"
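The effect of a single merge can be illustrated on toy data. The sketch below is not Dune's implementation: it only shows that merging two clusters of an over-split partition can raise its ARI with another partition. The helper ari and the vectors fine and coarse are hypothetical names introduced for this illustration.

```r
# Compact ARI helper (hypothetical, for illustration only).
ari <- function(a, b) {
  tab <- table(a, b)
  s_ij <- sum(choose(tab, 2))
  s_a <- sum(choose(rowSums(tab), 2))
  s_b <- sum(choose(colSums(tab), 2))
  expected <- s_a * s_b / choose(sum(tab), 2)
  (s_ij - expected) / ((s_a + s_b) / 2 - expected)
}

# Two hypothetical clusterings of the same eight cells.
fine <- c(1, 1, 2, 2, 3, 3, 4, 4)     # over-split: four clusters
coarse <- c(1, 1, 1, 1, 2, 2, 2, 2)   # two clusters

# One merge step: relabel cluster 2 of the fine partition as cluster 1.
merged <- ifelse(fine == 2, 1, fine)

ari_before <- ari(fine, coarse)
ari_after <- ari(merged, coarse)
cat("ARI before merge:", round(ari_before, 3), "\n")
cat("ARI after merge: ", round(ari_after, 3), "\n")
```

Here the merge brings the fine partition closer to the coarse one, so the ARI goes up; a merge that joined cells from different coarse clusters would lower it instead.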
The output from Dune is a list with five components:
names(merger)
## [1] "initialMat" "currentMat" "merges" "ImpMetric" "metric"
initialMat is the initial matrix of cluster labels. currentMat is the final matrix of cluster labels. merges is a matrix that recapitulates what has been printed above, while ImpMetric lists the improvement in the merging metric over the merges, and metric records which metric was used.
We can now see how much the ARI has improved:
plotARIs(clusMat = merger$currentMat)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
The methods now look much more similar, as can be expected.
We can also see how the number of clusters got reduced.
plotPrePost(merger)
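The same before/after comparison can also be obtained as plain numbers. The helper below, count_clusters, is a hypothetical name and not a Dune function; it counts the distinct labels in each column of a label table and is shown here on toy data.

```r
# Number of distinct cluster labels per column of a matrix or data frame.
# Hypothetical helper for illustration, not part of Dune.
count_clusters <- function(clus_mat) {
  vapply(as.data.frame(clus_mat),
         function(x) length(unique(x)),
         integer(1))
}

# Toy label tables standing in for the labels before and after merging.
before <- data.frame(SC3 = c(1, 2, 3, 4), Seurat = c(1, 1, 2, 2))
after <- data.frame(SC3 = c(1, 1, 3, 3), Seurat = c(1, 1, 2, 2))

# Clusters removed by merging, per method.
removed <- count_clusters(before) - count_clusters(after)
print(removed)
```

Assuming merger$initialMat and merger$currentMat hold one column of labels per method, as the indexing used elsewhere in this vignette suggests, the same helper could be applied to them directly.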
For SC3, for example, we can visualize how the clusters got merged:
ConfusionPlot(merger$initialMat[, "SC3"], merger$currentMat[, "SC3"]) +
labs(x = "Before merging", y = "After merging")
Finally, the ARItrend function tracks mean ARI improvement as pairs of clusters get merged down.
ARItrend(merger)
sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Dune_1.8.0 purrr_0.3.4 knitr_1.38 tidyr_1.2.0
## [5] ggplot2_3.3.5 dplyr_1.0.8 RColorBrewer_1.1-3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8.3 gganimate_1.0.7
## [3] lattice_0.20-45 prettyunits_1.1.1
## [5] assertthat_0.2.1 digest_0.6.29
## [7] utf8_1.2.2 R6_2.5.1
## [9] GenomeInfoDb_1.32.0 stats4_4.2.0
## [11] evaluate_0.15 highr_0.9
## [13] pillar_1.7.0 zlibbioc_1.42.0
## [15] rlang_1.0.2 progress_1.2.2
## [17] jquerylib_0.1.4 magick_2.7.3
## [19] S4Vectors_0.34.0 Matrix_1.4-1
## [21] rmarkdown_2.14 labeling_0.4.2
## [23] BiocParallel_1.30.0 stringr_1.4.0
## [25] RCurl_1.98-1.6 munsell_0.5.0
## [27] DelayedArray_0.22.0 compiler_4.2.0
## [29] xfun_0.30 pkgconfig_2.0.3
## [31] BiocGenerics_0.42.0 htmltools_0.5.2
## [33] tidyselect_1.1.2 SummarizedExperiment_1.26.0
## [35] tibble_3.1.6 GenomeInfoDbData_1.2.8
## [37] IRanges_2.30.0 matrixStats_0.62.0
## [39] viridisLite_0.4.0 fansi_1.0.3
## [41] aricode_1.0.0 crayon_1.5.1
## [43] withr_2.5.0 bitops_1.0-7
## [45] grid_4.2.0 jsonlite_1.8.0
## [47] gtable_0.3.0 lifecycle_1.0.1
## [49] DBI_1.1.2 magrittr_2.0.3
## [51] scales_1.2.0 cli_3.3.0
## [53] stringi_1.7.6 farver_2.1.0
## [55] XVector_0.36.0 bslib_0.3.1
## [57] ellipsis_0.3.2 generics_0.1.2
## [59] vctrs_0.4.1 tools_4.2.0
## [61] Biobase_2.56.0 glue_1.6.2
## [63] tweenr_1.0.2 hms_1.1.1
## [65] MatrixGenerics_1.8.0 parallel_4.2.0
## [67] fastmap_1.1.0 yaml_2.3.5
## [69] colorspace_2.0-3 GenomicRanges_1.48.0
## [71] sass_0.4.1