epiregulon.extra 1.3.0
epiregulon.extra is a companion package to epiregulon and provides functions to visualize transcription factor activity and network. It also provides statistical tests to identify transcription factors with differential activity and network topology. This tutorial continues from the reprogram-seq example included in epiregulon. We will use the gene regulatory networks constructed by epiregulon and continue with visualization and network analysis.
For full documentation of the epiregulon
package, please refer to the epiregulon book.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("epiregulon.extra")
To continue with the visualization functions, we will first need the gene expression matrix. We can download the data from scMultiome.
# load the MAE object
library(scMultiome)
## Loading required package: AnnotationHub
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:generics':
##
## intersect, setdiff, setequal, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff,
## setequal, table, tapply, union, unique, unsplit, which.max,
## which.min
## Loading required package: BiocFileCache
## Loading required package: dbplyr
## Loading required package: ExperimentHub
## Loading required package: MultiAssayExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
## The following object is masked from 'package:ExperimentHub':
##
## cache
## The following object is masked from 'package:AnnotationHub':
##
## cache
## Loading required package: SingleCellExperiment
mae <- scMultiome::reprogramSeq()
## see ?scMultiome and browseVignettes('scMultiome') for documentation
## loading from cache
# expression matrix
GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]]
rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name
reducedDim(GeneExpressionMatrix, "UMAP_Combined") <-
reducedDim(mae[["TileMatrix500"]], "UMAP_Combined")
# arrange hash_assignment
GeneExpressionMatrix$hash_assignment <- factor(as.character(
GeneExpressionMatrix$hash_assignment),
levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2", "HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2",
"HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR", "HTO5_NeonG_v2"
)
)
Next we load the regulon object previously calculated by epiregulon. Here we just use the pruned version of regulon object in which only relevant columns are kept. Using epiregulon, we can calculate activity of transcription factors included in the regulon object.
library(epiregulon)
library(epiregulon.extra)
data(regulon)
score.combine <- calculateActivity(
expMatrix = GeneExpressionMatrix,
regulon = regulon,
mode = "weight",
method = "weightedMean",
exp_assay = "normalizedCounts",
normalize = FALSE
)
## calculating TF activity from regulon using weightedmean
## Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon =
## regulon, : The weight column contains multiple subcolumns but no cluster
## information was provided. Using first column to compute activity...
## aggregating regulons...
## creating weight matrix...
## calculating activity scores...
## normalize by the number of targets...
We perform a differential analysis of transcription factor activity across groups of cells. This function is a wrapper around findMarkers
from scran.
library(epiregulon.extra)
markers <- findDifferentialActivity(
activity_matrix = score.combine,
clusters = GeneExpressionMatrix$hash_assignment,
pval.type = "some",
direction = "up",
test.type = "t",
logvalue = FALSE
)
We can specify the top transcription factors of interest
markers.sig <- getSigGenes(markers, topgenes = 5)
## Using a logFC cutoff of 1 for class HTO10_GATA6_UTR for direction equal to any
## Using a logFC cutoff of 1 for class HTO2_GATA6_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO8_NKX2.1_UTR for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO3_NKX2.1_v2 for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO1_FOXA2_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO4_mFOXA1_v2 for direction equal to any
## Using a logFC cutoff of 0.4 for class HTO6_hFOXA1_UTR for direction equal to any
## Using a logFC cutoff of 0.3 for class HTO5_NeonG_v2 for direction equal to any
We visualize the known differential transcription factors by bubble plot
plotBubble(
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2"),
clusters = GeneExpressionMatrix$hash_assignment
)
Then visualize the most differential transcription factors by clusters
plotBubble(
activity_matrix = score.combine,
tf = markers.sig$tf,
clusters = GeneExpressionMatrix$hash_assignment
)
Visualize the known differential transcription factors by violin plot.
plotActivityViolin(
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
clusters = GeneExpressionMatrix$hash_assignment
)
Visualize the known differential transcription factors by UMAP
options(ggrastr.default.dpi=100)
ActivityPlot <- plotActivityDim(
sce = GeneExpressionMatrix,
activity_matrix = score.combine,
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
dimtype = "UMAP_Combined",
label = "Clusters",
point_size = 0.3,
ncol = 3,
rasterise = TRUE
)
ActivityPlot
In contrast, the gene expression of the transcription factors is very sparse
options(ggrastr.default.dpi=100)
ActivityPlot <- plotActivityDim(
sce = GeneExpressionMatrix,
activity_matrix = counts(GeneExpressionMatrix),
tf = c("NKX2-1", "GATA6", "FOXA1", "FOXA2", "AR"),
dimtype = "UMAP_Combined",
label = "Clusters",
point_size = 0.3,
ncol = 3,
limit = c(0, 2),
colors = c("grey", "blue"),
legend.label = "GEX",
rasterise = TRUE
)
ActivityPlot
Visualize the activity of the selected transcription factors by heat map. Due to the maximum size limit for this vignette, the output is not shown here.
plotHeatmapRegulon(
sce = GeneExpressionMatrix,
tfs = c("GATA6", "NKX2-1"),
regulon = regulon,
regulon_cutoff = 0,
downsample = 1000,
cell_attributes = "hash_assignment",
col_gap = "hash_assignment",
exprs_values = "normalizedCounts",
name = "regulon heatmap",
column_title_rot = 45,
raster_quality=4
)
plotHeatmapActivity(
activity = score.combine,
sce = GeneExpressionMatrix,
tfs = c("GATA6", "NKX2-1", "FOXA1", "FOXA2"),
downsample = 5000,
cell_attributes = "hash_assignment",
col_gap = "hash_assignment",
name = "Activity",
column_title_rot = 45,
raster_quality=3
)
Sometimes we are interested to know what pathways are enriched in the regulon of a particular TF. We can perform geneset enrichment using the enricher function from clusterProfiler.
# retrieve genesets
H <- EnrichmentBrowser::getGenesets(
org = "hsa",
db = "msigdb",
cat = "H",
gene.id.type = "SYMBOL"
)
C2 <- EnrichmentBrowser::getGenesets(
org = "hsa",
db = "msigdb",
cat = "C2",
gene.id.type = "SYMBOL"
)
C6 <- EnrichmentBrowser::getGenesets(
org = "hsa",
db = "msigdb",
cat = "C6",
gene.id.type = "SYMBOL"
)
# combine genesets and convert genesets to be compatible with enricher
gs <- c(H, C2, C6)
gs.list <- do.call(rbind, lapply(names(gs), function(x) {
data.frame(gs = x, genes = gs[[x]])
}))
enrichresults <- regulonEnrich(
TF = c("GATA6", "NKX2-1"),
regulon = regulon,
weight = "weight",
weight_cutoff = 0.1,
genesets = gs.list
)
## GATA6
##
## NKX2-1
# plot results
enrichPlot(results = enrichresults)
We can visualize the genesets as a network
plotGseaNetwork(
tf = names(enrichresults),
enrichresults = enrichresults,
p.adj_cutoff = 0.1,
ntop_pathways = 10
)
We are interested in understanding the differential networks between two conditions and determining which transcription factors account for the differences in the topology of networks. The pruned regulons with cluster-specific test statistics computed by pruneRegulon
can be used to generate cluster-specific networks based on user-defined cutoffs and to visualize differential networks for transcription factors of interest. In this dataset, the GATA6 gene was only expressed in cluster 1 (C1) and NKX2-1 was only expressed in cluster 3 (C3). If we visualize the target genes of GATA6, we can see that C1 has many more target genes of GATA6 compared to C5, a cluster that does not express GATA6. Similarly, NKX2-1 target genes are confined to C3 which is the only cluster that exogenously expresses NKX2-1.
plotDiffNetwork(regulon,
cutoff = 0.1,
tf = c("GATA6"),
weight = "weight",
clusters = c("C1", "C5"),
layout = "stress"
)
## Building graph using weight as edge weights
plotDiffNetwork(regulon,
cutoff = 0.1,
tf = c("NKX2-1"),
weight = "weight",
clusters = c("C3", "C5"),
layout = "stress"
)
## Building graph using weight as edge weights
We can also visualize how transcription factors relate to other transcription factors in C1 cluster.
selected <- which(regulon$weight[, "C1"] > 0 &
regulon$tf %in% c("GATA6", "FOXA1", "AR"))
C1_network <- buildGraph(regulon[selected, ],
weights = "weight",
cluster = "C1"
)
## Building graph using weight as edge weights
plotEpiregulonNetwork(C1_network,
layout = "sugiyama",
tfs_to_highlight = c("GATA6", "FOXA1", "AR")) +
ggplot2::ggtitle("C1")
To systematically examine the differential network topology between two clusters, we perform an edge subtraction between two graphs, using weights computed by pruneRegulon
. We then calculate the degree centrality of the weighted differential graphs and if desired, normalize the differential centrality against the total number of edges. The default normalization function is sqrt
as it preserves both the difference in the number of edges (but scaled by sqrt) and the differences in the weights. If the user only wants to examine the differences in the averaged weights, the FUN
argument can be changed to identity
. Finally, we rank the transcription factors by (normalized) differential centrality. For demonstration purpose, the regulon list is truncated but the full list would contain all the transcription factors.
# rank by differential centrality
C1_network <- buildGraph(regulon, weights = "weight", cluster = "C1")
## Building graph using weight as edge weights
C5_network <- buildGraph(regulon, weights = "weight", cluster = "C5")
## Building graph using weight as edge weights
diff_graph <- buildDiffGraph(C1_network, C5_network, abs_diff = FALSE)
diff_graph <- addCentrality(diff_graph)
diff_graph <- normalizeCentrality(diff_graph)
rank_table <- rankTfs(diff_graph)
library(ggplot2)
ggplot(rank_table, aes(x = rank, y = centrality)) +
geom_point() +
ggrepel::geom_text_repel(data = rank_table, aes(label = tf)) +
theme_classic()
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.5.1 org.Hs.eg.db_3.20.0
## [3] AnnotationDbi_1.69.0 msigdbr_7.5.1
## [5] epiregulon.extra_1.3.0 epiregulon_1.3.0
## [7] scMultiome_1.7.0 SingleCellExperiment_1.29.0
## [9] MultiAssayExperiment_1.33.0 SummarizedExperiment_1.37.0
## [11] Biobase_2.67.0 GenomicRanges_1.59.0
## [13] GenomeInfoDb_1.43.0 IRanges_2.41.0
## [15] S4Vectors_0.45.0 MatrixGenerics_1.19.0
## [17] matrixStats_1.4.1 ExperimentHub_2.15.0
## [19] AnnotationHub_3.15.0 BiocFileCache_2.15.0
## [21] dbplyr_2.5.0 BiocGenerics_0.53.1
## [23] generics_0.1.3 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.5.0 ggplotify_0.1.2 bitops_1.0-9
## [4] filelock_1.0.3 R.oo_1.27.0 tibble_3.2.1
## [7] polyclip_1.10-7 graph_1.85.0 XML_3.99-0.17
## [10] lifecycle_1.0.4 edgeR_4.5.0 lattice_0.22-6
## [13] MASS_7.3-61 backports_1.5.0 magrittr_2.0.3
## [16] limma_3.63.1 sass_0.4.9 rmarkdown_2.29
## [19] jquerylib_0.1.4 yaml_2.3.10 ggtangle_0.0.4
## [22] metapod_1.15.0 RColorBrewer_1.1-3 cowplot_1.1.3
## [25] DBI_1.2.3 abind_1.4-8 zlibbioc_1.53.0
## [28] purrr_1.0.2 R.utils_2.12.3 ggraph_2.2.1
## [31] RCurl_1.98-1.16 yulab.utils_0.1.7 tweenr_2.0.3
## [34] rappdirs_0.3.3 GenomeInfoDbData_1.2.13 enrichplot_1.27.1
## [37] ggrepel_0.9.6 irlba_2.3.5.1 tidytree_0.4.6
## [40] annotate_1.85.0 dqrng_0.4.1 codetools_0.2-20
## [43] DelayedArray_0.33.1 DOSE_4.1.0 scuttle_1.17.0
## [46] ggforce_0.4.2 tidyselect_1.2.1 aplot_0.2.3
## [49] UCSC.utils_1.3.0 farver_2.1.2 ScaledMatrix_1.15.0
## [52] viridis_0.6.5 jsonlite_1.8.9 BiocNeighbors_2.1.0
## [55] tidygraph_1.3.1 scater_1.35.0 tools_4.5.0
## [58] treeio_1.31.0 Rcpp_1.0.13-1 glue_1.8.0
## [61] gridExtra_2.3 SparseArray_1.7.1 xfun_0.49
## [64] qvalue_2.39.0 dplyr_1.1.4 HDF5Array_1.35.1
## [67] withr_3.0.2 BiocManager_1.30.25 fastmap_1.2.0
## [70] rhdf5filters_1.19.0 bluster_1.17.0 fansi_1.0.6
## [73] digest_0.6.37 rsvd_1.0.5 gridGraphics_0.5-1
## [76] R6_2.5.1 mime_0.12 colorspace_2.1-1
## [79] GO.db_3.20.0 Cairo_1.6-2 RSQLite_2.3.7
## [82] R.methodsS3_1.8.2 utf8_1.2.4 tidyr_1.3.1
## [85] data.table_1.16.2 graphlayouts_1.2.0 httr_1.4.7
## [88] S4Arrays_1.7.1 pkgconfig_2.0.3 gtable_0.3.6
## [91] blob_1.2.4 XVector_0.47.0 clusterProfiler_4.15.0
## [94] htmltools_0.5.8.1 fgsea_1.33.0 bookdown_0.41
## [97] GSEABase_1.69.0 scales_1.3.0 png_0.1-8
## [100] ggfun_0.1.7 scran_1.35.0 knitr_1.48
## [103] reshape2_1.4.4 nlme_3.1-166 checkmate_2.3.2
## [106] curl_6.0.0 cachem_1.1.0 rhdf5_2.51.0
## [109] stringr_1.5.1 BiocVersion_3.21.1 parallel_4.5.0
## [112] vipor_0.4.7 ggrastr_1.0.2 pillar_1.9.0
## [115] grid_4.5.0 vctrs_0.6.5 BiocSingular_1.23.0
## [118] beachmat_2.23.0 xtable_1.8-4 cluster_2.1.6
## [121] beeswarm_0.4.0 Rgraphviz_2.51.0 evaluate_1.0.1
## [124] KEGGgraph_1.67.0 tinytex_0.54 magick_2.8.5
## [127] cli_3.6.3 locfit_1.5-9.10 compiler_4.5.0
## [130] rlang_1.1.4 crayon_1.5.3 labeling_0.4.3
## [133] fs_1.6.5 plyr_1.8.9 ggbeeswarm_0.7.2
## [136] stringi_1.8.4 viridisLite_0.4.2 BiocParallel_1.41.0
## [139] babelgene_22.9 munsell_0.5.1 Biostrings_2.75.0
## [142] lazyeval_0.2.2 GOSemSim_2.33.0 Matrix_1.7-1
## [145] patchwork_1.3.0 bit64_4.5.2 Rhdf5lib_1.29.0
## [148] KEGGREST_1.47.0 statmod_1.5.0 highr_0.11
## [151] igraph_2.1.1 memoise_2.0.1 bslib_0.8.0
## [154] ggtree_3.15.0 fastmatch_1.1-4 bit_4.5.0
## [157] EnrichmentBrowser_2.37.0 gson_0.1.0 ape_5.8