TileDBArray 1.4.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.7653589 1.7579309 2.0063569 . 0.6068169 1.1822191
## [2,] 0.4149316 0.4702942 1.3149890 . -0.6142518 -0.1684139
## [3,] -0.8928993 0.6440601 -1.6436185 . 0.7877078 0.8731913
## [4,] -1.3165886 -0.0232400 -0.1415271 . 0.1752986 1.0966519
## [5,] -0.6586645 -0.9696451 0.9401284 . 0.1309917 -2.0902261
## ... . . . . . .
## [96,] -0.954333573 1.115682180 -0.587400762 . -0.31519722 -0.01062201
## [97,] -0.338607739 -0.982182814 0.004453895 . -1.03895877 -1.43803296
## [98,] 0.054336440 0.547589202 -1.019146341 . -1.12133082 0.11411796
## [99,] 1.214075349 -1.034519117 -0.077370988 . -0.41330316 1.04075190
## [100,] -1.396782164 0.833165787 -1.135236735 . 0.54326519 0.75184684
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.7653589 1.7579309 2.0063569 . 0.6068169 1.1822191
## [2,] 0.4149316 0.4702942 1.3149890 . -0.6142518 -0.1684139
## [3,] -0.8928993 0.6440601 -1.6436185 . 0.7877078 0.8731913
## [4,] -1.3165886 -0.0232400 -0.1415271 . 0.1752986 1.0966519
## [5,] -0.6586645 -0.9696451 0.9401284 . 0.1309917 -2.0902261
## ... . . . . . .
## [96,] -0.954333573 1.115682180 -0.587400762 . -0.31519722 -0.01062201
## [97,] -0.338607739 -0.982182814 0.004453895 . -1.03895877 -1.43803296
## [98,] 0.054336440 0.547589202 -1.019146341 . -1.12133082 0.11411796
## [99,] 1.214075349 -1.034519117 -0.077370988 . -0.41330316 1.04075190
## [100,] -1.396782164 0.833165787 -1.135236735 . 0.54326519 0.75184684
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0.00 0.00
## [997,] 0 0 0 . 0.00 0.00
## [998,] 0 0 0 . 0.00 0.00
## [999,] 0 0 0 . 0.00 0.00
## [1000,] 0 0 0 . 0.00 -0.76
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.7653589 1.7579309 2.0063569 . 0.6068169 1.1822191
## GENE_2 0.4149316 0.4702942 1.3149890 . -0.6142518 -0.1684139
## GENE_3 -0.8928993 0.6440601 -1.6436185 . 0.7877078 0.8731913
## GENE_4 -1.3165886 -0.0232400 -0.1415271 . 0.1752986 1.0966519
## GENE_5 -0.6586645 -0.9696451 0.9401284 . 0.1309917 -2.0902261
## ... . . . . . .
## GENE_96 -0.954333573 1.115682180 -0.587400762 . -0.31519722 -0.01062201
## GENE_97 -0.338607739 -0.982182814 0.004453895 . -1.03895877 -1.43803296
## GENE_98 0.054336440 0.547589202 -1.019146341 . -1.12133082 0.11411796
## GENE_99 1.214075349 -1.034519117 -0.077370988 . -0.41330316 1.04075190
## GENE_100 -1.396782164 0.833165787 -1.135236735 . 0.54326519 0.75184684
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.7653589 0.4149316 -0.8928993 -1.3165886 -0.6586645 -0.9668578
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.7653589 1.7579309 2.0063569 0.1947978 1.2812140
## GENE_2 0.4149316 0.4702942 1.3149890 0.2397743 3.3972886
## GENE_3 -0.8928993 0.6440601 -1.6436185 -0.4971280 0.0655188
## GENE_4 -1.3165886 -0.0232400 -0.1415271 0.4741305 0.5602628
## GENE_5 -0.6586645 -0.9696451 0.9401284 -0.6493674 1.7201741
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.53071775 3.51586178 4.01271379 . 1.2136339 2.3644382
## GENE_2 0.82986325 0.94058836 2.62997802 . -1.2285036 -0.3368279
## GENE_3 -1.78579857 1.28812018 -3.28723701 . 1.5754156 1.7463826
## GENE_4 -2.63317726 -0.04647999 -0.28305422 . 0.3505973 2.1933038
## GENE_5 -1.31732905 -1.93929023 1.88025682 . 0.2619834 -4.1804521
## ... . . . . . .
## GENE_96 -1.90866715 2.23136436 -1.17480152 . -0.63039445 -0.02124402
## GENE_97 -0.67721548 -1.96436563 0.00890779 . -2.07791755 -2.87606592
## GENE_98 0.10867288 1.09517840 -2.03829268 . -2.24266164 0.22823593
## GENE_99 2.42815070 -2.06903823 -0.15474198 . -0.82660632 2.08150379
## GENE_100 -2.79356433 1.66633157 -2.27047347 . 1.08653038 1.50369367
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -15.598706 -13.841376 20.974284 -3.653391 8.766510 -4.255231 3.286161
## SAMP_8 SAMP_9 SAMP_10
## -11.192709 -2.652701 6.632021
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 2.601174
## GENE_2 1.548655
## GENE_3 -3.750445
## GENE_4 2.090733
## GENE_5 -1.006676
## ... .
## GENE_96 -1.8089999
## GENE_97 -2.6243927
## GENE_98 1.2837895
## GENE_99 0.1944651
## GENE_100 -1.7799099
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.51643217 0.83735856 -0.37767921 . 0.1366204 -0.4487824
## [2,] 0.69062420 0.48620868 0.08777628 . 0.1482029 -0.4994888
## [3,] -0.26525881 0.01903778 -1.61559984 . 0.7767032 1.3065682
## [4,] 1.02538307 -0.93669319 -1.06465972 . 0.2107589 -0.9123735
## [5,] -0.85632297 -0.29087409 1.28078192 . 0.6279821 -1.8277384
## ... . . . . . .
## [96,] -0.1304329 -2.1392407 -1.0858814 . -0.29084924 1.08650679
## [97,] -0.1646354 -0.7453064 -0.1084963 . 1.51436871 1.27659059
## [98,] -1.0871612 -1.9741283 -0.6984505 . 0.46200746 -0.51272973
## [99,] 0.1487725 -1.6875562 0.6050079 . -0.23764495 1.51655635
## [100,] -0.9857514 -0.1419496 -1.6898642 . 0.08199729 -0.83399140
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.51643217 0.83735856 -0.37767921 . 0.1366204 -0.4487824
## [2,] 0.69062420 0.48620868 0.08777628 . 0.1482029 -0.4994888
## [3,] -0.26525881 0.01903778 -1.61559984 . 0.7767032 1.3065682
## [4,] 1.02538307 -0.93669319 -1.06465972 . 0.2107589 -0.9123735
## [5,] -0.85632297 -0.29087409 1.28078192 . 0.6279821 -1.8277384
## ... . . . . . .
## [96,] -0.1304329 -2.1392407 -1.0858814 . -0.29084924 1.08650679
## [97,] -0.1646354 -0.7453064 -0.1084963 . 1.51436871 1.27659059
## [98,] -1.0871612 -1.9741283 -0.6984505 . 0.46200746 -0.51272973
## [99,] 0.1487725 -1.6875562 0.6050079 . -0.23764495 1.51655635
## [100,] -0.9857514 -0.1419496 -1.6898642 . 0.08199729 -0.83399140
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] TileDBArray_1.4.0 DelayedArray_0.20.0 IRanges_2.28.0
## [4] S4Vectors_0.32.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
## [7] BiocGenerics_0.40.0 Matrix_1.3-4 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 bslib_0.3.1 compiler_4.1.1
## [4] BiocManager_1.30.16 jquerylib_0.1.4 tools_4.1.1
## [7] digest_0.6.28 bit_4.0.4 jsonlite_1.7.2
## [10] evaluate_0.14 lattice_0.20-45 nanotime_0.3.3
## [13] rlang_0.4.12 RcppCCTZ_0.2.9 yaml_2.2.1
## [16] xfun_0.27 fastmap_1.1.0 stringr_1.4.0
## [19] knitr_1.36 sass_0.4.0 bit64_4.0.5
## [22] grid_4.1.1 R6_2.5.1 rmarkdown_2.11
## [25] bookdown_0.24 tiledb_0.9.7 magrittr_2.0.1
## [28] htmltools_0.5.2 stringi_1.7.5 zoo_1.8-9