TileDBArray 1.17.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.055556962 -0.395138152 1.386441027 . -0.58701808 0.70950022
## [2,] -0.276908528 -0.101739958 -1.515554487 . -0.47012888 0.02823567
## [3,] -1.172480217 -0.347823769 -1.613379697 . 0.53188483 1.26610341
## [4,] -0.746995739 1.529183269 -0.702156178 . 0.42175634 1.82113134
## [5,] -0.614966331 -0.315424856 -0.001586375 . -0.42679817 0.82959687
## ... . . . . . .
## [96,] 0.4339236 0.4527595 -1.3313065 . -2.56514418 -0.13963009
## [97,] 0.1009069 0.1681388 -0.6880096 . 0.68743163 -0.13835786
## [98,] -0.5598460 -0.9072928 -0.7273098 . 0.05775738 0.45138891
## [99,] 0.4103314 -0.4301997 -0.7680463 . 0.79908095 -0.08070133
## [100,] -0.4523712 1.2167622 -0.7517439 . 0.77198020 -0.24428633
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.055556962 -0.395138152 1.386441027 . -0.58701808 0.70950022
## [2,] -0.276908528 -0.101739958 -1.515554487 . -0.47012888 0.02823567
## [3,] -1.172480217 -0.347823769 -1.613379697 . 0.53188483 1.26610341
## [4,] -0.746995739 1.529183269 -0.702156178 . 0.42175634 1.82113134
## [5,] -0.614966331 -0.315424856 -0.001586375 . -0.42679817 0.82959687
## ... . . . . . .
## [96,] 0.4339236 0.4527595 -1.3313065 . -2.56514418 -0.13963009
## [97,] 0.1009069 0.1681388 -0.6880096 . 0.68743163 -0.13835786
## [98,] -0.5598460 -0.9072928 -0.7273098 . 0.05775738 0.45138891
## [99,] 0.4103314 -0.4301997 -0.7680463 . 0.79908095 -0.08070133
## [100,] -0.4523712 1.2167622 -0.7517439 . 0.77198020 -0.24428633
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.055556962 -0.395138152 1.386441027 . -0.58701808 0.70950022
## GENE_2 -0.276908528 -0.101739958 -1.515554487 . -0.47012888 0.02823567
## GENE_3 -1.172480217 -0.347823769 -1.613379697 . 0.53188483 1.26610341
## GENE_4 -0.746995739 1.529183269 -0.702156178 . 0.42175634 1.82113134
## GENE_5 -0.614966331 -0.315424856 -0.001586375 . -0.42679817 0.82959687
## ... . . . . . .
## GENE_96 0.4339236 0.4527595 -1.3313065 . -2.56514418 -0.13963009
## GENE_97 0.1009069 0.1681388 -0.6880096 . 0.68743163 -0.13835786
## GENE_98 -0.5598460 -0.9072928 -0.7273098 . 0.05775738 0.45138891
## GENE_99 0.4103314 -0.4301997 -0.7680463 . 0.79908095 -0.08070133
## GENE_100 -0.4523712 1.2167622 -0.7517439 . 0.77198020 -0.24428633
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.05555696 -0.27690853 -1.17248022 -0.74699574 -0.61496633 0.97254962
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.055556962 -0.395138152 1.386441027 0.584417783 -0.066494920
## GENE_2 -0.276908528 -0.101739958 -1.515554487 0.332995329 1.083063235
## GENE_3 -1.172480217 -0.347823769 -1.613379697 -2.622962737 -0.679511201
## GENE_4 -0.746995739 1.529183269 -0.702156178 -0.484336508 -1.770037278
## GENE_5 -0.614966331 -0.315424856 -0.001586375 0.277902437 -0.635594714
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.11111392 -0.79027630 2.77288205 . -1.17403617 1.41900044
## GENE_2 -0.55381706 -0.20347992 -3.03110897 . -0.94025776 0.05647134
## GENE_3 -2.34496043 -0.69564754 -3.22675939 . 1.06376966 2.53220682
## GENE_4 -1.49399148 3.05836654 -1.40431236 . 0.84351267 3.64226268
## GENE_5 -1.22993266 -0.63084971 -0.00317275 . -0.85359634 1.65919373
## ... . . . . . .
## GENE_96 0.8678472 0.9055190 -2.6626130 . -5.1302884 -0.2792602
## GENE_97 0.2018138 0.3362775 -1.3760191 . 1.3748633 -0.2767157
## GENE_98 -1.1196921 -1.8145856 -1.4546197 . 0.1155148 0.9027778
## GENE_99 0.8206629 -0.8603994 -1.5360925 . 1.5981619 -0.1614027
## GENE_100 -0.9047423 2.4335244 -1.5034877 . 1.5439604 -0.4885727
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 4.1880611 -4.3962466 -7.4409218 8.0829965 0.5866394 2.2547156
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -2.1305914 -10.5604657 0.1918863 -13.9827984
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.171380673
## GENE_2 -1.878274810
## GENE_3 -0.966916997
## GENE_4 -2.009514598
## GENE_5 -0.448525757
## GENE_6 0.719650752
## GENE_7 0.002111928
## GENE_8 0.653235820
## GENE_9 1.884652687
## GENE_10 2.576594927
## GENE_11 1.729172909
## GENE_12 0.576704031
## GENE_13 -1.974703051
## GENE_14 0.564135941
## GENE_15 0.943694722
## GENE_16 1.637595877
## GENE_17 -2.085152670
## GENE_18 2.495773099
## GENE_19 0.654923643
## GENE_20 -2.144098283
## GENE_21 -0.799616215
## GENE_22 -0.630365863
## GENE_23 1.145197878
## GENE_24 -0.309028591
## GENE_25 0.065668558
## GENE_26 0.712174896
## GENE_27 2.942849044
## GENE_28 -0.322037311
## GENE_29 0.198897812
## GENE_30 -0.484786337
## GENE_31 -1.071517591
## GENE_32 1.959180726
## GENE_33 0.979816586
## GENE_34 -1.471507924
## GENE_35 -1.811756994
## GENE_36 0.977330364
## GENE_37 -1.332356658
## GENE_38 -0.692954077
## GENE_39 0.415385852
## GENE_40 -1.682124467
## GENE_41 1.457351187
## GENE_42 -0.713191762
## GENE_43 -0.019563827
## GENE_44 -0.951355718
## GENE_45 0.412975738
## GENE_46 -1.323072305
## GENE_47 -1.265743291
## GENE_48 -1.036534601
## GENE_49 -0.123497655
## GENE_50 -0.652322983
## GENE_51 -4.278121986
## GENE_52 1.869916305
## GENE_53 -0.041193660
## GENE_54 2.691536844
## GENE_55 -0.076875584
## GENE_56 -2.318561975
## GENE_57 -0.820768634
## GENE_58 0.491450966
## GENE_59 2.838464456
## GENE_60 0.873014373
## GENE_61 -1.399342570
## GENE_62 0.071156634
## GENE_63 -2.617497581
## GENE_64 -0.495098262
## GENE_65 0.697793270
## GENE_66 -3.425436154
## GENE_67 -1.296361818
## GENE_68 0.410112179
## GENE_69 2.883929783
## GENE_70 0.195836212
## GENE_71 -1.539604128
## GENE_72 1.563360303
## GENE_73 -2.819334178
## GENE_74 2.721954365
## GENE_75 1.492292676
## GENE_76 -1.199641800
## GENE_77 -0.727461572
## GENE_78 0.898440099
## GENE_79 -1.003602685
## GENE_80 1.793237983
## GENE_81 -0.559108854
## GENE_82 0.892487704
## GENE_83 2.055190244
## GENE_84 1.714499242
## GENE_85 -0.796074971
## GENE_86 -2.387720434
## GENE_87 0.634524902
## GENE_88 -0.871479707
## GENE_89 -0.662888593
## GENE_90 1.062416809
## GENE_91 -0.449532237
## GENE_92 3.582563736
## GENE_93 0.752716319
## GENE_94 -0.644584909
## GENE_95 -0.881005313
## GENE_96 -3.172959729
## GENE_97 0.075185991
## GENE_98 -1.235460288
## GENE_99 -0.676345722
## GENE_100 -0.078578486
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.07639798 0.57236687 0.36849438 . 1.5086238 -0.1417398
## [2,] -1.13451996 0.15143393 0.24197403 . 1.0983668 -0.6374287
## [3,] 0.56958931 -0.21934726 -0.19875683 . 0.3957727 1.0701223
## [4,] 0.70928837 1.36972277 0.03965432 . 0.4150734 1.2693784
## [5,] -0.34365728 -0.97007911 -1.43584299 . -1.4289656 -2.3072979
## ... . . . . . .
## [96,] -1.6877552 -0.5855347 -0.3529514 . 0.2651914 0.4413382
## [97,] 1.6893366 1.1636993 -1.1950555 . 1.3787739 0.7966570
## [98,] 2.3331349 0.2973305 0.6927362 . -0.5477409 -0.4583042
## [99,] -1.4180469 -1.7192956 0.2608827 . -0.4027489 0.2150282
## [100,] -0.4680870 0.4785957 0.6972376 . 0.3010572 -1.2723648
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.07639798 0.57236687 0.36849438 . 1.5086238 -0.1417398
## [2,] -1.13451996 0.15143393 0.24197403 . 1.0983668 -0.6374287
## [3,] 0.56958931 -0.21934726 -0.19875683 . 0.3957727 1.0701223
## [4,] 0.70928837 1.36972277 0.03965432 . 0.4150734 1.2693784
## [5,] -0.34365728 -0.97007911 -1.43584299 . -1.4289656 -2.3072979
## ... . . . . . .
## [96,] -1.6877552 -0.5855347 -0.3529514 . 0.2651914 0.4413382
## [97,] 1.6893366 1.1636993 -1.1950555 . 1.3787739 0.7966570
## [98,] 2.3331349 0.2973305 0.6927362 . -0.5477409 -0.4583042
## [99,] -1.4180469 -1.7192956 0.2608827 . -0.4027489 0.2150282
## [100,] -0.4680870 0.4785957 0.6972376 . 0.3010572 -1.2723648
sessionInfo()
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.21 TileDBArray_1.17.0 DelayedArray_0.33.6
## [4] SparseArray_1.7.7 S4Arrays_1.7.3 IRanges_2.41.3
## [7] abind_1.4-8 S4Vectors_0.45.4 MatrixGenerics_1.19.1
## [10] matrixStats_1.5.0 BiocGenerics_0.53.6 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.47.2 tiledb_0.30.2
## [16] knitr_1.50 bookdown_0.42 bslib_0.9.0
## [19] rlang_1.1.5 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.9 bit64_4.6.0-1 cli_3.6.4
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.0 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1