beachmat 2.4.0
Ordinarily, direct support for a matrix representation would require the appropriate methods to be defined in beachmat at compile time. This is the case for the most widely used matrix classes but is somewhat restrictive for other community-contributed matrix representations. Fortunately, R provides a mechanism to link across shared libraries from different packages. This means that package developers who define a R matrix representation can also define C++ methods for native read/write support in beachmat-dependent code. By doing so, we can improve efficiency of access to these new classes by avoiding the need for block processing via R.
A functioning demonstration of this approach is available in the extensions
test package.
This vignette will provide an explanation of the code in extensions
, and we suggest examining the source code at the same time:
system.file("extensions", package="beachmat")
## [1] "/tmp/Rtmpy5yHU3/Rinst362f6e6c84c2/beachmat/extensions"
Assume that we have already defined a new matrix-like S4 class (here, AaronMatrix
).
To notify the beachmat API that direct input support is available, we need to:
supportCppAccess()
generic (from beachmat) for this class.
This should return TRUE
if direct support is available (obviously).type()
generic from the DelayedArray package.
This should return the type of the matrix, i.e., integer, logical, numeric or character.It is possible to only have direct support for particular data types of the given matrix representation.
The example in extensions
only directly supports integer and character AaronMatrix
objects1 Because I was too lazy to add all of them. and will only return TRUE
for such types.
We will use integer matrices for demonstration, though it is simple to generalize this to all types by replacing _integer
with, e.g., _character
2 Some understanding of C++ templates will greatly simplify the definition of the same methods for different types..
First, we define a create()
function that takes a SEXP
object and returns a void
pointer.
This should presumably point to some C++ class that can contain intermediate data structures for efficient access.
void * ptr = AaronMatrix_integer_input_create(in /* SEXP */);
We define a clone()
function that performs a deep copy of the aforementioned pointer.
void * ptr_copy = AaronMatrix_integer_input_clone(ptr /* void* */);
We define a destroy()
function that frees the memory pointed to by ptr
.
AaronMatrix_integer_input_destroy(ptr /* void* */);
We define a get_dim()
function that records the number of rows and columns in the object pointed to by ptr
.
Note the pointers for nrow
and ncol
.
AaronMatrix_integer_input_dim(
ptr, /* void* */
nrow, /* size_t* */
ncol /* size_t* */
);
A systematic naming scheme is used for all functions, consisting of:
AaronMatrix
.integer
.input
or output
class.destroy
.In general, the getter functions follow the same structure as that described for the input API.
We expect a get
function to obtain a specified entry of the matrix:
AaronMatrix_integer_intput_get(
ptr, /* void* */
r, /* size_t */
c, /* size_t */
val /* int* */
);
Note that val
is a pointer to the matrix type.
For example, val
should be a Rcpp::String*
for character matrices, a double*
for numeric matrices, and an int*
for logical matrices.
Developers can assume that r
and c
are valid, i.e., within [0, nrow)
and [0, ncol)
respectively.
These checks are performed by beachmat and do not have to be repeated within developer-defined functions3 Obviously, the dimensions of the matrix pointed to by ptr
should not change!.
Here, we will use character matrices4 Character matrices tend to require some special attention, as character arrays need to be coerced to Rcpp::String
objects to be returned in in
. as an example.
We expect a getCol
function to obtain a column of the matrix:
AaronMatrix_character_input_getCol(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
… and another getRow
function to obtain a row of the matrix:
AaronMatrix_character_input_getRow(
ptr, /* void* */
r, /* size_t */
in*, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
These are in camelcase to simplify parsing of the function names.
We further expect a getCols
function to obtain multiple columns:
AaronMatrix_character_input_getCols(
ptr, /* void* */
c, /* size_t */
indices, /* Rcpp::IntegerVector::iterator* */
n, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
… and a getRows
function to obtain multiple rows:
AaronMatrix_character_input_getRows(
ptr, /* void* */
r, /* size_t */
indices, /* Rcpp::IntegerVector::iterator* */
n, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
In all cases, first
and last
can be assumed to be valid, i.e., first <= last
and both in [0, nrow)
or [0, ncol)
(for column and row access, respectively).
Indices in indices
can also be assumed to be valid, i.e., within matrix dimensions and strictly increasing.
We stress that the various iterator arguments are pointers to iterators rather than the iterators themselves.
This is to avoid potential issues with C++ classes when using C-style linkage via R’s R_GetCCallable()
framework.
For integer, logical or numeric matrices, we need to account for type conversions. This is done by defining the following functions (using integer matrices as an example):
AaronMatrix_integer_input_getCol_integer
, for getting a single column’s values as integers.AaronMatrix_integer_input_getCol_numeric
, for getting a single column’s values as double-precision values.AaronMatrix_integer_input_getRow_integer
, for getting a single row’s values as integers.AaronMatrix_integer_input_getRow_numeric
, for getting a single row’s values as double-precision values.AaronMatrix_integer_input_getCols_integer
, for getting multiple columns’ values as integers.AaronMatrix_integer_input_getCols_numeric
, for getting multiple columns’ values as double-precision values.AaronMatrix_integer_input_getRows_integer
, for getting multiple rows’ values as integers.AaronMatrix_integer_input_getRows_numeric
, for getting multiple rows’ values as double-precision values.Taking the single-column getter as an example:
AaronMatrix_integer_input_getCol_integer(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::IntegerVector::iterator* */
first, /* size_t */
last /* size_t */
);
AaronMatrix_integer_input_getCol_numeric(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::NumericVector::iterator* */
first, /* size_t */
last /* size_t */
);
The function name now has an additional suffix to denote the destination type.
We explicitly define conversions here as the cross-library linking framework does not support templating or overloading of in
.
To notify the beachmat API that direct output support is available, we need to define flags in our package’s namespace.
beachmat_AaronMatrix_integer_output <- TRUE
beachmat_AaronMatrix_character_output <- TRUE
This indicates that support is available for AaronMatrix
integer and character outputs.
Missing or FALSE
flags indicate that no support is available, in which case beachmat will write to an ordinary matrix by default.
Again, we will use integer matrices for demonstration.
The required functions are mostly similar to the input case.
For creation, we expect to have the number of rows nr
and columns nc
:
void * ptr = AaronMatrix_integer_output_create(
nr /* size_t */,
nc /* size_t */
);
We define a clone()
function to perform a deep copy:
void * ptr_copy = AaronMatrix_integer_output_clone(ptr /* void* */);
We also define a destroy()
function to free memory:
AaronMatrix_integer_output_destroy(ptr /* void* */);
In all cases, we use _output_
to indicate that we are dealing with an output matrix class.
In general, the setter functions follow the same structure as that described for the out API.
We expect a set
function to obtain a specified entry of the matrix:
AaronMatrix_integer_intput_set(
ptr, /* void* */
r, /* size_t */
c, /* size_t */
val /* int* */
);
Again, note that val
is a pointer to the matrix type.
Here, we will use character matrices5 Character matrices tend to require some special attention, as character arrays need to be coerced to Rcpp::String
objects to be returned in in
. as an example.
We expect a setCol
function to obtain a column of the matrix:
AaronMatrix_character_output_setCol(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
… and another setRow
function to obtain a row of the matrix:
AaronMatrix_character_output_setRow(
ptr, /* void* */
r, /* size_t */
in*, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
These are in camelcase to simplify parsing of the function names.
We further expect a setColIndexed
function to set specific elements of a column:
AaronMatrix_character_output_setColIndexed(
ptr, /* void */
c, /* size_t */
n, /* size_t */
idx, /* Rcpp::IntegerVector::iterator */
in /* Rcpp::StringVector::iterator */
)
… where idx
points to an array of n
zero-indexed row indices and val
points to an array of values.
The function should assign each value to the corresponding row at column c
of the output matrix.
Similarly, we expect a setRowIndexed
function to set specific elements of a row:
AaronMatrix_character_output_setRowIndexed(
ptr, /* void */
r, /* size_t */
n, /* size_t */
idx, /* Rcpp::IntegerVector::iterator */
in /* Rcpp::StringVector::iterator */
)
… where idx
now contains column indices.
For integer, logical or numeric matrices, we need to account for type conversions. This is done by defining the following functions (using integer matrices as an example):
AaronMatrix_integer_output_setCol_integer
, for setting a single column’s values from integers.AaronMatrix_integer_output_setCol_numeric
, for setting a single column’s values from double-precision values.AaronMatrix_integer_output_setRow_integer
, for setting a single row’s values from integers.AaronMatrix_integer_output_setRow_numeric
, for setting a single row’s values from double-precision values.AaronMatrix_integer_output_setColIndexed_integer
, for indexed setting of a single column’s values from integers.AaronMatrix_integer_output_setColIndexed_numeric
, for indexed setting of a single column’s values from double-precision values.AaronMatrix_integer_output_setRowIndexed_integer
, for indexed setting of a single row’s values from integers.AaronMatrix_integer_output_setRowIndexed_numeric
, for indexed setting of a single row’s values from double-precision valuesTaking the single-column setter as an example:
AaronMatrix_integer_output_setCol_integer(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::IntegerVector::iterator* */
first, /* size_t */
last /* size_t */
);
AaronMatrix_integer_output_setCol_numeric(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::NumericVector::iterator* */
first, /* size_t */
last /* size_t */
);
All single-element and single-row/column getters should be supported:
AaronMatrix_character_output_get
AaronMatrix_character_output_getRow
AaronMatrix_character_output_getCol
For numeric types, convertible getters should also be supported:
AaronMatrix_character_output_get
AaronMatrix_character_output_getRow_integer
AaronMatrix_character_output_getRow_numeric
AaronMatrix_character_output_getCol_integer
AaronMatrix_character_output_getCol_numeric
We use the R_RegisterCCallable()
function from the R API to register the above functions (see here for an explanation).
This ensures that they can be found by beachmat when an AaronMatrix
instance is encountered.
Note that the functions must be defined with C-style linkage in order for this procedure to work properly, hence the use of extern "C"
in the extensions
test package.
Needless to say, the NAMESPACE
should contain an appropriate useDynLib
command.
This means that shared library will be loaded along with the package, allowing beachmat to access the registered routines within.
However, the supportCppAccess
method and output flags do not need to be exported, as these will be directly recovered from the package’s namespace.
We suggest using the beachtest package to test correct input and output via external linkage to a custom matrix representation.
When using the testthat framework, this can be added to setup.R
:
testpkg <- system.file("testpkg", package="beachmat")
devtools::install(testpkg, quick=TRUE)
library(beachtest)
It is simple to write test scripts using functions like check_read_all
and check_write_all
to quickly verify that linkage works correctly.
Developers are again referred to the extensions
test package for a working example.