MultiDataSet
is a new class designed to manage different omic datasets with samples in common. The main purpose when developing MultiDataSet
was to ease the integration of multiple biological datasets.
MultiDataSet
is based on Bioconductor’s framework and can work with S4 classes derived from eSet
or from SummarizedExperiment
. Each set added to a MultiDataSet
contributes its assay data (the matrix or matrices of measurements), its phenotypic data and its feature data.
Note that phenotypic data is stored independently for each set. This allows variables with the same name but different values to be stored in different sets (e.g. technical variables). Feature data is stored both as an AnnotatedDataFrame
and as a GenomicRanges
. This design allows quick subsetting with GenomicRanges
while preserving the possibility of storing sets that do not have genomic coordinates (e.g. metabolic or exposure data).
This document presents the addition of sets and subsetting. Advanced features such as creating new functions to add sets or developing integration functions using MultiDataSet
are covered in other documents.
In the code below, the libraries needed in this tutorial are loaded:
library(MultiDataSet)
library(MEALData)
library(minfiData)
library(GenomicRanges)
MultiDataSet
objects must be created with the constructor before any set is added:
multi <- createMultiDataSet()
multi
## Object of class 'MultiDataSet'
## . assayData: 0 elements
## . featureData:
## . rowRanges:
## . phenoData:
The function names
returns the names of the sets included in the MultiDataSet
. Right now, the object is empty so there are no names:
names(multi)
## NULL
length(names(multi))
## [1] 0
Sets can be added to MultiDataSet
using two classes of functions: general and specific. General functions add any eSet
or SummarizedExperiment
while specific functions add more specific objects (e.g. ExpressionSet
).
General functions directly interact with the MultiDataSet
and change its content. They only check if the incoming set is an eSet
or a SummarizedExperiment
. Due to their flexibility, they are intended for developers who create specific functions.
MultiDataSet
contains two general functions: add_eset
and add_rse
. They work similarly but they are adapted to the particularities of eSet
and SummarizedExperiment
. Therefore, their common features will only be covered in the add eSet section.
add_eset
is the general function to add eSet-derived classes. This function has three important arguments. object
is the MultiDataSet
where the set will be added, set
is the eSet
that will be added and dataset.type
is the type of the new set. The next lines will illustrate its use by adding an ExpressionSet
from MEALData:
data(eset)
eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 61 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GM00016 GM00022 ... WG3466 (61 total)
## varLabels: gender inv
## varMetadata: labelDescription
## featureData
## featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
## total)
## fvarLabels: chr start end genes
## fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6883
multi2 <- add_eset(multi, eset, dataset.type = "expression")
## Warning in add_eset(multi, eset, dataset.type = "expression"): No id column
## found in pData. The id will be equal to the sampleNames
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chr, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
multi
## Object of class 'MultiDataSet'
## . assayData: 0 elements
## . featureData:
## . rowRanges:
## . phenoData:
Printing multi2 shows the names of the sets that have been added and, for each set, the number of features and samples and whether it has a rowRanges. Note that add_eset
does not modify the MultiDataSet
passed in the object
argument. Consequently, multi is still empty. This property is common to all the functions used to add sets to the MultiDataSet
.
By default, the name of the incoming set is equal to dataset.type. If we want to add another set of the same type, we can use the argument dataset.name
to differentiate them. As an example, we will add the same ExpressionSet
of the previous example but with another name:
multi2 <- add_eset(multi2, eset, dataset.type = "expression", dataset.name = "new")
## Warning in add_eset(multi2, eset, dataset.type = "expression", dataset.name
## = "new"): No id column found in pData. The id will be equal to the
## sampleNames
multi2
## Object of class 'MultiDataSet'
## . assayData: 2 elements
## . expression: 21916 features, 61 samples
## . expression+new: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chr, ..., end)
## . expression+new: 21916 rows, 4 cols (chr, ..., end)
## . rowRanges:
## . expression: YES
## . expression+new: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
## . expression+new: 61 samples, 3 cols (gender, ..., inv)
If dataset.name
is used, the resulting name is “dataset.type+dataset.name”. With this strategy, we can have different datasets of the same type and we can still retrieve those datasets corresponding to the same type of data.
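As a rough illustration of this naming scheme, the base R sketch below recovers the dataset type from each set name by splitting at the "+" separator. The vector `set_names` is a hypothetical stand-in for the result of `names(multi2)`; no MultiDataSet function is used, and this is not necessarily how the package does it internally.

```r
# Hypothetical set names, standing in for names(multi2)
set_names <- c("expression", "expression+new", "methylation")

# The dataset type is the part before the first "+"
set_types <- vapply(
  strsplit(set_names, "+", fixed = TRUE),
  function(parts) parts[1],
  character(1)
)

# Names of all sets that hold expression data
expression_sets <- set_names[set_types == "expression"]
expression_sets
```

The same idea applies to any type: filtering on the recovered type returns every set added with that dataset.type, whatever its dataset.name.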
To ensure sample consistency across the different datasets, a common sample identifier is used in all of them. The sample identifier should be provided by adding a column called id
in the phenotypic data (phenoData
) of the object. If it is not already present, it is created by default using the sample names of the given set.
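The default described above can be pictured with a small base R sketch. The data frame `pheno` and the helper `ensure_id` are hypothetical and only mimic the documented behaviour (falling back to the sample names when no id column exists); they are not part of MultiDataSet.

```r
# Toy phenotypic data without an id column;
# the row names play the role of the sample names
pheno <- data.frame(
  gender = c("female", "male", "female"),
  row.names = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add an id column only when it is missing
ensure_id <- function(pheno) {
  if (!"id" %in% colnames(pheno)) {
    pheno$id <- rownames(pheno)
  }
  pheno
}

pheno_with_id <- ensure_id(pheno)
pheno_with_id$id
```

Because the id is what links samples across sets, the same individual should carry the same id in every set that is added.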
Because our ExpressionSet
does not contain this column, a warning is raised. To solve it, we can manually add an id
to our dataset:
eset2 <- eset
eset2$id <- 1:61
multi2 <- add_eset(multi, eset2, dataset.type = "expression")
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chr, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
There are three additional arguments: warnings
, overwrite
and GRanges
. warnings
can be used to enable or disable the warnings. overwrite
is used when we want to add a set under a name that is already in use. If TRUE, the existing set is replaced. If FALSE, an error is raised and the MultiDataSet is left unchanged:
eset2 <- eset[, 1:10]
multi2 <- add_eset(multi, eset, dataset.type = "expression", warnings = FALSE)
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chr, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
multi2 <- add_eset(multi2, eset2, dataset.type = "expression", warnings = FALSE, overwrite = FALSE)
## Error in add_eset(multi2, eset2, dataset.type = "expression", warnings = FALSE, : There is already an object in this slot. Set overwrite = TRUE to overwrite the previous set.
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chr, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
multi2 <- add_eset(multi2, eset2, dataset.type = "expression", warnings = FALSE, overwrite = TRUE)
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 10 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chr, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 10 samples, 3 cols (gender, ..., inv)
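The overwrite behaviour shown above can be sketched in base R with a plain named list. The function `add_set` is a hypothetical stand-in, not the package's implementation; it only reproduces the two properties seen in the output: the guard on existing names and the fact that a modified copy is returned.

```r
# Hypothetical container of named sets (a plain list, not a MultiDataSet)
add_set <- function(object, set, name, overwrite = FALSE) {
  if (name %in% names(object) && !overwrite) {
    stop("There is already an object in this slot. ",
         "Set overwrite = TRUE to overwrite the previous set.")
  }
  object[[name]] <- set
  object  # a modified copy is returned; the caller's object is untouched
}

sets <- add_set(list(), 1:61, "expression")

# Adding under the same name fails and leaves `sets` as it was
result <- tryCatch(
  add_set(sets, 1:10, "expression"),
  error = function(e) conditionMessage(e)
)

# With overwrite = TRUE the stored set is replaced in the returned copy
sets2 <- add_set(sets, 1:10, "expression", overwrite = TRUE)
length(sets2$expression)
```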
Finally, the GRanges
argument is used to add a GenomicRanges
with the annotation. By default, a GenomicRanges
will be generated from the set’s fData
. With this parameter, we can directly supply a GenomicRanges
or, if the annotation of our dataset cannot be transformed to a GenomicRanges
(e.g. proteomic data), we can set this parameter to NA:
multi2 <- add_eset(multi, eset, dataset.type = "expression", warnings = FALSE, GRanges = NA)
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chr, ..., end)
## . rowRanges:
## . expression: NO
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
Now, we can see that rowRanges is NO for expression. The implications will be described in the Filtering by GenomicRanges
section.
SummarizedExperiment
s are added using add_rse
. Its arguments and behavior are very similar to those of add_eset
. The only difference is that there is no GRanges
argument (annotation data is already in the form of a GenomicRanges
). To exemplify its use, a GenomicRatioSet
(a minfi class) will be created and added:
data("MsetEx")
MsetEx2 <- MsetEx[1:100, ] ### Subset the original set to speed up
GRSet <- mapToGenome(ratioConvert(MsetEx2)) ## Convert MethylSet (eSet-derived) to GenomicRatioSet
GRSet
## class: GenomicRatioSet
## dim: 100 6
## metadata(0):
## assays(3): Beta M CN
## rownames(100): cg00011200 cg00014152 ... cg26983430 cg08265308
## rowData names(0):
## colnames(6): 5723646052_R02C02 5723646052_R04C01 ...
## 5723646053_R05C02 5723646053_R06C02
## colData names(13): Sample_Name Sample_Well ... Basename filenames
## Annotation
## array: IlluminaHumanMethylation450k
## annotation: ilmn12.hg19
## Preprocessing
## Method: Raw (no normalization or bg correction)
## minfi version: 1.21.2
## Manifest version: 0.4.0
multi <- createMultiDataSet()
multi2 <- add_rse(multi, GRSet, dataset.type = "methylation", warnings = FALSE)
## Warning in add_rse(multi, GRSet, dataset.type = "methylation", warnings
## = FALSE): No id column found in rowRanges. The id will be equal to the
## sampleNames
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . methylation: 100 features, 6 samples
## . featureData:
## . methylation: 100 rows, 5 cols (seqnames, ..., width)
## . rowRanges:
## . methylation: YES
## . phenoData:
## . methylation: 6 samples, 14 cols (Sample_Name, ..., filenames)
Specific functions are designed to add specific datasets to MultiDataSet
. They call general functions to add the data and they usually perform several checks (e.g. checking the class of the set or the columns of its fData). As a result, only sets that meet certain requirements can be added to a MultiDataSet
and no later checks on data structure are required. Users should always use specific functions to ensure that the sets are properly added to MultiDataSet
.
In MultiDataSet
we have introduced four specific functions: add_genexp
, add_rnaseq
, add_methy
and add_snps
. All these functions have two arguments: object
with the MultiDataSet
and a second argument with the incoming set. The name of the second argument depends on the specific function (e.g. gexpSet for add_genexp
, snpSet for add_snps
…). Although we will only show examples of add_genexp
and add_snps
, the other specific functions share the same behavior and features.
add_genexp
adds an ExpressionSet
to the slot “expression”. We will use the ExpressionSet
of MEALData as an example:
multi <- createMultiDataSet()
multi2 <- add_genexp(multi, eset)
## Error in add_genexp(multi, eset): fData of gexpSet must contain columns chromosome, start and end
eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 61 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GM00016 GM00022 ... WG3466 (61 total)
## varLabels: gender inv
## varMetadata: labelDescription
## featureData
## featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
## total)
## fvarLabels: chr start end genes
## fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6883
add_genexp
requires that the fData of the incoming ExpressionSet
has the columns chromosome
, start
and end
. Our eset
object has the columns start
and end
but the chromosome is labeled with chr
. We should fix this before continuing:
fvarLabels(eset)[1] <- "chromosome"
multi <- createMultiDataSet()
multi2 <- add_genexp(multi, eset)
## Warning in add_eset(object, gexpSet, dataset.type = "expression", GRanges
## = range, : No id column found in pData. The id will be equal to the
## sampleNames
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
Given that add_genexp
calls add_eset
, the arguments of add_eset
can also be used, except for dataset.type
and GRanges
. Let’s add the same ExpressionSet
to another slot using dataset.name
:
multi2 <- add_genexp(multi2, eset, dataset.name = "2")
## Warning in add_eset(object, gexpSet, dataset.type = "expression", GRanges
## = range, : No id column found in pData. The id will be equal to the
## sampleNames
multi2
## Object of class 'MultiDataSet'
## . assayData: 2 elements
## . expression: 21916 features, 61 samples
## . expression+2: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chromosome, ..., end)
## . expression+2: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . expression: YES
## . expression+2: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
## . expression+2: 61 samples, 3 cols (gender, ..., inv)
add_snps
adds a SnpSet
to the slot “snps” of a MultiDataSet
. SNP data is stored as a SnpMatrix
in MEALData, so we need to convert it first to SnpSet
:
data(snps)
SnpSet <- new("SnpSet", call = snps$genotypes)
fData(SnpSet) <- snps$map
SnpSet
## SnpSet (storageMode: lockedEnvironment)
## assayData: 29909 features, 98 samples
## element names: call, callProbability
## protocolData: none
## phenoData: none
## featureData
## featureNames: rs1000068-127_B_R_1501589836
## rs1000299-127_B_R_1501589728 ... rs999978-126_B_R_1502301346
## (29909 total)
## fvarLabels: Chromosome snp.name ... chromosome (5 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
If we now try to add this set to a MultiDataSet
:
multi2 <- add_snps(multi, SnpSet)
## Error in .find_seqnames_col(df_colnames0, seqnames.field0, prefix): cannnot determine seqnames column unambiguously
This error arises when makeGRangesFromDataFrame
is unable to create a GenomicRanges
from the fData
. Let us take a look at SnpSet
’s fData
:
head(fData(SnpSet))
## Chromosome snp.name position SNP
## rs1000068-127_B_R_1501589836 17 rs1000068 45653965 [T/C]
## rs1000299-127_B_R_1501589728 17 rs1000299 70660148 [T/C]
## rs100043-127_T_F_1501589788 17 rs100043 64903636 [A/G]
## rs1000465-127_T_R_1501589734 17 rs1000465 67315267 [A/G]
## rs1000466-127_B_F_1501590038 17 rs1000466 67315325 [T/C]
## rs1000724-126_B_R_1502063163 17 rs1000724 14579993 [T/C]
## chromosome
## rs1000068-127_B_R_1501589836 chr17
## rs1000299-127_B_R_1501589728 chr17
## rs100043-127_T_F_1501589788 chr17
## rs1000465-127_T_R_1501589734 chr17
## rs1000466-127_B_F_1501590038 chr17
## rs1000724-126_B_R_1502063163 chr17
Two columns can be interpreted as the chromosome. To resolve the ambiguity, we will remove the first column of fData
and we will try again:
fData(SnpSet) <- fData(SnpSet)[, -1]
multi2 <- add_snps(multi, SnpSet)
## Warning in add_eset(object, snpSet, dataset.type = "snps", GRanges
## = range, : No id column found in pData. The id will be equal to the
## sampleNames
multi2
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . snps: 29909 features, 98 samples
## . featureData:
## . snps: 29909 rows, 4 cols (snp.name, ..., SNP)
## . rowRanges:
## . snps: YES
## . phenoData:
## . snps: 98 samples, 1 cols (id)
Now, the method has worked and the set has been successfully added. This case exemplifies how a specific function checks the structure of the fData
of the incoming set.
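The ambiguity behind the earlier error can be illustrated with a simplified base R sketch. The vector `chrom_names` is an assumed, shortened list of column names that are treated as the chromosome, and the case-insensitive matching is likewise an assumption about how the lookup behaves; the real logic lives inside makeGRangesFromDataFrame and may differ in detail.

```r
# Column names of the SnpSet feature data shown above
fdata_cols <- c("Chromosome", "snp.name", "position", "SNP", "chromosome")

# Assumed (simplified) list of names accepted as the chromosome column
chrom_names <- c("seqnames", "chr", "chrom", "chromosome")

# Case-insensitive matching: two columns qualify, hence the ambiguity
candidates <- fdata_cols[tolower(fdata_cols) %in% chrom_names]
candidates

# After dropping the first column a single candidate remains
remaining_cols <- fdata_cols[-1]
candidates_fixed <- remaining_cols[tolower(remaining_cols) %in% chrom_names]
candidates_fixed
```

Checking the column names this way before calling a specific function can make such errors easier to anticipate.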
Subsetting of MultiDataSet
s can be done by samples, by tables or using a GenomicRanges
. In order to illustrate these operations, we will use the ExpressionSet
and the MethylationSet
of MEALData. First, we will add these sets to a MultiDataSet
. The ExpressionSet
will also be added to another slot, this time setting GRanges = NA:
data(mset)
multi <- createMultiDataSet()
# Remove probes without a position before adding the object
multi <- add_methy(multi, mset)
## Warning in add_eset(object, methySet, dataset.type = "methylation",
## GRanges = range, : No id column found in pData. The id will be equal to the
## sampleNames
multi <- add_genexp(multi, eset)
## Warning in add_eset(object, gexpSet, dataset.type = "expression", GRanges
## = range, : No id column found in pData. The id will be equal to the
## sampleNames
multi <- add_eset(multi, eset, dataset.type = "test", GRanges = NA)
## Warning in add_eset(multi, eset, dataset.type = "test", GRanges = NA): No
## id column found in pData. The id will be equal to the sampleNames
multi
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 451383 features, 61 samples
## . expression: 21916 features, 61 samples
## . test: 21916 features, 61 samples
## . featureData:
## . methylation: 451383 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 21916 rows, 4 cols (chromosome, ..., end)
## . test: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 61 samples, 4 cols (gender, ..., inv)
## . expression: 61 samples, 3 cols (gender, ..., inv)
## . test: 61 samples, 3 cols (gender, ..., inv)
The expression data contains 61 samples and 21916 features and the methylation data 61 samples and 451383 CpGs.
Subsetting by samples can be done in two different ways. The first option is to introduce a vector of sample ids. MultiDataSet
has the operator [
overloaded, and samples are its first argument:
samples <- sampleNames(eset)[1:25]
multi[samples, ]
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 451383 features, 23 samples
## . expression: 21916 features, 25 samples
## . test: 21916 features, 25 samples
## . featureData:
## . methylation: 451383 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 21916 rows, 4 cols (chromosome, ..., end)
## . test: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 23 samples, 4 cols (gender, ..., inv)
## . expression: 25 samples, 3 cols (gender, ..., inv)
## . test: 25 samples, 3 cols (gender, ..., inv)
Subsetting by samples returns, for each set, all the samples that are present in the filtering vector. In our example, we selected the first 25 samples of the ExpressionSet
. In the MethylationSet
, only 23 of these samples were present.
We can also select only those samples that are present in all the datasets with the function commonSamples
. This method returns a new MultiDataSet
containing only the common samples:
commonSamples(multi)
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 451383 features, 57 samples
## . expression: 21916 features, 57 samples
## . test: 21916 features, 57 samples
## . featureData:
## . methylation: 451383 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 21916 rows, 4 cols (chromosome, ..., end)
## . test: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 57 samples, 4 cols (gender, ..., inv)
## . expression: 57 samples, 3 cols (gender, ..., inv)
## . test: 57 samples, 3 cols (gender, ..., inv)
length(intersect(sampleNames(eset), sampleNames(mset)))
## [1] 57
The resulting MultiDataSet
contains 57 samples for expression and methylation, the same number as the intersection of the sample names of the original sets.
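The difference between the two ways of subsetting by samples can be sketched in base R. The list `set_samples` holds hypothetical sample ids per dataset; the code only mirrors the behaviour described above and does not call any MultiDataSet function.

```r
# Hypothetical sample ids per dataset
set_samples <- list(
  methylation = c("S1", "S3", "S4"),
  expression  = c("S1", "S2", "S3", "S4"),
  test        = c("S1", "S2", "S3", "S4")
)

# Subsetting with a vector of ids: each set keeps whichever ids it has
wanted <- c("S1", "S2")
by_vector <- lapply(set_samples, function(ids) ids[ids %in% wanted])
lengths(by_vector)

# Common samples: only ids present in every set are kept
common <- Reduce(intersect, set_samples)
by_common <- lapply(set_samples, function(ids) ids[ids %in% common])
lengths(by_common)
```

With a filtering vector the sets can end up with different numbers of samples, whereas the common-samples approach always leaves the same samples in every set.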
We can select the datasets of a MultiDataSet
using their names. They should be placed in the second position of [
:
multi[, "expression"]
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 21916 features, 61 samples
## . featureData:
## . expression: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 61 samples, 3 cols (gender, ..., inv)
multi[, c("methylation", "test")]
## Object of class 'MultiDataSet'
## . assayData: 2 elements
## . methylation: 451383 features, 61 samples
## . test: 21916 features, 61 samples
## . featureData:
## . methylation: 451383 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . test: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . test: NO
## . phenoData:
## . methylation: 61 samples, 4 cols (gender, ..., inv)
## . test: 61 samples, 3 cols (gender, ..., inv)
If we want to retrieve the original object, we can set drop = TRUE
or use the [[
operator:
multi[["expression"]]
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 61 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GM00016 GM00022 ... WG3466 (61 total)
## varLabels: gender inv id
## varMetadata: labelDescription
## featureData
## featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
## total)
## fvarLabels: chromosome start end genes
## fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
multi[, "expression", drop = TRUE]
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 61 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: GM00016 GM00022 ... WG3466 (61 total)
## varLabels: gender inv id
## varMetadata: labelDescription
## featureData
## featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
## total)
## fvarLabels: chromosome start end genes
## fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
Finally, MultiDataSet
can be filtered by GenomicRanges
. In this case, only those features inside the range will be returned and those datasets without GenomicRanges
data will be discarded. The GenomicRanges should be placed in the third position of [
:
range <- GRanges("chr17:1-100000")
multi[, , range]
## Object of class 'MultiDataSet'
## . assayData: 2 elements
## . methylation: 63 features, 61 samples
## . expression: 3 features, 61 samples
## . featureData:
## . methylation: 63 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 3 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . phenoData:
## . methylation: 61 samples, 4 cols (gender, ..., inv)
## . expression: 61 samples, 3 cols (gender, ..., inv)
As a consequence of filtering by GenomicRanges
, the set “test”, which did not have rowRanges, has been discarded. If the GenomicRanges
contains more than one range, features present in any of the ranges are selected:
range2 <- GRanges(c("chr17:1-100000", "chr17:1000000-2000000"))
multi[, , range2]
## Object of class 'MultiDataSet'
## . assayData: 2 elements
## . methylation: 766 features, 61 samples
## . expression: 27 features, 61 samples
## . featureData:
## . methylation: 766 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 27 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . phenoData:
## . methylation: 61 samples, 4 cols (gender, ..., inv)
## . expression: 61 samples, 3 cols (gender, ..., inv)
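The rule that a feature is kept when it falls in any of the ranges can be illustrated without GenomicRanges. In the base R sketch below, the feature coordinates are hypothetical, the two query ranges are the ones used above, and the overlap test is a simplified stand-in for what the package delegates to GenomicRanges.

```r
# Hypothetical feature annotation
features <- data.frame(
  chromosome = c("chr17", "chr17", "chr17", "chr1"),
  start = c(500, 150000, 1500000, 500),
  end   = c(900, 150100, 1500100, 900),
  row.names = c("f1", "f2", "f3", "f4"),
  stringsAsFactors = FALSE
)

# The two query ranges from the example above
ranges <- data.frame(
  chromosome = c("chr17", "chr17"),
  start = c(1, 1000000),
  end   = c(100000, 2000000),
  stringsAsFactors = FALSE
)

# Keep a feature when it overlaps at least one range on the same chromosome
overlaps_any <- vapply(seq_len(nrow(features)), function(i) {
  any(ranges$chromosome == features$chromosome[i] &
        ranges$start <= features$end[i] &
        ranges$end >= features$start[i])
}, logical(1))

selected <- rownames(features)[overlaps_any]
selected
```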
These three operations can be combined to apply the three filters. In this case, first the sets are selected, then the samples and lastly the features:
multi[samples, "expression", range]
## Object of class 'MultiDataSet'
## . assayData: 1 elements
## . expression: 3 features, 25 samples
## . featureData:
## . expression: 3 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . expression: YES
## . phenoData:
## . expression: 25 samples, 3 cols (gender, ..., inv)
multi[samples, "methylation", range, drop = TRUE]
## MethylationSet (storageMode: lockedEnvironment)
## assayData: 63 features, 23 samples
## element names: meth
## protocolData: none
## phenoData
## sampleNames: GM00016 GM00022 ... GM02317 (23 total)
## varLabels: gender source inv id
## varMetadata: labelDescription
## featureData
## featureNames: cg09002677 cg21726327 ... cg11597080 (63 total)
## fvarLabels: chromosome position ... DHS (17 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
The base R function subset
can be used to perform advanced subsetting. It can filter the features by a column of each dataset’s feature data. For instance, we can use it to select all the features associated with a gene:
subset(multi, genes == "SLC35E2")
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 13 features, 61 samples
## . expression: 1 features, 61 samples
## . test: 1 features, 61 samples
## . featureData:
## . methylation: 13 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 1 rows, 4 cols (chromosome, ..., end)
## . test: 1 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 61 samples, 4 cols (gender, ..., inv)
## . expression: 61 samples, 3 cols (gender, ..., inv)
## . test: 61 samples, 3 cols (gender, ..., inv)
This line returns a MultiDataSet
with the features associated with the gene SLC35E2. The expression uses genes
because it is a column that is common to the datasets included in multi
. This function accepts any expression that returns a logical. Therefore, we can also use the %in%
operator or include more than one expression:
subset(multi, genes %in% c("SLC35E2", "IPO13", "TRPV1"))
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 31 features, 61 samples
## . expression: 7 features, 61 samples
## . test: 7 features, 61 samples
## . featureData:
## . methylation: 31 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 7 rows, 4 cols (chromosome, ..., end)
## . test: 7 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 61 samples, 4 cols (gender, ..., inv)
## . expression: 61 samples, 3 cols (gender, ..., inv)
## . test: 61 samples, 3 cols (gender, ..., inv)
subset(multi, genes == "EEF1A1" | genes == "LPP")
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 38 features, 61 samples
## . expression: 3 features, 61 samples
## . test: 3 features, 61 samples
## . featureData:
## . methylation: 38 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 3 rows, 4 cols (chromosome, ..., end)
## . test: 3 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 61 samples, 4 cols (gender, ..., inv)
## . expression: 61 samples, 3 cols (gender, ..., inv)
## . test: 61 samples, 3 cols (gender, ..., inv)
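Conceptually, this kind of filtering evaluates the same logical expression against the feature data of every dataset. The base R sketch below shows the idea with plain data frames; `feature_data` and its contents are hypothetical, and the loop is only an approximation of what the MultiDataSet method does.

```r
# Hypothetical feature data for two datasets sharing a `genes` column
feature_data <- list(
  methylation = data.frame(
    genes = c("SLC35E2", "IPO13", "SLC35E2"),
    row.names = c("cg1", "cg2", "cg3"),
    stringsAsFactors = FALSE
  ),
  expression = data.frame(
    genes = c("SLC35E2", "TRPV1"),
    row.names = c("p1", "p2"),
    stringsAsFactors = FALSE
  )
)

# Apply the same expression to each dataset's feature data
filtered <- lapply(feature_data, function(fd) subset(fd, genes == "SLC35E2"))

# Number of features kept per dataset
vapply(filtered, nrow, integer(1))
```

This also shows why the column used in the expression has to exist in every dataset: the expression is evaluated once per feature table.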
A similar approach can be used to select samples with a common phenotype. In this case, we should pass the expression as the third argument, and the column must be present in the phenotypic data of the datasets:
subset(multi, , gender == "female")
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 451383 features, 31 samples
## . expression: 21916 features, 31 samples
## . test: 21916 features, 31 samples
## . featureData:
## . methylation: 451383 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 21916 rows, 4 cols (chromosome, ..., end)
## . test: 21916 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 31 samples, 4 cols (gender, ..., inv)
## . expression: 31 samples, 3 cols (gender, ..., inv)
## . test: 31 samples, 3 cols (gender, ..., inv)
With this line of code, we can select all the women in the study. Both kinds of subsetting can be applied at the same time:
subset(multi, genes == "SLC35E2", gender == "female")
## Object of class 'MultiDataSet'
## . assayData: 3 elements
## . methylation: 13 features, 31 samples
## . expression: 1 features, 31 samples
## . test: 1 features, 31 samples
## . featureData:
## . methylation: 13 rows, 17 cols (chromosome, ..., Regulatory_Feature_Group)
## . expression: 1 rows, 4 cols (chromosome, ..., end)
## . test: 1 rows, 4 cols (chromosome, ..., end)
## . rowRanges:
## . methylation: YES
## . expression: YES
## . test: NO
## . phenoData:
## . methylation: 31 samples, 4 cols (gender, ..., inv)
## . expression: 31 samples, 3 cols (gender, ..., inv)
## . test: 31 samples, 3 cols (gender, ..., inv)