MetNet
, a novel R
package, that is compatible with the output of the xcms
/CAMERA
suite and that uses the data-rich output of mass spectrometry metabolomics to putatively link features on their relation to other features in the data set. MetNet
uses both structural and quantitative information of metabolomics data for network inference that will guide metabolite annotation.
MetNet 1.12.2
Among the main challenges in mass spectrometric metabolomic analysis is the
high-throughput analysis of metabolic features, their fast detection and
annotation.
By contrast to the screening of known, previously characterized,
metabolic features in these data, the putative annotation of unknown
features is often cumbersome and requires a lot of manual work, hindering
the biological information retrieval of these data.
High-resolution mass spectrometric data is often very rich in information
content and metabolic conversions, and reactions can be derived from structural
properties of features (Breitling et al. 2006).
In addition to that, statistical associations between
features (based on their intensity values) can be a valuable resource to find
co-synthesized or co-regulated metabolites, which are synthesized in the same
biosynthetic pathways. Given that an analysis tool within the R
framework
is still lacking that is
integrating the two features of mass spectrometric information commonly
acquired with mass spectrometers (m/z and intensity values), I developed
MetNet
to close this gap.
The MetNet
package comprises functionalities to infer network
topologies from high-resolution mass spectrometry data. MetNet
combines information from both structural data (differences in m/z values
of features) and statistical associations (intensity values of features per
sample) to propose putative metabolic networks that can be used for further
exploration.
The idea of using high-resolution mass spectrometry data for network construction was first proposed in Breitling et al. (2006) and followed soon afterwards by a Cytoscape plugin, MetaNetter (Jourdan et al. 2007), that is based on the inference of metabolic networks on molecular weight differences and correlation (Pearson correlation and partial correlation).
Inspired by the paper of Marbach et al. (2012) different algorithms for network
were implemented in MetNet
to account for
biases that are inherent in these statistical methods, followed by the
calculation of a consensus adjacency matrix using the differently computed
individual adjacency matrices.
The two main functionalities of the package include the creation of adjacency matrices from structural properties, based on losses/addition of functional groups defined by the user, and statistical associations. Currently, the following statistical models are implemented to infer a statistical adjacency matrix: Least absolute shrinkage and selection operator (LASSO, L1-norm regression, (Tibshirani 1994)), Random Forest (Breiman 2001), Pearson and Spearman correlation (including partial and semipartial correlation, see Steuer (2006) for a discussion on correlation-based metabolic networks), context likelihood of relatedness (CLR, (Faith et al. 2007)), the algorithm for the reconstruction of accurate cellular networks (ARACNE, (Margolin et al. 2006)) and constraint-based structure learning (Bayes, (Scutari 2010)). Since all of these methods have advantages and disadvantages, the user has the possibility to select several of these methods, compute adjacency matrices from these models and create a consensus matrix from the different statistical frameworks.
After creating the statistical and structural adjacency matrices these two matrices can be combined to form a consensus matrix that has information from both structural and statistical properties of the data. This can be followed by network analyses (e.g. calculation of topological parameters), integration with other data sources (e.g. genomic information or transcriptomic data) and/or visualization.
Central to MetNet
is the AdjacencyMatrix
class, derived from the
SummarizedExperiment
S4 class. The AdjacencyMatrix
host the adjacency
matrices creates during the different steps within the assays
slot. They
will furthermore store information on the type
of the AdjacencyMatrix
,
i.e. if it was derived from structural
or statistical
properties or if
it used the combined information from these layers (combine
). It also
stores information if the information was thresholded
, e.g. by
applying the rtCorrection
or threshold
function. Furthermore, the
AdjacencyMatrix
object stores information on if the graphs are directed
or undirected (within the directed
slot).
MetNet
is currently under active development. If you
discover any bugs, typos or develop ideas of improving
MetNet
feel free to raise an issue via
Github or
send a mail to the developer.
To install MetNet
enter the following to the R
console
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MetNet")
Before starting with the analysis, load the MetNet
package. This
will also load the required packages glmnet
, stabs
, GENIE3
, mpmi
,
parmigene
, Hmisc
, ppcor
and bnlearn
that are needed
for functions in the statistical adjacency matrix inference.
library(MetNet)
The data format that is compatible with the MetNet
framework is
a xcms
/CAMERA
output-like \(m~\times~n\) matrix, where
columns denote the different samples \(n\) and where \(m\) features are present.
In such a matrix, information about the masses of the features and quantitative
information of the features (intensity or concentration values) are needed.
The information about the m/z values has to be stored in a vector of
length \(\vert m \vert\) in the column "mz"
.
MetNet
does not impose any requirements for
data normalization, filtering, etc. However, the user has to make sure that
the data is properly preprocessed. These include division by internal standard,
log2
or vsn
transformation, noise filtering, removal of features that do not
represent mass features/metabolites, removal of isotopes, etc.
We will load here the object x_test
that contains m/z values
(in the column "mz"
), together with the corresponding retention time
(in the column "rt"
) and intensity values. We will use here the object
x_test
for guidance through the workflow of MetNet
.
data("x_test", package = "MetNet")
x_test <- as.matrix(x_test)
The function structural
will create an AdjacencyMatrix
object of
type
structural
containing the adjacency
matrices based on structural properties (m/z values) of the features.
The function expects a matrix with a column "mz"
that contains the
mass information of a feature (typically the m/z value). Furthermore,
structural
takes a data.frame
object as argument transformation
with the colnames
"mass"
and additional columns (e.g. "group"
, "formula"
or "rt"
).
structural
looks for transformations (in the
sense of additions/losses of functional groups mediated by biochemical,
enzymatic reactions) in the data using the mass information.
Following the work of Breitling et al. (2006) and Jourdan et al. (2007), molecular weight difference wX is defined by \(w_X = \vert w_A - w_B \vert\)
where wA is the molecular weight of substrate A, and wB is the molecular weight of product B (typically, m/z values will be used as a proxy for the molecular weight since the molecular weight is not directly derivable from mass spectrometric data). As exemplified in Jourdan et al. (2007), specific enzymatic reactions refer to specific changes in the molecular weight, e.g. carboxylation reactions will result in a mass difference of 43.98983 (molecular weight of CO2) between metabolic features.
The search space for these transformation is adjustable by the
transformation
argument in
structural
allowing to look for specific
enzymatic transformations. Hereby,
structural
will take into account the
ppm
value, to adjust for inaccuracies in m/z values due to technical
reasons according to the formula
\[ppm = \frac{m_{exp} - m_{calc}}{m_{exp}} \cdot 10^{-6}\]
with mexp the experimentally determined m/z value and mcalc the
calculated accurate mass of a molecule. Within the function, a lower and upper
range is calculated depending on the supplied ppm
value, differences
between the m/z feature values are calculated and matched against the
"mass"
es of the transformation
argument. If any
of the additions/losses defined in transformation
is found in the
data, it will be reported as an (unweighted) connection in the assay
"binary"
of the returned AdjacencyMatrix
object.
Together with this assay, additional character
adjacency matrices can be
written to the assay slot of the AdjacencyMatri
object.
E.g. we can write the type of
connection/transformation (derived e.g. from the column "group"
in the
transformation
object) as a character matrix to the
assay "group"
by setting var = "group"
.
Before creating the structural
AdjacencyMatrix
, one must define the
search space, i.e. the transformation that will be looked for in the mass spectrometric
data, by creating here the transformations
object.
## define the search space for biochemical transformation
transformations <- rbind(
c("Hydroxylation (-H)", "O", 15.9949146221, "-"),
c("Malonyl group (-H2O)", "C3H2O3", 86.0003939305, "+"),
c("D-ribose (-H2O) (ribosylation)", "C5H8O4", 132.0422587452, "-"),
c("C6H10O6", "C6H10O6", 178.0477380536, "-"),
c("Rhamnose (-H20)", "C6H10O4", 146.057910, "-"),
c("Monosaccharide (-H2O)", "C6H10O5", 162.0528234315, "-"),
c("Disaccharide (-H2O) #1", "C12H20O10", 324.105649, "-"),
c("Disaccharide (-H2O) #2", "C12H20O11", 340.1005614851, "-"),
c("Trisaccharide (-H2O)", "C18H30O15", 486.1584702945, "-"),
c("Glucuronic acid (-H2O)", "C6H8O6", 176.0320879894, "?"),
c("coumaroyl (-H2O)", "C9H6O2", 146.0367794368, "?"),
c("feruloyl (-H2O)", "C9H6O2OCH2", 176.0473441231, "?"),
c("sinapoyl (-H2O)", "C9H6O2OCH2OCH2", 206.0579088094, "?"),
c("putrescine to spermidine (+C3H7N)", "C3H7N", 57.0578492299, "?"))
## convert to data frame
transformations <- data.frame(
group = transformations[, 1],
formula = transformations[, 2],
mass = as.numeric(transformations[, 3]),
rt = transformations[, 4])
The function structural
will then check for those
m/z differences that are stored in the column "mass"
in the
object transformations
. To create the AdjacencyMatrix
object derived
from these structural information we enter
struct_adj <- structural(x = x_test, transformation = transformations,
var = c("group", "formula", "mass"), ppm = 10)
in the R
console.
As we set var = c("group", "formula", "mass")
, the AdjacencyMatrix
object
will contain the assays "group"
, "formula"
, and "mass"
that store the
character
adjacency matrices with the information defined in
the columns of transformations
.
By default, the structural
AdjacencyMatrix
object and the contained
adjacency matrices are undirected (the
argument in structural
is set to directed = FALSE
by default; i.e. the
matrices are symmetric). MetNet
,
however, also allows to include the information on the directionality of
the transformation (e.g. to distinguish between additions and losses).
This behaviour can be specified by setting directed = TRUE
:
struct_adj_dir <- structural(x = x_test, transformation = transformations,
var = c("group", "formula", "mass"), ppm = 10, directed = TRUE)
In the following we will visualize the results from the undirected and directed structural network.
We will set the mode of the igraph
object
to "directed"
in both cases to make the distinction between the returned
outputs of structural
for setting directed = FALSE
and directed = TRUE
.
Alternatively, we could also set the mode
for the first igraph
object
(using the undirected output of structural
) to "undirected"
which results
in an igraph
object where the directionality of the edges is not retained.
g_undirected <- igraph::graph_from_adjacency_matrix(
assay(struct_adj, "binary"), mode = "directed", weighted = NULL)
plot(g_undirected, edge.width = 1, edge.arrow.size = 0.5,
vertex.label.cex = 0.5, edge.color = "grey")
g_directed <- igraph::graph_from_adjacency_matrix(
assay(struct_adj_dir, "binary"), mode = "directed", weighted = NULL)
plot(g_directed, edge.width = 1, edge.arrow.size = 0.5,
vertex.label.cex = 0.5, edge.color = "grey")
The retention time will differ depending
on the chemical group added, e.g. an addition of a glycosyl group will
usually result in a lower retentiom time in reverse-phase chromatography.
This information can be used in refining the adjacency matrix derived from
the structural matrix. The rtCorrection
function does this check, if the
predicted transformations correspond to the expected retention time shift,
in an automated fashion. It requires information about the expected retention
time shift in the data.frame
passed to the transformation
argument (in the "rt"
column). Within this column, information about
retention time shifts is encoded by "-"
, "+"
and "?"
,
which means the feature with higher m/z value has lower, higher or unknown
retention time than the feature with the lower m/z value. The values for
m/z and retention time will be taken from the object passed to the
x
argument. In case there is a discrepancy between the transformation
and the retention time shift, the adjacency matrix at the specific position
will be set to 0. rtCorrection
will return the an AdjacencyMatrix
object
with updated adjacency matrices ("binary"
and the additional
character
adjacecny matrices).
To account for retention time shifts we enter
struct_adj <- rtCorrection(am = struct_adj, x = x_test,
transformation = transformations, var = "group")
in the R
console. The character "group"
defined with var
will serve
here as the link between the assay "group"
and the column in transformation
to calculate the retention time discrepancies between feature pairs.
For data analysis a data.frame
can be generated from AdjacencyMatrix
objects
by applying as.data.frame()
. Further filtering displays only feature-pairs
which were matched to a transformation.
struct_df <- as.data.frame(struct_adj)
struct_df <- struct_df[struct_df$binary == 1, ]
Some overview on the mass-difference distribution of the data can be observed
using the mz_summary
function. The number of determined mass differences
can be displayed by using the mz_vis
function.
mz_sum <- mz_summary(struct_adj, var = "group")
mz_vis(mz_sum, var = "group")
For larger data-sets, also a filter
can be
applied to visualize mass-difference above a defined threshold.
A filter can be applied, by filter
. Since the maximum count of any mass difference in struct_adj
is 4, a filter of 5
results in 0 mass differences.
mz_summary(struct_adj, filter = 4)
## group formula count
## 1 Hydroxylation (-H) O 15
## 2 Malonyl group (-H2O) C3H2O3 11
## 3 Monosaccharide (-H2O) C6H10O5 6
## 4 Rhamnose (-H20) C6H10O4 9
## 6 feruloyl (-H2O) C9H6O2OCH2 4
## 7 putrescine to spermidine (+C3H7N) C3H7N 4
mz_summary(struct_adj, filter = 5)
## group formula count
## 1 Hydroxylation (-H) O 15
## 2 Malonyl group (-H2O) C3H2O3 11
## 3 Monosaccharide (-H2O) C6H10O5 6
## 4 Rhamnose (-H20) C6H10O4 9
The AdjacencyMatrix
class allows storing further information on the features
as putative annotations, database identifier, SMILES, etc. using rowData()
.
A data.frame
containing the same rownames
as the test data needs to be
provided. The columns can store different information as annotations, identifier, etc.
We will load the x_annotation
file, which contains an example annotation
and other identifier for feature x1856
. All the other features contain NA
s
in corresponding columns.
data("x_annotation", package = "MetNet")
x_annotation <- as.data.frame(x_annotation)
## add annotations to the structural AdjacencyMatrix object
rowData(struct_adj) <- x_annotation
## display annotation for the feature "1856"
rowData(struct_adj)["x1856", ]
## DataFrame with 1 row and 11 columns
## database_mz database_identifier chemical_formula smiles
## <character> <character> <character> <character>
## x1856 308.2 N-caffeoylspermidine C16H25N3O3 C=1(C=C(C(=CC1)O)O)/..
## inchi inchikey metabolite_identification
## <character> <character> <character>
## x1856 InChI=1S/C16H25N3O3/.. AZSUJBAOTYNFDE-FNORW.. NA
## fragmentations modifications charge database
## <character> <character> <character> <character>
## x1856 NA NA 1 NA
statistical
The function statistical
will create an AdjacencyMatrix
object of type
statistical
containing the adjacency
matrices based on statistical associations. The function will create
weighted adjacency matrices using the statistical models defined by the
model
argument. Currently, the following models are available:
LASSO (using stabs
,
(Hofner, Boccuto, and Göker 2015; Thomas et al. 2017)), Random Forest (using GENIE3
,
CLR, ARACNE (the two latter using the package mpmi
to calculate
Mutual Information using a nonparametric bias correction by
Bias Corrected Mutual Information, and the functions clr
and
aracne.a
from the parmigene
package), Pearson and
Spearman correlation (based on the
psych
package), partial and semipartial
Pearson and Spearman correlation (using the ppcor
package) and
score-based structure learning returning the strength of the probabilistic
relationships of the arcs of
a Bayesian network, as learned from bootstrapped data (using the
boot.strength
with the Tabu greedy search as default
from the bnlearn
package (Scutari 2010)).
For further information on the different models
take a look on the respective help pages of lasso
,
randomForest
, clr
, aracne
, correlation
and/or
bayes
. Arguments that are accepted by the respective underlying
functions can be passed directly to the statistical
function. In addition,
arguments that are defined in the functions lasso
,
randomForest
, clr
, aracne
, correlation
and/or
bayes
can be passed to the functions.
threshold
From the statistical
AdjacencyMatrix
object the function threshold
will create an AdjacencyMatrix
object with the derived
unweighted adjacency matrix from the weighted adjacency matrices
unifying the information present from all statistical models. This
unweighted adjacency matrix is stored in the assay "consensus"
.
In the following example, we will create a list of unweighted adjacency matrices using Pearson and Spearman correlation using the intensity values as input data.
x_int <- x_test[, 3:ncol(x_test)] |>
as.matrix()
stat_adj <- statistical(x_int, model = c("pearson", "spearman"))
## [1] "pearson finished."
## [1] "spearman finished."
The reasoning behind this step is to circumvent disadvantages arising from each
model and creating a statistically reliable topology that reflects the actual
metabolic relations. threshold
returns an unweighted adjacency
matrix with connections inferred from the respective models (in the
"consensus"
assay).
There are four different types implemented how the unweighted adjacency
matrix can be created: threshold
, top1
, top2
, mean
.
For type = "threshold"
, threshold values have to be defined in the
args
argument for the respective statistical model within the list entry
filter
. Values above or below
these thresholds in each respective weighted adjacency matrix will be
reported as present (1) or absent (0) in the returned unweighted adjacency
matrix.
For the other three types (top1
, top2
, mean
) the ranks per statistical model
will be calculated and from each respective link the top1, top2 or mean rank
across statistical models will be calculated (cf. (Hase et al. 2013)). The
top n unique ranks (defined by the entry n
in args
) will be returned
as links in the unweighted consensus adjacency matrix.
We will create here for all ways the thresholded AdjacencyMatrix
objects of
type statistical
containing the consensus adjacency matrix.
## type = "threshold"
## the assayNames in stat_adj are used to define the filter criteria
assayNames(stat_adj)
## [1] "pearson_coef" "pearson_pvalue" "spearman_coef" "spearman_pvalue"
## return edges with positive Pearson correlation coefficients > 0.95
args_thr <- list(filter = "pearson_coef > 0.95")
## return edges with positive Spearman correlation coefficients > 0.95
args_thr <- list(filter = "spearman_coef > 0.95")
## return edges with absolute Pearson correlation coefficients > 0.95 and
## associated p-values <= 0.05
args_thr <- list(filter = "abs(pearson_coef) > 0.95 & pearson_pvalue <= 0.05")
## return edges with absolute Pearson OR Spearman correlation coefficients > 0.95
args_thr <- list(filter = "abs(pearson_coef) > 0.95 | abs(spearman_coef) > 0.95")
## return edges with absolute Pearson AND Spearman correlation coefficients > 0.95
args_thr <- list(filter = "abs(pearson_coef) > 0.95 & abs(spearman_coef) > 0.95")
## pass the filtering criteria to the args argument and set type to "threshold"
stat_adj_thr <- threshold(am = stat_adj, type = "threshold",
args = args_thr)
## alternatively, use the types "top1", "top2", "mean"
## retrieve the feature pairs which have the 100 highest coefficients
args_top <- list(n = 100)
## type = "top1"
stat_adj_top1 <- threshold(am = stat_adj, type = "top1",
args = args_top)
## type = "top2"
stat_adj_top2 <- threshold(am = stat_adj, type = "top2",
args = args_top)
## type = "mean"
stat_adj_mean <- threshold(am = stat_adj, type = "mean",
args = args_top)
After creating the structural
and statistical
AdjacencyMatrix
objects,
the two objects are combined. The function combine
will combine these two objects and create an AdjacencyMatrix
object of type
combine
. The function accepts
the arguments am_structural
and am_statistical
for the respective
AdjacencyMatrix
objects. Please note that for am_structural
the
AdjacencyMatrix
obtained via structural
or rtCorrection
can be used,
while for am_statistical
the AdjacencyMatrix
from threshold
has to be
used.
The edges that are present both in the binary
assay and the consensus
assay
will be reported within the combine_binary
assay in a combine
AdjacencyMatrix
object. If there are additional assays in the
AdjacencyMatrix
of type structural
these matrices will be stored in the
AdjacencyMatrix
of type combine
and will contain the edges that are
supported by the combine_binary
adjacency matrix (the other edges are
set to "").
We will use here the thresholded statistical
AdjacencyMatrix
from
type = "mean"
to combine it with the structural
AdjacencyMatrix
,
stat_adj
:
comb_adj <- combine(am_structural = struct_adj, am_statistical = stat_adj_mean)
To display the created consensus adjacency matrix, existing visualization
tools available in the R
framework can be employed or any other visualization
tool after exporting the consensus matrix as a text file. In this example,
we will use the igraph
(Csardi and Nepusz 2006) package to visualize the
adjacency matrix.
We use here the assay "combine_binary"
from the AdjacencyMatrix
of type
combine
and pass it to the graph_from_adjacency_matrix
function:
adj <- assay(comb_adj, "combine_binary")
g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
plot(g, edge.width = 2, vertex.label.cex = 0.5, edge.color = "grey")
Furthermore, the network can be analysed by network analysis techniques
(topological parameters such as centrality, degree, clustering indices) that
are implemented in different packages in R
(e.g. igraph
or sna
) or other software tools outside of
the R
environment.
All software and respective versions to build this vignette are listed here:
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MetNet_1.12.2 SummarizedExperiment_1.24.0
## [3] Biobase_2.54.0 GenomicRanges_1.46.1
## [5] GenomeInfoDb_1.30.1 IRanges_2.28.0
## [7] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [9] S4Vectors_0.32.4 BiocGenerics_0.40.0
## [11] knitr_1.38 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.1 tidyr_1.2.0 jsonlite_1.8.0
## [4] tmvnsim_1.0-2 bslib_0.3.1 assertthat_0.2.1
## [7] BiocManager_1.30.16 highr_0.9 GenomeInfoDbData_1.2.7
## [10] yaml_2.3.5 stabs_0.6-4 pillar_1.7.0
## [13] lattice_0.20-45 glue_1.6.2 digest_0.6.29
## [16] XVector_0.34.0 colorspace_2.0-3 htmltools_0.5.2
## [19] Matrix_1.4-1 psych_2.2.3 pkgconfig_2.0.3
## [22] magick_2.7.3 bookdown_0.25 zlibbioc_1.40.0
## [25] purrr_0.3.4 scales_1.1.1 GENIE3_1.16.0
## [28] BiocParallel_1.28.3 tibble_3.1.6 farver_2.1.0
## [31] generics_0.1.2 ggplot2_3.3.5 ellipsis_0.3.2
## [34] ppcor_1.1 cli_3.2.0 mnormt_2.0.2
## [37] magrittr_2.0.3 crayon_1.5.1 evaluate_0.15
## [40] fansi_1.0.3 nlme_3.1-157 MASS_7.3-56
## [43] tools_4.1.3 lifecycle_1.0.1 bnlearn_4.7.1
## [46] stringr_1.4.0 munsell_0.5.0 mpmi_0.43.2
## [49] DelayedArray_0.20.0 compiler_4.1.3 jquerylib_0.1.4
## [52] rlang_1.0.2 grid_4.1.3 RCurl_1.98-1.6
## [55] igraph_1.3.0 labeling_0.4.2 bitops_1.0-7
## [58] rmarkdown_2.13 gtable_0.3.0 DBI_1.1.2
## [61] R6_2.5.1 dplyr_1.0.8 fastmap_1.1.0
## [64] utf8_1.2.2 parmigene_1.1.0 KernSmooth_2.23-20
## [67] stringi_1.7.6 Rcpp_1.0.8.3 parallel_4.1.3
## [70] vctrs_0.4.0 tidyselect_1.1.2 xfun_0.30
The list of transformations is taken from Breitling et al. (2006). The numerical m/z values were calculated by using the structural formula and the Biological Magnetic Resonance Data Bank web tool.
transformations <- rbind(
c("Alanine", "C3H5NO", "71.0371137878"),
c("Arginine", "C6H12N4O", "156.1011110281"),
c("Asparagine", "C4H6N2O2", "114.0429274472"),
c("Guanosine 5-diphosphate (-H2O)", "C10H13N5O10P2", "425.0137646843"),
c("Guanosine 5-monophosphate (-H2O)", "C10H12N5O7P", "345.0474342759"),
c("Guanine (-H)", "C5H4N5O", "150.0415847765"),
c("Aspartic acid", "C4H5NO3", "115.0269430320"),
c("Guanosine (-H2O)", "C10H11N5O4", "265.0811038675"),
c("Cysteine", "C3H5NOS", "103.0091844778"),
c("Deoxythymidine 5'-diphosphate (-H2O)", "C10H14N2O10P2", "384.01236770"),
c("Cystine", "C6H10N2O3S2", "222.0132835777"),
c("Thymidine (-H2O)", "C10H12N2O4", "224.0797068840"),
c("Glutamic acid", "C5H7NO3", "129.0425930962"),
c("Thymine (-H)", "C5H5N2O2", "125.0351024151"),
c("Glutamine", "C5H8N2O2", "128.0585775114"),
c("Thymidine 5'-monophosphate (-H2O)", "C10H13N2O7P", "304.0460372924"),
c("Glycine", "C2H3NO", "57.0214637236"),
c("Uridine 5'-diphosphate (-H2O)", "C9H12N2O11P2", "385.9916322587"),
c("Histidine", "C6H7N3O", "137.0589118624"),
c("Uridine 5'-monophosphate (-H2O)", "C9H11N2O8P", "306.0253018503"),
c("Isoleucine", "C6H11NO", "113.0840639804"),
c("Uracil (-H)", "C4H3N2O2", "111.0194523509"),
c("Leucine", "C6H11NO", "113.0840639804"),
c("Uridine (-H2O)", "C9H10N2O5", "226.0589714419"),
c("Lysine", "C6H12N2O", "128.0949630177"),
c("Acetylation (-H)", "C2H3O2", "59.0133043405"),
c("Methionine", "C5H9NOS", "131.0404846062"),
c("Acetylation (-H2O)", "C2H2O", "42.0105646863"),
c("Phenylalanine", "C9H9NO", "147.0684139162"),
c("C2H2", "C2H2", "26.0156500642"),
c("Proline", "C5H7NO", "97.0527638520"),
c("Carboxylation", "CO2", "43.9898292442"),
c("Serine", "C3H5NO2", "87.0320284099"),
c("CHO2", "CHO2", "44.9976542763"),
c("Threonine", "C4H7NO2", "101.0476784741"),
c("Condensation/dehydration", "H2O", "18.0105646863"),
c("Tryptophan", "C11H10N2O", "186.0793129535"),
c("Diphosphate", "H3O6P2", "160.9404858489"),
c("Tyrosine", "C9H9NO2", "163.0633285383"),
c("Ethyl addition (-H2O)", "C2H4", "28.0313001284"),
c("Valine", "C5H9NO", "99.0684139162"),
c("Formic Acid (-H2O)", "CO", "27.9949146221"),
c("Acetotacetate (-H2O)", "C4H4O2", "84.0211293726"),
c("Glyoxylate (-H2O)", "C2O2", "55.9898292442"),
c("Acetone (-H)", "C3H5O", "57.0340397826"),
c("Hydrogenation/dehydrogenation", "H2", "2.0156500642"),
c("Adenylate (-H2O)", "C10H12N5O6P", "329.0525196538"),
c("Hydroxylation (-H)", "O", "15.9949146221"),
c("Biotinyl (-H)", "C10H15N2O3S", "243.0803380482"),
c("Inorganic phosphate", "P", "30.9737615100"),
c("Biotinyl (-H2O)", "C10H14N2O2S", "226.0775983940"),
c("Ketol group (-H2O)", "C2H2O", "42.0105646863"),
c("Carbamoyl P transfer (-H2PO4)", "CH2ON", "44.0136386915"),
c("Methanol (-H2O)", "CH2", "14.0156500642"),
c("Co-enzyme A (-H)", "C21H34N7O16P3S", "765.0995583014"),
c("Phosphate", "HPO3", "79.9663304084"),
c("Co-enzyme A (-H2O)", "C21H33N7O15P3S", "748.0968186472"),
c("Primary amine", "NH2", "16.0187240694"),
c("Glutathione (-H2O)", "C10H15N3O5S", "289.0732412976"),
c("Pyrophosphate", "PP", "61.9475230200"),
c("Isoprene addition (-H)", "C5H7", "67.0547752247"),
c("Secondary amine", "NH", "15.0108990373"),
c("Malonyl group (-H2O)", "C3H2O3", "86.0003939305"),
c("Sulfate (-H2O)", "SO3", "79.9568145563"),
c("Palmitoylation (-H2O)", "C16H30O", "238.2296655851"),
c("Tertiary amine", "N", "14.0030740052"),
c("Pyridoxal phosphate (-H2O)", "C8H8NO5P", "229.0140088825"),
c("C6H10O5", "C6H10O5", "162.0528234315"),
c("Urea addition (-H)", "CH3N2O", "59.0245377288"),
c("C6H10O6", "C6H10O6", "178.0477380536"),
c("Adenine (-H)", "C5H4N5", "134.0466701544"),
c("D-ribose (-H2O) (ribosylation)", "C5H8O4", "132.0422587452"),
c("Adenosine (-H2O)", "C10H11N5O3", "249.0861892454"),
c("Disaccharide (-H2O) #1", "C12H20O10", "324.105649"),
c("Disaccharide (-H2O) #2", "C12H20O11", "340.1005614851"),
c("Adenosine 5'-diphosphate (-H2O)", "C10H13N5O9P2", "409.0188500622"),
c("Glucose-N-phosphate (-H2O)", "C6H11O8P", "242.0191538399"),
c("Adenosine 5'-monophosphate (-H2O)", "C10H12N5O6P", "329.0525196538"),
c("Glucuronic acid (-H2O)", "C6H8O6", "176.0320879894"),
c("Cytidine 5'-diphosphate (-H2O)", "C9H13N3O10P2", "385.0076166739"),
c("Monosaccharide (-H2O)", "C6H10O5", "162.0528234315"),
c("Cytidine 5'-monophsophate (-H2O)", "C9H12N3O7P", "305.0412862655"),
c("Trisaccharide (-H2O)", "C18H30O15", "486.1584702945"),
c("Cytosine (-H)", "C4H4N3O", "110.0354367661"))
transformations <- data.frame(group = transformations[, 1],
formula = transformations[, 2],
mass = as.numeric(transformations[, 3]))
Breiman, L. 2001. “Random Forests.” Machine Learning 45: 5–32. https://doi.org/10.1023/A:1010933404324.
Breitling, R., S. Ritchie, D. Goodenowe, M.L. Stewart, and M.P. Barrett. 2006. “Ab Initio Prediction of Metabolic Networks Using Fourier Transform Mass Spectrometry Data.” Metabolomics 2: 155–64. https://doi.org/10.1007/s11306-006-0029-z.
Csardi, G., and T. Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.org.
Faith, J.J., B. Hayete, J.T. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel, S. Kasif, J.J. Collins, and T.S. Gardner. 2007. “Large-Scale Mapping and Validation of Escherichia Coli Transcriptional Regulation from a Compendium of Expression Profiles.” Plos Biology 5: e8. https://doi.org/10.1371/journal.pbio.0050008.
Hase, T., S. Ghosh, R. Yamanaka, and H. Kitano. 2013. “Harnessing Diversity Towards the Reconstructing of Large Scale Gene Regulatory Networks.” PLoS Computational Biology 9: e1003361. https://doi.org/10.1371/journal.pcbi.1003361.
Hofner, B., L. Boccuto, and M. Göker. 2015. “Controlling False Discoveries in High-Dimensional Situations: Boosting with Stability Selection.” BMC Bioinformatics 16: 144. https://doi.org/10.1186/s12859-015-0575-3.
Jourdan, F., R. Breitling, M.P. Barrett, and D. Gilbert. 2007. “MetaNetter: Inference and Visualization of High-Resolution Metabolomic Networks.” Bioinformatics 24: 143–45. https://doi.org/10.1093/bioinformatics/btm536.
Marbach, D., J.C. Costello, R. Küffner, N.M. Vega, R.J. Prill, D.M. Camacho, K.R. Allison, et al. 2012. “Wisdom of Crowds for Robust Gene Network Inference.” Nature Methods 9: 796–804. https://doi.org/10.1038/nmeth.2016.
Margolin, A.A., I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R. Dalla Favera, and A. Califano. 2006. “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context.” BMC Bioinformatics 7: S7. https://doi.org/10.1186/1471-2105-7-S1-S7.
Scutari, M. 2010. “Learning Bayesian Networks with the bnlearn R Package.” Journal of Statistical Software 35: 1–22. https://doi.org/10.18637/jss.v035.i03.
Steuer, R. 2006. “On the Analysis and Interpretation of Correlations in Metabolomic Data.” Briefings in Bioinformatics 7: 151–58. https://doi.org/10.1093/bib/bbl009.
Thomas, J., A. Mayr, B. Bischl, M. Schmid, A. Smith, and B. Hofner. 2017. “Gradient Boosting for Distributional Regression - Faster Tuning and Improved Variable Selection via Noncyclical Updates.” Statistics and Computing 28: 673–87. https://doi.org/10.1007/s11222-017-9754-6.
Tibshirani, R. 1994. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58: 267–88.