The Human Metabolome Database (HMDB, http://www.hmdb.ca) includes XML documents describing 114,000 metabolites. We show how to query and manipulate this metabolite metadata from R.
The hmdbQuery package includes a function for querying HMDB directly over HTTP. The result is parsed and encapsulated in an S4 object, which prints as a short summary:
## HMDB metabolite metadata for 1-Methylhistidine:
## There are 10 diseases annotated.
## Direct association reported for 4 biospecimens and 1 tissues.
## Use diseases(), biospecimens(), tissues() for more information.
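The call that produces a summary like this is not shown. A minimal sketch, assuming the package's HmdbEntry constructor with its prefix and id arguments and a reachable HMDB server, might look as follows. The wrapper fetch_metabolite is hypothetical and introduced only for illustration.

```r
# Hypothetical wrapper (not part of hmdbQuery): check the accession format,
# then delegate to hmdbQuery::HmdbEntry, which retrieves the entry over HTTP.
fetch_metabolite <- function(id, prefix = "http://www.hmdb.ca/metabolites/") {
  # Accessions shown in this document are "HMDB" followed by seven digits
  if (!grepl("^HMDB[0-9]{7}$", id)) {
    stop("not an HMDB accession of the form HMDB0000001: ", id)
  }
  if (!requireNamespace("hmdbQuery", quietly = TRUE)) {
    stop("the hmdbQuery package is required")
  }
  hmdbQuery::HmdbEntry(prefix = prefix, id = id)
}

# Requires network access, so the call is left commented out:
# lk1 <- fetch_metabolite("HMDB0000001")
# lk1   # printing the object gives the short summary
```

Validating the accession before the request avoids an HTTP round trip for an obviously malformed identifier.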
The complete record for even a single metabolite is large, so holding comprehensive information about all HMDB entries in memory is impractical. The most effective approach to managing the metadata will depend on use cases that emerge over time.
Note, however, that this package provides snapshots of certain direct associations derived from all information available as of Sept. 23, 2017. Direct associations reported in the database are available in the tables hmdb_disease, hmdb_gene, hmdb_protein, and hmdb_omim. For example:
## DataFrame with 75360 rows and 3 columns
##         accession                   name                disease
##       <character>            <character>            <character>
## 1     HMDB0000001      1-Methylhistidine    Alzheimer's disease
## 2     HMDB0000001      1-Methylhistidine Diabetes mellitus ty..
## 3     HMDB0000001      1-Methylhistidine         Kidney disease
## 4     HMDB0000001      1-Methylhistidine                Obesity
## 5     HMDB0000002     1,3-Diaminopropane Perillyl alcohol adm..
## ...           ...                    ...                    ...
## 75356 HMDB0094706            Serylvaline                     NA
## 75357 HMDB0094708   Tetraethylene glycol                     NA
## 75358 HMDB0094712           Serylleucine                     NA
## 75359 HMDB0100002 TG(i-14:0/17:0/i-13:0)                     NA
## 75360 HMDB0101657 TG(15:0/i-14:0/a-21:..                     NA
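Because the snapshot is an ordinary three-column table, it can be filtered with standard subsetting. The sketch below uses a hypothetical helper, metabolites_for_disease, on a few toy rows copied from the printout. It assumes that data("hmdb_disease", package = "hmdbQuery") loads the snapshot under that name.

```r
# Hypothetical helper: keep rows whose disease label matches a pattern,
# ignoring case and skipping entries with no disease annotation (NA).
metabolites_for_disease <- function(tbl, pattern) {
  hit <- !is.na(tbl$disease) & grepl(pattern, tbl$disease, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Toy rows copied from the printed snapshot
toy <- data.frame(
  accession = c("HMDB0000001", "HMDB0000001", "HMDB0094706"),
  name      = c("1-Methylhistidine", "1-Methylhistidine", "Serylvaline"),
  disease   = c("Alzheimer's disease", "Obesity", NA),
  stringsAsFactors = FALSE
)
metabolites_for_disease(toy, "alzheimer")

# Assumption: with hmdbQuery installed, the same call works on the full snapshot
if (requireNamespace("hmdbQuery", quietly = TRUE)) {
  data("hmdb_disease", package = "hmdbQuery")
  metabolites_for_disease(hmdb_disease, "alzheimer")
}
```

The explicit NA check matters here because many rows of the snapshot have no disease annotation.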
Some HMDB metabolites have been mapped to diseases, and the annotations cite supporting literature. A PubMed abstract for one such reference:
## An object of class 'pubMedAbst':
## Title: Free amino acid and dipeptide changes in the body fluids from
## Alzheimer's disease subjects.
## PMID: 17031479
## Authors: AN Fonteh, RJ Harrington, A Tsai, P Liao, MG Harrington
## Journal: Amino Acids
## Date: Feb 2007
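An abstract object like this can be built from a PubMed identifier with the annotate package, which appears in the session information at the end of this document. The following is a sketch only: fetch_abstract is a hypothetical wrapper, and it assumes network access to NCBI and the pubmed() and buildPubMedAbst() functions of annotate.

```r
# Hypothetical wrapper: fetch one PubMed record and convert it to a
# pubMedAbst object using the annotate and XML packages.
fetch_abstract <- function(pmid) {
  pmid <- as.character(pmid)
  if (!grepl("^[0-9]+$", pmid)) {
    stop("PMID must consist of digits only: ", pmid)
  }
  if (!requireNamespace("annotate", quietly = TRUE) ||
      !requireNamespace("XML", quietly = TRUE)) {
    stop("the annotate and XML packages are required")
  }
  doc <- annotate::pubmed(pmid)                       # XML document from NCBI
  annotate::buildPubMedAbst(XML::xmlRoot(doc)[[1]])   # first article in the reply
}

# Requires network access, so the call is left commented out:
# ab <- fetch_abstract("17031479")   # PMID taken from the printout above
# ab
```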
Note that before HMDB v4.0, biospecimens were called biofluids.
Each HMDB entry may carry arbitrarily many biospecimen and tissue associations. Direct accessors are provided for these, and by default all metadata for the entry are captured and made available through the store method.
## [1] "Blood" "Cerebrospinal Fluid (CSF)"
## [3] "Feces" "Urine"
## [1] "Skeletal Muscle"
## [1] "version" "creation_date" "update_date"
## [4] "accession" "status" "secondary_accessions"
## [1] 46
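The accessor calls behind these printouts are not shown. A sketch of how they might be combined, assuming the exported accessors biospecimens(), tissues() and store() and an example entry named hmdb1 bundled with the package (the dataset name is an assumption; entry_overview is a hypothetical helper):

```r
# Hypothetical helper: collect the direct associations and a brief
# description of the full metadata store for one parsed HMDB entry.
entry_overview <- function(entry) {
  st <- hmdbQuery::store(entry)            # full parsed metadata, a named list
  list(
    biospecimens = hmdbQuery::biospecimens(entry),
    tissues      = hmdbQuery::tissues(entry),
    first_fields = head(names(st)),        # first few top-level field names
    n_fields     = length(names(st))       # number of top-level fields
  )
}

# Assumption: hmdbQuery ships a pre-parsed example entry called hmdb1
if (requireNamespace("hmdbQuery", quietly = TRUE)) {
  data("hmdb1", package = "hmdbQuery")
  entry_overview(hmdb1)
}
```

Working from a stored entry keeps the example independent of network access. An object returned by a live query can be passed to the same helper.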
## $protein
## [1] "Beta-Ala-His dipeptidase"
##
## $protein
## [1] "Protein arginine N-methyltransferase 3"
## $protein
## [1] "CNDP1"
##
## $protein
## [1] "PRMT3"
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] annotate_1.72.0 AnnotationDbi_1.56.2 IRanges_2.28.0
## [4] S4Vectors_0.32.3 Biobase_2.54.0 BiocGenerics_0.40.0
## [7] gwascat_2.26.0 hmdbQuery_1.14.2 XML_3.99-0.8
##
## loaded via a namespace (and not attached):
## [1] MatrixGenerics_1.6.0 httr_1.4.2
## [3] sass_0.4.0 splines_4.1.2
## [5] bit64_4.0.5 jsonlite_1.7.2
## [7] bslib_0.3.1 assertthat_0.2.1
## [9] BiocFileCache_2.2.0 blob_1.2.2
## [11] BSgenome_1.62.0 GenomeInfoDbData_1.2.7
## [13] Rsamtools_2.10.0 yaml_2.2.1
## [15] progress_1.2.2 pillar_1.6.4
## [17] RSQLite_2.2.9 lattice_0.20-45
## [19] glue_1.6.0 digest_0.6.29
## [21] GenomicRanges_1.46.1 XVector_0.34.0
## [23] htmltools_0.5.2 Matrix_1.4-0
## [25] pkgconfig_2.0.3 biomaRt_2.50.1
## [27] zlibbioc_1.40.0 xtable_1.8-4
## [29] purrr_0.3.4 tzdb_0.2.0
## [31] BiocParallel_1.28.3 tibble_3.1.6
## [33] KEGGREST_1.34.0 generics_0.1.1
## [35] ellipsis_0.3.2 cachem_1.0.6
## [37] SummarizedExperiment_1.24.0 GenomicFeatures_1.46.3
## [39] survival_3.2-13 magrittr_2.0.1
## [41] crayon_1.4.2 memoise_2.0.1
## [43] evaluate_0.14 fansi_0.5.0
## [45] xml2_1.3.3 prettyunits_1.1.1
## [47] tools_4.1.2 hms_1.1.1
## [49] BiocIO_1.4.0 lifecycle_1.0.1
## [51] matrixStats_0.61.0 stringr_1.4.0
## [53] DelayedArray_0.20.0 snpStats_1.44.0
## [55] Biostrings_2.62.0 compiler_4.1.2
## [57] jquerylib_0.1.4 GenomeInfoDb_1.30.0
## [59] rlang_0.4.12 grid_4.1.2
## [61] RCurl_1.98-1.5 rjson_0.2.20
## [63] VariantAnnotation_1.40.0 rappdirs_0.3.3
## [65] bitops_1.0-7 rmarkdown_2.11
## [67] restfulr_0.0.13 DBI_1.1.2
## [69] curl_4.3.2 R6_2.5.1
## [71] GenomicAlignments_1.30.0 rtracklayer_1.54.0
## [73] knitr_1.37 dplyr_1.0.7
## [75] fastmap_1.1.0 bit_4.0.4
## [77] utf8_1.2.2 filelock_1.0.2
## [79] readr_2.1.1 stringi_1.7.6
## [81] parallel_4.1.2 Rcpp_1.0.7
## [83] vctrs_0.3.8 png_0.1-7
## [85] dbplyr_2.1.1 tidyselect_1.1.1
## [87] xfun_0.29