Package: peakPantheR
Authors: Arnaud Wolfer

Package for Peak Picking and ANnoTation of High resolution Experiments in R, implemented in R and Shiny

1 Overview

peakPantheR implements functions to detect, integrate and report pre-defined features in MS files (e.g. compounds, fragments, adducts, …).

It is designed for:

  • Real time feature detection and integration (see Real Time Annotation)
    • process multiple compounds in one file at a time
  • Post-acquisition feature detection, integration and reporting (see Parallel Annotation)
    • process multiple compounds in multiple files in parallel, store results in a single object

peakPantheR can process LC/MS data files in NetCDF, mzML/mzXML and mzData format as data import is achieved using Bioconductor’s mzR package.

2 Installation

To install peakPantheR from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("peakPantheR")

Install the development version of peakPantheR directly from GitHub with:

# Install devtools
if(!require("devtools")) install.packages("devtools")
devtools::install_github("phenomecentre/peakPantheR")

2.1 Getting Started

To get started peakPantheR’s graphical user interface implements all the functions to detect and integrate multiple compounds in multiple files in parallel and store results in a single object. It can be employed to integrate a large number of expected features across a dataset:

library(peakPantheR)

peakPantheR_start_GUI(browser = TRUE)
#  To exit press ESC in the command line

The GUI is to be preferred to understand the methodology, select the best parameters on a subset of the samples before running the command line, or to visually explore results.

If a very high number of samples and features is to be processed, peakpantheR’s command line functions are more efficient, as they can be integrated in scripts and the reporting automated.

3 Input Data

Both real time and parallel compound integration require a common set of information:

  • Path(s) to netCDF / mzML MS file(s)
  • An expected region of interest (RT / m/z window) for each compound.

3.1 MS files

For demonstration purpose we can annotate a set a set of raw MS spectra (in NetCDF format) provided by the faahKO package. Briefly, this subset of the data from (Saghatelian et al. 2004) invesigate the metabolic consequences of knocking out the fatty acid amide hydrolase (FAAH) gene in mice. The dataset consists of samples from the spinal cords of 6 knock-out and 6 wild-type mice. Each file contains data in centroid mode acquired in positive ion mode form 200-600 m/z and 2500-4500 seconds.

Below we install the faahKO package and locate raw CDF files of interest:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("faahKO")
library(faahKO)
## file paths
input_spectraPaths  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko18.CDF', package = "faahKO"))
input_spectraPaths
#> [1] "/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko16.CDF"
#> [3] "/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko18.CDF"

3.2 Expected regions of interest

Expected regions of interest (targeted features) are specified using the following information:

  • cpdID (numeric)
  • cpdName (character)
  • rtMin (sec)
  • rtMax (sec)
  • rt (sec, optional / NA)
  • mzMin (m/z)
  • mzMax (m/z)
  • mz (m/z, optional / NA)

Below we define 2 features of interest that are present in the faahKO dataset and can be employed in subsequent vignettes:

# targetFeatTable
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), 
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", 
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c(1, "Cpd 1", 3310., 3344.888, 3390., 522.194778, 
                                522.2, 522.205222)
input_targetFeatTable[2,] <- c(2, "Cpd 2", 3280., 3385.577, 3440., 496.195038,
                                496.2, 496.204962)
input_targetFeatTable[,c(1,3:8)] <- sapply(input_targetFeatTable[,c(1,3:8)], 
                                            as.numeric)
cpdID cpdName rtMin rt rtMax mzMin mz mzMax
1 Cpd 1 3310 3344.888 3390 522.194778 522.2 522.205222
2 Cpd 2 3280 3385.577 3440 496.195038 496.2 496.204962

4 Preparing input for the graphical user interface

While the command line functions accept Data.Frame and vectors as input, the graphical user interface (GUI) will read the same information from a set of .csv files, or an already set-up peakPantheRAnnotation object in .RData format.

We can now generate GUI input files for the faahKO example dataset presented previously:

4.1 peakPantheRAnnotation .RData

A peakPantheRAnnotation (previously annotated or not) can be passed as input in a .RData file. The peakPantheRAnnotation object must be named annotationObject:

library(faahKO)

# Define the file paths (3 samples)
input_spectraPaths  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko18.CDF', package = "faahKO"))

# Define the targeted features (2 features)
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(),
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin",
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390.,
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-1", "Cpd 2", 3280., 3385.577, 3440.,
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,3:8] <- sapply(input_targetFeatTable[,3:8], as.numeric)

# Define some random compound and spectra metadata
# cpdMetadata
input_cpdMetadata     <- data.frame(matrix(data=c('a','b',1,2), nrow=2, ncol=2,
                            dimnames=list(c(), c('testcol1','testcol2')),
                            byrow=FALSE), stringsAsFactors=FALSE)
# spectraMetadata
input_spectraMetadata <- data.frame(matrix(data=c('c','d','e',3,4,5), nrow=3,
                            ncol=2,
                            dimnames=list(c(),c('testcol1','testcol2')),
                            byrow=FALSE), stringsAsFactors=FALSE)

# Initialise a simple peakPantheRAnnotation object
# [3 files, 2 features, no uROI, no FIR]
initAnnotation <- peakPantheRAnnotation(spectraPaths=input_spectraPaths,
                                        targetFeatTable=input_targetFeatTable,
                                        cpdMetadata=input_cpdMetadata,
                                        spectraMetadata=input_spectraMetadata)

# Rename and save the annotation to disk
annotationObject <- initAnnotation
save(annotationObject,
        file = './example_annotation_ppR_UI.RData',
        compress=TRUE)

4.2 CSV file input

Another input option for the GUI input consists of a set of .csv files.

4.2.1 Targeted features

Targeted features are defined in a .csv with as rows each feature to target (the first row must be the column name), and as columns the fit parameters to use. At minimum the following parameters must be defined:

cpdID, cpdName, rtMin, rt, rtMax, mzMin, mz, mzMax

If uROI and FIR are to be set, the following columns must be provided:

cpdID, cpdName, ROI_rt, ROI_mz, ROI_rtMin, ROI_rtMax, ROI_mzMin, ROI_mzMax, uROI_rtMin, uROI_rtMax, uROI_mzMin, uROI_mzMax, uROI_rt, uROI_mz, FIR_rtMin, FIR_rtMax, FIR_mzMin, FIR_mzMax

# Define targeted features without uROI and FIR (2 features)
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(),
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin",
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390.,
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-1", "Cpd 2", 3280., 3385.577, 3440.,
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,3:8] <- sapply(input_targetFeatTable[,3:8], as.numeric)

# save to disk
write.csv(input_targetFeatTable,
            file = './1-fitParams_example_UI.csv',
            row.names = FALSE)
cpdID cpdName rtMin rt rtMax mzMin mz mzMax
ID-1 Cpd 1 3310 3344.888 3390 522.194778 522.2 522.205222
ID-1 Cpd 2 3280 3385.577 3440 496.195038 496.2 496.204962

4.2.2 Files to process and spectra metadata (optional)

It is possible to select the files on disk directly through the GUI, or to select a .csv file containing each file path as well as spectra metadata. Each row correspond to a different spectra (the first row must define the column names) while columns correspond to the path on disk and metadata. At minimum a column filepath must be present, with subsequent columns defining metadata properties.

# Define the spectra paths and metada
input_spectraMeta <- data.frame(matrix(vector(), 3, 3,
                        dimnames=list(c(),c("filepath","testcol1","testcol2"))),
                        stringsAsFactors=FALSE)
input_spectraMeta[1,] <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                            "c", 3)
input_spectraMeta[2,] <- c(system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                            "d", 4)
input_spectraMeta[3,] <- c(system.file('cdf/KO/ko18.CDF', package = "faahKO"),
                            "e", 5)

# save to disk
write.csv(input_spectraMeta,
            file = './2-spectraMetaWPath_example_UI.csv',
            row.names = FALSE)
Table continues below
filepath testcol1
/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko15.CDF c
/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko16.CDF d
/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko18.CDF e
testcol2
3
4
5

4.2.3 Feature meatadata (optional)

It is possible to define some feature metadata, with targeted features as rows (same order as the fitting parameters, first row defining the column names), and as columns the metadata.

# Define the feature metada
input_featMeta <- data.frame(matrix(vector(), 2, 2,
                    dimnames=list(c(),c("testcol1","testcol2"))),
                    stringsAsFactors=FALSE)
input_featMeta[1,] <- c("a", 1)
input_featMeta[2,] <- c("b", 2)

# save to disk
write.csv(input_featMeta,
            file = './3-featMeta_example_UI.csv',
            row.names = FALSE)
testcol1 testcol2
a 1
b 2

6 Session Information

#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 RC (2022-04-19 r82224)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-04-26
#>  pandoc   2.5 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version  date (UTC) lib source
#>  affy                   1.74.0   2022-04-26 [2] Bioconductor
#>  affyio                 1.66.0   2022-04-26 [2] Bioconductor
#>  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.2.0)
#>  Biobase              * 2.56.0   2022-04-26 [2] Bioconductor
#>  BiocGenerics         * 0.42.0   2022-04-26 [2] Bioconductor
#>  BiocManager            1.30.17  2022-04-22 [2] CRAN (R 4.2.0)
#>  BiocParallel         * 1.30.0   2022-04-26 [2] Bioconductor
#>  BiocStyle            * 2.24.0   2022-04-26 [2] Bioconductor
#>  bitops                 1.0-7    2021-04-24 [2] CRAN (R 4.2.0)
#>  bookdown               0.26     2022-04-15 [2] CRAN (R 4.2.0)
#>  brio                   1.1.3    2021-11-30 [2] CRAN (R 4.2.0)
#>  bslib                  0.3.1    2021-10-06 [2] CRAN (R 4.2.0)
#>  cachem                 1.0.6    2021-08-19 [2] CRAN (R 4.2.0)
#>  callr                  3.7.0    2021-04-20 [2] CRAN (R 4.2.0)
#>  cli                    3.3.0    2022-04-25 [2] CRAN (R 4.2.0)
#>  clue                   0.3-60   2021-10-11 [2] CRAN (R 4.2.0)
#>  cluster                2.1.3    2022-03-28 [2] CRAN (R 4.2.0)
#>  codetools              0.2-18   2020-11-04 [2] CRAN (R 4.2.0)
#>  colorspace             2.0-3    2022-02-21 [2] CRAN (R 4.2.0)
#>  crayon                 1.5.1    2022-03-26 [2] CRAN (R 4.2.0)
#>  DBI                    1.1.2    2021-12-20 [2] CRAN (R 4.2.0)
#>  DelayedArray           0.22.0   2022-04-26 [2] Bioconductor
#>  DEoptimR               1.0-11   2022-04-03 [2] CRAN (R 4.2.0)
#>  desc                   1.4.1    2022-03-06 [2] CRAN (R 4.2.0)
#>  devtools               2.4.3    2021-11-30 [2] CRAN (R 4.2.0)
#>  digest                 0.6.29   2021-12-01 [2] CRAN (R 4.2.0)
#>  doParallel             1.0.17   2022-02-07 [2] CRAN (R 4.2.0)
#>  dplyr                  1.0.8    2022-02-08 [2] CRAN (R 4.2.0)
#>  DT                     0.22     2022-03-28 [2] CRAN (R 4.2.0)
#>  ellipsis               0.3.2    2021-04-29 [2] CRAN (R 4.2.0)
#>  evaluate               0.15     2022-02-18 [2] CRAN (R 4.2.0)
#>  faahKO               * 1.35.0   2022-04-26 [2] Bioconductor
#>  fansi                  1.0.3    2022-03-24 [2] CRAN (R 4.2.0)
#>  fastmap                1.1.0    2021-01-25 [2] CRAN (R 4.2.0)
#>  foreach                1.5.2    2022-02-02 [2] CRAN (R 4.2.0)
#>  fs                     1.5.2    2021-12-08 [2] CRAN (R 4.2.0)
#>  generics               0.1.2    2022-01-31 [2] CRAN (R 4.2.0)
#>  GenomeInfoDb           1.32.0   2022-04-26 [2] Bioconductor
#>  GenomeInfoDbData       1.2.8    2022-04-21 [2] Bioconductor
#>  GenomicRanges          1.48.0   2022-04-26 [2] Bioconductor
#>  ggplot2                3.3.5    2021-06-25 [2] CRAN (R 4.2.0)
#>  glue                   1.6.2    2022-02-24 [2] CRAN (R 4.2.0)
#>  gtable                 0.3.0    2019-03-25 [2] CRAN (R 4.2.0)
#>  highr                  0.9      2021-04-16 [2] CRAN (R 4.2.0)
#>  htmltools              0.5.2    2021-08-25 [2] CRAN (R 4.2.0)
#>  htmlwidgets            1.5.4    2021-09-08 [2] CRAN (R 4.2.0)
#>  httpuv                 1.6.5    2022-01-05 [2] CRAN (R 4.2.0)
#>  impute                 1.70.0   2022-04-26 [2] Bioconductor
#>  IRanges                2.30.0   2022-04-26 [2] Bioconductor
#>  iterators              1.0.14   2022-02-05 [2] CRAN (R 4.2.0)
#>  jquerylib              0.1.4    2021-04-26 [2] CRAN (R 4.2.0)
#>  jsonlite               1.8.0    2022-02-22 [2] CRAN (R 4.2.0)
#>  knitr                  1.38     2022-03-25 [2] CRAN (R 4.2.0)
#>  later                  1.3.0    2021-08-18 [2] CRAN (R 4.2.0)
#>  lattice                0.20-45  2021-09-22 [2] CRAN (R 4.2.0)
#>  lifecycle              1.0.1    2021-09-24 [2] CRAN (R 4.2.0)
#>  limma                  3.52.0   2022-04-26 [2] Bioconductor
#>  magrittr               2.0.3    2022-03-30 [2] CRAN (R 4.2.0)
#>  MALDIquant             1.21     2021-12-23 [2] CRAN (R 4.2.0)
#>  MASS                   7.3-57   2022-04-22 [2] CRAN (R 4.2.0)
#>  MassSpecWavelet        1.62.0   2022-04-26 [2] Bioconductor
#>  Matrix                 1.4-1    2022-03-23 [2] CRAN (R 4.2.0)
#>  MatrixGenerics         1.8.0    2022-04-26 [2] Bioconductor
#>  matrixStats            0.62.0   2022-04-19 [2] CRAN (R 4.2.0)
#>  memoise                2.0.1    2021-11-26 [2] CRAN (R 4.2.0)
#>  mime                   0.12     2021-09-28 [2] CRAN (R 4.2.0)
#>  MsCoreUtils            1.8.0    2022-04-26 [2] Bioconductor
#>  MsFeatures             1.4.0    2022-04-26 [2] Bioconductor
#>  MSnbase              * 2.22.0   2022-04-26 [2] Bioconductor
#>  munsell                0.5.0    2018-06-12 [2] CRAN (R 4.2.0)
#>  mzID                   1.34.0   2022-04-26 [2] Bioconductor
#>  mzR                  * 2.30.0   2022-04-26 [2] Bioconductor
#>  ncdf4                  1.19     2021-12-15 [2] CRAN (R 4.2.0)
#>  pander               * 0.6.5    2022-03-18 [2] CRAN (R 4.2.0)
#>  pcaMethods             1.88.0   2022-04-26 [2] Bioconductor
#>  peakPantheR          * 1.10.0   2022-04-26 [1] Bioconductor
#>  pillar                 1.7.0    2022-02-01 [2] CRAN (R 4.2.0)
#>  pkgbuild               1.3.1    2021-12-20 [2] CRAN (R 4.2.0)
#>  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.2.0)
#>  pkgload                1.2.4    2021-11-30 [2] CRAN (R 4.2.0)
#>  plyr                   1.8.7    2022-03-24 [2] CRAN (R 4.2.0)
#>  preprocessCore         1.58.0   2022-04-26 [2] Bioconductor
#>  prettyunits            1.1.1    2020-01-24 [2] CRAN (R 4.2.0)
#>  processx               3.5.3    2022-03-25 [2] CRAN (R 4.2.0)
#>  promises               1.2.0.1  2021-02-11 [2] CRAN (R 4.2.0)
#>  ProtGenerics         * 1.28.0   2022-04-26 [2] Bioconductor
#>  ps                     1.7.0    2022-04-23 [2] CRAN (R 4.2.0)
#>  purrr                  0.3.4    2020-04-17 [2] CRAN (R 4.2.0)
#>  R6                     2.5.1    2021-08-19 [2] CRAN (R 4.2.0)
#>  RANN                   2.6.1    2019-01-08 [2] CRAN (R 4.2.0)
#>  RColorBrewer           1.1-3    2022-04-03 [2] CRAN (R 4.2.0)
#>  Rcpp                 * 1.0.8.3  2022-03-17 [2] CRAN (R 4.2.0)
#>  RCurl                  1.98-1.6 2022-02-08 [2] CRAN (R 4.2.0)
#>  remotes                2.4.2    2021-11-30 [2] CRAN (R 4.2.0)
#>  rlang                  1.0.2    2022-03-04 [2] CRAN (R 4.2.0)
#>  rmarkdown              2.14     2022-04-25 [2] CRAN (R 4.2.0)
#>  robustbase             0.95-0   2022-04-02 [2] CRAN (R 4.2.0)
#>  rprojroot              2.0.3    2022-04-02 [2] CRAN (R 4.2.0)
#>  S4Vectors            * 0.34.0   2022-04-26 [2] Bioconductor
#>  sass                   0.4.1    2022-03-23 [2] CRAN (R 4.2.0)
#>  scales                 1.2.0    2022-04-13 [2] CRAN (R 4.2.0)
#>  sessioninfo            1.2.2    2021-12-06 [2] CRAN (R 4.2.0)
#>  shiny                  1.7.1    2021-10-02 [2] CRAN (R 4.2.0)
#>  shinycssloaders        1.0.0    2020-07-28 [2] CRAN (R 4.2.0)
#>  stringi                1.7.6    2021-11-29 [2] CRAN (R 4.2.0)
#>  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.2.0)
#>  SummarizedExperiment   1.26.0   2022-04-26 [2] Bioconductor
#>  testthat               3.1.4    2022-04-26 [2] CRAN (R 4.2.0)
#>  tibble                 3.1.6    2021-11-07 [2] CRAN (R 4.2.0)
#>  tidyselect             1.1.2    2022-02-21 [2] CRAN (R 4.2.0)
#>  usethis                2.1.5    2021-12-09 [2] CRAN (R 4.2.0)
#>  utf8                   1.2.2    2021-07-24 [2] CRAN (R 4.2.0)
#>  vctrs                  0.4.1    2022-04-13 [2] CRAN (R 4.2.0)
#>  vsn                    3.64.0   2022-04-26 [2] Bioconductor
#>  withr                  2.5.0    2022-03-03 [2] CRAN (R 4.2.0)
#>  xcms                 * 3.18.0   2022-04-26 [2] Bioconductor
#>  xfun                   0.30     2022-03-02 [2] CRAN (R 4.2.0)
#>  XML                    3.99-0.9 2022-02-24 [2] CRAN (R 4.2.0)
#>  xtable                 1.8-4    2019-04-21 [2] CRAN (R 4.2.0)
#>  XVector                0.36.0   2022-04-26 [2] Bioconductor
#>  yaml                   2.3.5    2022-02-21 [2] CRAN (R 4.2.0)
#>  zlibbioc               1.42.0   2022-04-26 [2] Bioconductor
#> 
#>  [1] /tmp/Rtmpez6P5Z/Rinsta70984e93d0c7
#>  [2] /home/biocbuild/bbs-3.15-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

References

Saghatelian, A., S. A. Trauger, E. J. Want, E. G. Hawkins, G. Siuzdak, and B. F. Cravatt. 2004. “Assignment of Endogenous Substrates to Enzymes by Global Metabolite Profiling.” Biochemistry 43: 14332–9. http://dx.doi.org/10.1021/bi0480335.