MSstats: Protein/Peptide significance analysis

Package: MSstats

Author: Meena Choi mnchoi67@gmail.com, Tsung-Heng Tsai tsai.tsungheng@gmail.com, Cyril Galitzine cyrildgg@gmail.com

Date: October 31, 2019

This vignette introduces each function in MSstats and summarizes its main options. More details are available in the User Manual.

SkylinetoMSstatsFormat

Preprocess the MSstats input report from Skyline and convert it into the input format required by MSstats.

Arguments

Example

# 'MSstatsInput.csv' is the MSstats report from Skyline.
input <- read.csv(file="MSstatsInput.csv")

raw <- SkylinetoMSstatsFormat(input)
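
Before passing a converted table on, it can help to confirm that it has the columns MSstats expects. The sketch below uses base R only. The column list is an assumption taken from the MSstats user manual and should be checked against the installed version, and `missing_columns` is a hypothetical helper, not an MSstats function.

```r
# Required input columns (assumed from the MSstats user manual; verify for your version)
required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
                   "FragmentIon", "ProductCharge", "IsotopeLabelType",
                   "Condition", "BioReplicate", "Run", "Intensity")

# Hypothetical helper: returns the required column names absent from `df`
missing_columns <- function(df, required = required_cols) {
  setdiff(required, colnames(df))
}

# Toy one-row table standing in for a converted report
toy <- data.frame(ProteinName = "P1", PeptideSequence = "PEPTIDEK",
                  PrecursorCharge = 2, FragmentIon = "y5", ProductCharge = 1,
                  IsotopeLabelType = "L", Condition = "T1", BioReplicate = 1,
                  Run = 1, Intensity = 1000, stringsAsFactors = FALSE)

missing_cols_full <- missing_columns(toy)
missing_cols_partial <- missing_columns(toy[, c("ProteinName", "Run")])
```

An empty result means every expected column is present; otherwise the returned names show what to fix in the report or the conversion step.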

MaxQtoMSstatsFormat

Convert MaxQuant output into the required input format for MSstats.

Arguments

Example

# Read in MaxQuant files
proteinGroups <- read.table("proteinGroups.txt", sep="\t", header=TRUE)

infile <- read.table("evidence.txt", sep="\t", header=TRUE)

# Read in annotation including condition and biological replicates per run.
# Users should make this annotation file. It is not the output from MaxQuant.
annot <- read.csv("annotation.csv", header=TRUE)

raw <- MaxQtoMSstatsFormat(evidence=infile, 
                           annotation=annot, 
                           proteinGroups=proteinGroups)
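
Since the annotation file is written by the user, a small sketch of building one may be useful. The column names (`Raw.file`, `Condition`, `BioReplicate`, `IsotopeLabelType`) are assumptions based on the MSstats documentation for MaxQuant input, and the run names and conditions are made up for illustration.

```r
# Hypothetical annotation: one row per MS run
annot <- data.frame(
  Raw.file = c("run01", "run02", "run03", "run04"),   # made-up raw file names
  Condition = c("Control", "Control", "Treated", "Treated"),
  BioReplicate = c(1, 2, 3, 4),
  IsotopeLabelType = "L",                             # label-free runs (assumption)
  stringsAsFactors = FALSE
)

# Write the file to a temporary directory and read it back,
# mirroring the read.csv call in the example above
annot_path <- file.path(tempdir(), "annotation.csv")
write.csv(annot, annot_path, row.names = FALSE)
annot_check <- read.csv(annot_path, header = TRUE, stringsAsFactors = FALSE)
```

The values in `Raw.file` would need to match the raw file names recorded in evidence.txt.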

ProgenesistoMSstatsFormat

Convert Progenesis output into the required input format for MSstats.

Arguments

Example

input <- read.csv("output_progenesis.csv", stringsAsFactors=FALSE) 

# Read in annotation including condition and biological replicates per run.
# Users should make this annotation file. It is not the output from Progenesis.
annot <- read.csv('annotation.csv')

raw <- ProgenesistoMSstatsFormat(input, annotation=annot)

SpectronauttoMSstatsFormat

Convert Spectronaut output into the required input format for MSstats.

Arguments

Example

input <- read.csv("output_spectronaut.csv", stringsAsFactors=FALSE) 

quant <- SpectronauttoMSstatsFormat(input)

dataProcess

Pre-processes the original raw data and performs quality control of MS runs, producing quantitative data for model fitting and group comparison. A log transformation is applied automatically, and the additional columns needed for model fitting and group comparison are created. dataProcess offers three options for data pre-processing and quality control of MS runs.

Arguments

Details of outputs

RunlevelData from dataProcess

ComparisonResult from groupComparison: one or two columns will be added.

Example

QuantData <- dataProcess(SRMRawData)
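
To make the log transformation step concrete, here is a minimal base R sketch on a toy table. It is not the dataProcess implementation: the base-2 logarithm, the handling of zero intensities as missing, and the column name `LogIntensity` are all assumptions made for illustration.

```r
# Toy intensities for two runs; the zero stands for an unquantified feature
toy <- data.frame(Run = c(1, 1, 2, 2),
                  Intensity = c(1024, 2048, 512, 0))

# Zero or negative intensities have no finite logarithm, so treat them as missing (assumption)
toy$Intensity[which(toy$Intensity <= 0)] <- NA

# Log transformation, base 2 (assumption), stored in a new hypothetical column
toy$LogIntensity <- log2(toy$Intensity)
```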

dataProcessPlots

Visualization for exploratory data analysis. To illustrate the quantitative data after data pre-processing and quality control of MS runs, dataProcessPlots takes the quantitative data from dataProcess as input and automatically generates three types of figures as pdf files: profile plots, quality control plots, and quantification plots for conditions.

Arguments

Example

# QuantData <- dataProcess(SRMRawData)
# 
# # Profile plot
# dataProcessPlots(data=QuantData, type="ProfilePlot")
# 
# # Quality control plot 
# dataProcessPlots(data=QuantData, type="QCPlot")   
# 
# # Quantification plot for conditions
# dataProcessPlots(data=QuantData, type="ConditionPlot")

groupComparison

Tests for significant changes in protein abundance across conditions, based on a family of linear mixed-effects models, in targeted Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS) experiments. It is applicable to multiple types of sample preparation, including label-free workflows, workflows that use stable isotope labeled reference proteins and peptides, and workflows that use fractionation. The experimental design, either a case-control study (patients are not repeatedly measured) or a time course study (patients are repeatedly measured), is determined automatically, and the appropriate statistical model is fitted.

Arguments

Example

# QuantData <- dataProcess(SRMRawData)
# 
# levels(QuantData$ProcessedData$GROUP_ORIGINAL)
# comparison <- matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)
# row.names(comparison) <- "T7-T1"
# 
# # Tests for differentially abundant proteins with models:
# testResultOneComparison <- groupComparison(contrast.matrix=comparison, data=QuantData)
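
The contrast matrix in the example has one column per condition, with 1 for the condition in the numerator of the comparison and -1 for the one in the denominator. A sketch of building such a row by name, in base R, is shown here. `make_contrast` is a hypothetical helper, and the labels T1 to T10 are assumed names for the ten conditions. The column order must match the order reported by `levels()` for the processed data.

```r
# Hypothetical helper: one-row contrast matrix for `numerator` minus `denominator`
make_contrast <- function(groups, numerator, denominator) {
  stopifnot(numerator %in% groups, denominator %in% groups)
  row <- setNames(rep(0, length(groups)), groups)
  row[numerator] <- 1      # condition of interest
  row[denominator] <- -1   # reference condition
  matrix(row, nrow = 1,
         dimnames = list(paste0(numerator, "-", denominator), groups))
}

# Assumed condition labels, in the same order as the factor levels
groups <- paste0("T", 1:10)
comparison <- make_contrast(groups, "T7", "T1")
```

The result has the same entries as the hand-written `matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)` with row name "T7-T1". Several comparisons can be stacked with `rbind`.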

groupComparisonPlots

Visualization for model-based analysis and for summarizing differentially abundant proteins. To summarize the log-fold changes and adjusted p-values for differentially abundant proteins, groupComparisonPlots takes the testing results from groupComparison as input and automatically generates three types of figures as pdf files: volcano plots, heatmaps, and comparison plots.

Arguments

Example

# QuantData <- dataProcess(SRMRawData)
# 
# # based on multiple comparisons  (T1 vs T3; T1 vs T7; T1 vs T9)
# comparison1<-matrix(c(-1,0,1,0,0,0,0,0,0,0),nrow=1)
# comparison2<-matrix(c(-1,0,0,0,0,0,1,0,0,0),nrow=1)
# comparison3<-matrix(c(-1,0,0,0,0,0,0,0,1,0),nrow=1)
# comparison<-rbind(comparison1,comparison2, comparison3)
# row.names(comparison)<-c("T3-T1","T7-T1","T9-T1")
# 
# testResultMultiComparisons <- groupComparison(contrast.matrix=comparison, data=QuantData)
# 
# # Volcano plot 
# groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="VolcanoPlot")
# 
# # Heatmap 
# groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="Heatmap")
# 
# # Comparison Plot
# groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="ComparisonPlot")
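
For readers who want to see what a volcano plot encodes, the following base R sketch draws one from a toy result table. The numbers are invented. The column names `log2FC`, `pvalue`, and `adj.pvalue` are assumed to mirror ComparisonResult, and the Benjamini-Hochberg adjustment and the 0.05 cutoff are assumptions for illustration. This is not how groupComparisonPlots is implemented.

```r
# Toy testing results for five proteins (made-up values)
res <- data.frame(Protein = paste0("P", 1:5),
                  log2FC = c(-2.1, 0.1, 1.5, 0.3, -0.2),
                  pvalue = c(0.0001, 0.8, 0.002, 0.04, 0.5),
                  stringsAsFactors = FALSE)

# Adjust for multiple testing (Benjamini-Hochberg, an assumption here)
res$adj.pvalue <- p.adjust(res$pvalue, method = "BH")

# Classify each protein by significance and direction of change
sig_cutoff <- 0.05
res$status <- ifelse(res$adj.pvalue >= sig_cutoff, "not significant",
                     ifelse(res$log2FC > 0, "up", "down"))

# Volcano plot: effect size on the x-axis, significance on the y-axis
pdf(file.path(tempdir(), "volcano_sketch.pdf"))
plot(res$log2FC, -log10(res$adj.pvalue),
     col = ifelse(res$status == "not significant", "grey40", "red"),
     pch = 19, xlab = "log2 fold change", ylab = "-log10 adjusted p-value")
abline(h = -log10(sig_cutoff), lty = 2)   # significance threshold
invisible(dev.off())
```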

modelBasedQCPlots

Results based on statistical models for whole-plot-level inference are accurate as long as the assumptions of the model are met. The model assumes that the measurement errors are normally distributed with mean 0 and constant variance. The assumption of constant variance can be checked by examining the residuals from the model.

To check the assumptions of the linear model for whole-plot inference, modelBasedQCPlots takes the fitted models returned by groupComparison as input and automatically generates two types of figures as pdf files: normal quantile-quantile plots and residual plots.

Arguments

Example

# testResultOneComparison <- groupComparison(contrast.matrix=comparison, data=QuantData)
# 
# # normal quantile-quantile plots
# modelBasedQCPlots(data=testResultOneComparison, type="QQPlots")
# 
# # residual plots
# modelBasedQCPlots(data=testResultOneComparison, type="ResidualPlots")
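
The two diagnostics can be illustrated without MSstats. The sketch below fits an ordinary linear model with `lm` to simulated data, as a stand-in for the mixed-effects models fitted by groupComparison, and draws a normal quantile-quantile plot and a residual plot. The data, group labels, and effect size are invented.

```r
set.seed(1)

# Simulated log-abundances for two groups of ten samples (made-up values)
toy <- data.frame(group = factor(rep(c("A", "B"), each = 10)))
toy$abundance <- 20 + 1.5 * (toy$group == "B") + rnorm(20, sd = 0.5)

# Plain linear model, used here instead of a mixed-effects model
fit <- lm(abundance ~ group, data = toy)
res <- residuals(fit)

pdf(file.path(tempdir(), "qc_sketch.pdf"))
# Normality check: points should fall close to the reference line
qqnorm(res)
qqline(res)
# Constant-variance check: spread should look similar across fitted values
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
invisible(dev.off())
```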

designSampleSize

Calculates the sample size for future Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS) experiments, based on an intensity-based linear model. The function fits the model and uses the estimated variance components to calculate the sample size. The underlying model is an intensity-based linear model with technical MS run replication. The estimated sample size is rounded to a whole number. Two calculations are available: (1) the minimal number of biological replicates per condition and (2) the power.

Arguments

Example

# QuantData <- dataProcess(SRMRawData)
# head(QuantData$ProcessedData)
# 
# ## based on multiple comparisons  (T1 vs T3; T1 vs T7; T1 vs T9)
# comparison1 <- matrix(c(-1,0,1,0,0,0,0,0,0,0),nrow=1)
# comparison2 <- matrix(c(-1,0,0,0,0,0,1,0,0,0),nrow=1)
# comparison3 <- matrix(c(-1,0,0,0,0,0,0,0,1,0),nrow=1)
# comparison <- rbind(comparison1,comparison2, comparison3)
# row.names(comparison) <- c("T3-T1","T7-T1","T9-T1")
# 
# testResultMultiComparisons <- groupComparison(contrast.matrix=comparison,data=QuantData)
# 
# #(1) Minimal number of biological replicates per condition
# designSampleSize(data=testResultMultiComparisons$fittedmodel, numSample=TRUE,
#   desiredFC=c(1.25,1.75), FDR=0.05, power=0.8)
# 
# #(2) Power calculation
# designSampleSize(data=testResultMultiComparisons$fittedmodel, numSample=2,
#   desiredFC=c(1.25,1.75), FDR=0.05, power=TRUE)
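
To give a feel for how the desired fold change drives the two calculations, the sketch below uses `power.t.test` from base R as a stand-in. This is a plain two-sample t-test calculation, not the variance-component formula used by designSampleSize, and `sig.level` is an ordinary type I error rate rather than an FDR. The standard deviation is an invented value.

```r
sigma <- 0.5                       # hypothetical SD of log2 abundance
desired_fc <- c(1.25, 1.5, 1.75)   # fold changes to detect

# (1) Replicates per group needed for 80% power, rounded up to whole samples
n_per_group <- sapply(desired_fc, function(fc) {
  power.t.test(delta = log2(fc), sd = sigma,
               sig.level = 0.05, power = 0.8)$n
})
n_per_group <- ceiling(n_per_group)

# (2) Power achieved with a fixed number of replicates per group
power_at_n5 <- sapply(desired_fc, function(fc) {
  power.t.test(n = 5, delta = log2(fc), sd = sigma,
               sig.level = 0.05)$power
})
```

Smaller fold changes require more replicates for the same power, and a fixed number of replicates gives more power for larger fold changes.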

designSampleSizePlots

Plots the relationship between the desired fold change and the result of the sample size calculation, which is either (1) the minimal number of biological replicates per condition or (2) the power.

The input is the result from function designSampleSize.

Arguments

Example

# # (1) Minimal number of biological replicates per condition
# result.sample <- designSampleSize(data=testResultMultiComparisons$fittedmodel, numSample=TRUE,
#                                 desiredFC=c(1.25,1.75), FDR=0.05, power=0.8)
# designSampleSizePlots(data=result.sample)
# 
# # (2) Power
# result.power <- designSampleSize(data=testResultMultiComparisons$fittedmodel, numSample=2,
#                                desiredFC=c(1.25,1.75), FDR=0.05, power=TRUE)
# designSampleSizePlots(data=result.power)

quantification

Model-based quantification for each condition or for each biological sample, per protein, in a targeted Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), or Data-Independent Acquisition (DIA or SWATH-MS) experiment. quantification takes the data set processed by dataProcess as input and automatically generates the quantification results (a data.frame) in long or matrix format. The quantification for endogenous samples is based on run summarization from the subplot model, with TMP (Tukey's median polish) robust estimation.

Arguments

Example

# QuantData <- dataProcess(SRMRawData)
# 
# # Sample quantification
# sampleQuant <- quantification(QuantData)
# 
# # Group quantification
# groupQuant <- quantification(QuantData, type="Group")
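
Run summarization by Tukey's median polish can be sketched with `medpolish` from base R. The toy matrix, the feature and run names, and the step that reshapes the summaries from long to matrix format are illustrative assumptions, not the internals of quantification.

```r
# Toy log2 intensities: rows are features of one protein, columns are MS runs
x <- matrix(c(10, 11, 12,
              12, 13, 14,
              11, 12, 13), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("feature", 1:3), paste0("run", 1:3)))

# Median polish splits the matrix into overall, feature (row), and run (column) effects
mp <- medpolish(x, na.rm = TRUE, trace.iter = FALSE)

# Run-level summary: overall effect plus the effect of each run
run_summary <- mp$overall + mp$col

# Long format: one row per protein and run (hypothetical column names)
long <- data.frame(Protein = "P1", Run = colnames(x),
                   LogIntensity = unname(run_summary),
                   stringsAsFactors = FALSE)

# Matrix format: proteins in rows, runs in columns
wide <- tapply(long$LogIntensity, list(long$Protein, long$Run), mean)
```

Because medians are used instead of means, a single outlying feature has limited influence on the run summaries.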