1 Introduction
2 Exploratory analysis
- 2.1 Principal components analysis (PCA)
- 2.2 Feature colocalization
3 Image segmentation
- 3.1 Spatial shrunken centroids clustering
- 3.2 Spatial Dirichlet Gaussian mixture modeling
4 Classification and cross-validation
- 4.1 Projection to latent structures (PLS)
- 4.2 Spatial shrunken centroids classification
5 Class comparison
- 5.1 Group means-based testing
- 5.2 Segmentation-based testing
6 Session information

1 Introduction

Cardinal 2 provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison can also be performed, provided an appropriate experimental design and sample size.

Before statistical analysis, it is important to identify the statistical goal of the experiment:

Unsupervised analysis. The data has no class labels or conditions, and we are interested in exploratory analysis to discover regions of interest in the data.
Supervised analysis. The data has class labels and we want to train a statistical or machine learning model to predict the class labels of new data.
Class comparison. The data has class labels or conditions, and we want to test whether the abundance of the mass features is different between conditions.

CardinalWorkflows provides real experimental data and more detailed discussion of the statistical methods than will be covered in this brief overview.

2 Exploratory analysis

Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.

set.seed(2020)
mse <- simulateImage(preset=2, npeaks=10, dim=c(20,20), sdnoise=0.5,
                    peakheight=c(2,4), representation="centroid")

design <- makeFactor(circle=mse$circle, square=mse$square,
                        bg=!(mse$circle | mse$square))

image(mse, design ~ x * y, key=TRUE)

image(mse, feature=c(1,4,7), layout=c(1,3))

2.1 Principal components analysis (PCA)

Principal components analysis is an unsupervised dimension reduction technique. It reduces the data to some number of “principal components” that are a linear combination of the original mass features, where each component is orthogonal to the last, and explains as much of the variance in the data as possible.

Use PCA() to perform PCA on a MSImagingExperiment.

pca <- PCA(mse, ncomp=3)

summary(pca)

## Principal components analysis:
##  
##   Component Standard deviation
## 1         1          4.7061494
## 2         2          2.6134145
## 3         3          0.6136734

We can see that the first two principal components explain most of the variation in the data.

image(pca, values="scores", superpose=FALSE, layout=c(1,3))

The loadings of the components show how each mass feature contributes to each component.

plot(pca, values="loadings", superpose=FALSE, layout=c(1,3), lwd=2)

Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.

pca_scores <- DataFrame(resultData(pca, 1, "scores"))

plot(pca_scores, PC1 ~ PC2, groups=design, pch=20)

2.2 Feature colocalization

Finding other mass features colocalized with a particular image is a common task in analysis of MS imaging experiments.

Use colocalize() to find mass features that are colocalized with another image.

coloc <- colocalized(mse, mz=1023)
coloc

## Colocalized features: 
##           mz   circle   square correlation    M1    M2
## 1  1023.7081 2.011661 4.063644   1.0000000 1.000 1.000
## 2  1135.9335 2.434873 3.985370   0.9430259 0.875 0.875
## 3  1200.4653 2.219637 4.166854   0.9292093 0.865 0.865
## 4  1361.2682 0.000000 4.259568   0.6712111 0.710 0.710
## 5  1227.9380 0.000000 4.039750   0.6671688 0.675 0.675
## 6  1453.5096 0.000000 4.187344   0.6657311 0.695 0.695
## 7  1858.8985 0.000000 3.970513   0.6620943 0.705 0.705
## 8   781.2367 1.392247 0.000000   0.3891237 0.650 0.650
## 9   473.9206 2.340799 0.000000   0.3632409 0.600 0.600
## 10  788.8633 1.542205 0.000000   0.3378016 0.605 0.605

By default, Pearson correlation is used to rank the colocalized features. Manders’ colocalization coefficients (M1 and M2) are also provided.

image(mse, mz=coloc$mz[1:3], layout=c(1,3))

3 Image segmentation

Segmentation (clustering) a dataset is a useful way to summarize an MS imaging experiment and discover regions of interest within the sample.

3.1 Spatial shrunken centroids clustering

Spatially-aware nearest shrunken centroids clustering allows simultaneous image segmentation and feature selection.

A smoothing radius r, initial number of clusters k, and sparsity parameters s must be provided.

The larger the sparsity parameter s, the fewer mass features will contribute to the segmentation.

Spatial shrunken centroids may result in fewer clusters than the initial number of clusters k, so it is recommended to use a value for k that is larger than the expected number of clusters, and allow the method to automatically choose the number of clusters.

ssc <- spatialShrunkenCentroids(mse, r=1, k=5, s=c(0,3,6,9))

summary(ssc)

## Spatially-aware nearest shrunken centroids:
##  
##  Segmentation / clustering 
##  Method = gaussian 
##  Distance = chebyshev
##  
##   Radius (r) Init (k) Shrinkage (s) Classes Features/Class
## 1          1        5             0       4          10.00
## 2          1        5             3       3          10.00
## 3          1        5             6       3           8.67
## 4          1        5             9       3           7.33

Plotting the predicted cluster probabilities shows a clear segmentation into the ground truth image.

image(ssc, model=list(s=9), values="probability")

Spatial shrunken centroids calculates t-statistics for each segment and each mass feature. These t-statistics a measure of the difference between the cluster center and the global mean.

plot(ssc, model=list(s=9), values="statistic", lwd=2)

Mass features with t-statistics of zero do not contribute to the segmentation. The sign of the t-statistic indicates whether the mass feature is over- or under-expressed in the given cluster relative to the global mean.

Use topFeatures() to rank mass features by t-statistic.

ssc_top <- topFeatures(ssc, model=list(s=9), class == 1)
ssc_top

## Top-ranked features: 
##           mz   circle   square r k s class   centers  statistic
## 1   473.9206 2.340799 0.000000 1 5 9     1 2.3475172 11.9165555
## 2  1135.9335 2.434873 3.985370 1 5 9     1 4.3294593  4.0936001
## 3   788.8633 1.542205 0.000000 1 5 9     1 0.7122964  0.9371928
## 4   781.2367 1.392247 0.000000 1 5 9     1 0.6067728  0.3683360
## 5  1023.7081 2.011661 4.063644 1 5 9     1 2.8075020  0.0000000
## 6  1200.4653 2.219637 4.166854 1 5 9     1 2.4490113  0.0000000
## 7  1858.8985 0.000000 3.970513 1 5 9     1 1.2245057  0.0000000
## 8  1361.2682 0.000000 4.259568 1 5 9     1 1.3563276 -0.3583295
## 9  1453.5096 0.000000 4.187344 1 5 9     1 1.3431417 -1.3316029
## 10 1227.9380 0.000000 4.039750 1 5 9     1 1.2913549 -2.2072382

3.2 Spatial Dirichlet Gaussian mixture modeling

Spatially-aware Dirichlet Gaussian mixture models (spatial-DGMM) is a method of image segmentation applied to each mass feature individually, rather than the dataset as a whole.

This is useful for summarizing molecular ion images, and for discovering structures that clustering using all mass features together may miss.

dgmm <- spatialDGMM(mse, r=1, k=5, method="adaptive")

summary(dgmm)

## Spatially-aware Dirichlet Gaussian mixture models:
##  
##  Segmentation on 1 group: run0 
##  Method = adaptive 
##  Distance = chebyshev
##  
##    Radius (r) Init (k) Feature Classes/Group
## 1           1        5       1             2
## 2           1        5       2             4
## 3           1        5       3             1
## 4           1        5       4             3
## 5           1        5       5             3
## 6           1        5       6             4
## 7           1        5       7             2
## 8           1        5       8             2
## 9           1        5       9             2
## 10          1        5      10             2

A different segmentation is fit for each mass feature.

image(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))

Each image is modeled as a mixture of Gaussian distributions.

plot(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))

Spatial-DGMM segmentations can be especially useful for finding mass features colocalized with a region-of-interest.

When applied to a SpatialDGMM object, colocalize() is able to use match scores that can have a higher specificity than using Pearson correlation on the raw ion images.

coloc2 <- colocalized(dgmm, mse$square)
subset(coloc2, select=c(-r, -k, -group))

## Colocalized features: 
##           mz   circle   square feature class    Mscore        M1        M2
## 1  1227.9380 0.000000 4.039750       7     2 0.9811321 1.0000000 0.9811321
## 2  1453.5096 0.000000 4.187344       9     2 0.9503106 0.9807692 0.9683544
## 3  1361.2682 0.000000 4.259568       8     2 0.9430380 0.9551282 0.9867550
## 4  1858.8985 0.000000 3.970513      10     2 0.9113924 0.9230769 0.9863014
## 5  1023.7081 2.011661 4.063644       4     3 0.5000000 0.6025641 0.7460317
## 6   473.9206 2.340799 0.000000       1     1 0.4834437 0.9358974 0.5000000
## 7  1200.4653 2.219637 4.166854       6     4 0.4606742 0.5256410 0.7884615
## 8  1135.9335 2.434873 3.985370       5     2 0.3952381 0.5320513 0.6058394
## 9   788.8633 1.542205 0.000000       3     1 0.3900000 1.0000000 0.3900000
## 10  781.2367 1.392247 0.000000       2     1 0.3865248 0.6987179 0.4638298

4 Classification and cross-validation

Classification of pixels into different known classes (e.g., cancer vs normal) based on the mass spectra is a common application for MS imaging.

set.seed(2020)
mse2 <- simulateImage(preset=7, npeaks=10, dim=c(10,10), sdnoise=0.5,
                    nruns=3, peakdiff=2, representation="centroid")

class <- makeFactor(A=mse2$circleA, B=mse2$circleB)

image(mse2, class ~ x * y, key=TRUE, layout=c(1,3))

image(mse2, feature=1, layout=c(1,3))

When performing classification, it is important to use cross-validation so that reported accuracies are not overly optimistic.

We strongly recomend making sure that all spectra from the same experiment run belong to the same fold, to reduce predictive bias due to run effects.

4.1 Projection to latent structures (PLS)

Projection to latent structures (PLS), also called partial least squares, is a supervised dimension reduction technique. It can be thought of as being similar to PCA, but for classification or regression.

cv_pls <- crossValidate(mse2, .y=class, .fun=PLS, ncomp=1:5, .fold=run(mse2))

summary(cv_pls)

## Cross validation:
##  
##  Classification on 2 classes: A B 
##  Summarized 3 folds: run0 run1 run2
##  
##   ncomp  Accuracy Sensitivity Specificity
## 1     1 0.6608485   0.0000000   1.0000000
## 2     2 0.8100534   0.4811594   0.9690476
## 3     3 0.9200094   0.8405797   0.9776557
## 4     4 0.9132067   0.8550725   0.9553114
## 5     5 0.9088528   0.7884058   0.9648352

We can see that using 3 PLS components produces the best cross-validated accuracy.

pls <- PLS(mse2, y=class, ncomp=3)

summary(pls)

## Projection to latent components:
##  
##  Classification on 2 classes: A B 
##  Method = pls
##  
##   Number of Components Accuracy Sensitivity Specificity
## 1                    3      0.9   0.7647059   0.9775281

We can plot the fitted values to visualize the prediction.

image(pls, values="fitted", layout=c(1,3))

The PLS regression coefficients can be used to select influential features.

plot(pls, values="coefficients", lwd=2)

Like PCA, it can be useful to plot the PLS scores against each other to visualize the separation between classes.

pls_scores <- DataFrame(resultData(pls, 1, "scores"))

plot(pls_scores, C1 ~ C2, groups=class, pch=20)

Note that orthgonal PLS (O-PLS) is also available via method="opls" or by using the separate OPLS() method. Typically, both methods perform similarly, although O-PLS can sometimes produce more easily interpretable regression coefficients.

4.2 Spatial shrunken centroids classification

Spatially-aware nearest shrunken centroids classification is an extension of nearest shrunken centroids that incorporates spatial information into the model.

Like in the clustering case of spatial shrunken centroids, a smoothing radius r must be provided along with sparsity parameters s.

cv_ssc <- crossValidate(mse2, .y=class,
                        .fun=spatialShrunkenCentroids,
                        r=1, s=c(0,3,6,9), .fold=run(mse2))

summary(cv_ssc)

## Cross validation:
##  
##  Classification on 2 classes: A B 
##  Summarized 3 folds: run0 run1 run2
##  
##   r s  Accuracy Sensitivity Specificity
## 1 1 0 0.9661925   1.0000000   0.9505495
## 2 1 3 0.9429979   0.8840580   0.9871795
## 3 1 6 0.8593713   0.6666667   1.0000000
## 4 1 9 0.7955665   0.5217391   1.0000000

We can see that in this case, the fully dense model (s=0) that uses all mass features has the best cross-validated accuracy for the data.

ssc2 <- spatialShrunkenCentroids(mse2, y=class, r=1, s=0)

summary(ssc2)

## Spatially-aware nearest shrunken centroids:
##  
##  Classification on 2 classes: A B 
##  Method = gaussian 
##  Distance = chebyshev
##  
##   Radius (r) Shrinkage (s) Features/Class  Accuracy Sensitivity Specificity
## 1          1             0             10 0.9714286           1   0.9550562

Plotting the predicted class probabilities produces a more easily interpretable visualization than PLS in this case.

image(ssc2, layout=c(1,3))

Plotting t-statistics shows the first three features have much higher abundance in condition “B”.

plot(ssc2, values="statistic", lwd=2)

topFeatures(ssc2, class=="B") %>% subset(select=c(-diff, -r, -k, -s))

## Top-ranked features: 
##           mz       ref   circleA   circleB class   centers statistic
## 1   788.8633 0.0000000 0.8374991 2.8374991     B 4.3684041 24.463624
## 2   781.2367 0.0000000 1.0220864 3.0220864     B 4.1350547 21.296975
## 3   473.9206 0.0000000 0.9705134 2.9705134     B 2.9378242 16.611108
## 4  1200.4653 0.0000000 1.4870747 1.4870747     B 1.4592569  6.055987
## 5  1135.9335 0.0000000 1.2190690 1.2190690     B 1.4859448  5.847812
## 6  1023.7081 0.0000000 0.8512596 0.8512596     B 1.0217795  2.975441
## 7  1453.5096 0.9428803 0.0000000 0.0000000     B 1.1996578 -2.614496
## 8  1227.9380 1.0776237 0.0000000 0.0000000     B 0.5602308 -3.047349
## 9  1858.8985 1.0152029 0.0000000 0.0000000     B 1.6275417 -3.328770
## 10 1361.2682 1.0581255 0.0000000 0.0000000     B 1.8852220 -3.383986

5 Class comparison

Statistical hypothesis testing is used to determine whether the abundance of a feature is different between two or more conditions.

In order to account for additional factors like the effect of experimental runs, subject-to-subject variability, etc., this is often done most appropriately using linear models.

This example uses a simple experiment with two conditions “A” and “B”, with three replicates in each condition.

set.seed(2020)
mse3 <- simulateImage(preset=4, npeaks=10, dim=c(10,10), sdnoise=0.3,
                    nruns=3, peakdiff=1, representation="centroid")

trt <- makeFactor(A=mse3$circleA, B=mse3$circleB)

image(mse3, trt ~ x * y, key=TRUE, layout=c(2,3))

image(mse3, feature=1, layout=c(2,3))

We know from the design of the simulation that the first 5 (of 10) mass features differ between the conditions.

featureData(mse3)

## MassDataFrame with 10 rows and 3 columns
##         :mz:   circleA   circleB      diff
##    <numeric> <numeric> <numeric> <logical>
## 1    473.921  0.970513   1.97051      TRUE
## 2    781.237  1.022086   2.02209      TRUE
## 3    788.863  0.837499   1.83750      TRUE
## 4   1023.708  0.851260   1.85126      TRUE
## 5   1135.933  1.219069   2.21907      TRUE
## 6   1200.465  1.487075   1.48707     FALSE
## 7   1227.938  1.077624   1.07762     FALSE
## 8   1361.268  1.058126   1.05813     FALSE
## 9   1453.510  0.942880   0.94288     FALSE
## 10  1858.899  1.015203   1.01520     FALSE

5.1 Group means-based testing

Use meansTest() to fit linear models with the most basic summarization. The groups indicating the observational units must be provided. Each group is summarized by its mean, and then a linear model is fit to the summaries. The number of groups is the effective sample size.

Here, we specify condition as the sole fixed effect. Internally, the model will call either lm() or lme() depending on whether any random effects are provided.

mtest <- meansTest(mse3, ~ condition, groups=run(mse3))

summary(mtest)

## Means-summarized linear model testing:
##  
##  Fixed effects: ~condition 
## 
##  Summarized 6 groups: runA1 runA2 ... runB2 runB3 
## 
##  Likelihood ratio test for fixed effects:
##  
##    Feature     LR      PValue       FDR
## 1        1 6.8325 0.008951300 0.0447565
## 2        2 2.2660 0.132239496 0.3305987
## 3        3 1.9045 0.167580606 0.3351612
## 4        4 7.0631 0.007868697 0.0447565
## 5        5 4.1586 0.041422884 0.1380763
## 6        6 0.0025 0.960241024 0.9654641
## 7        7 0.0019 0.965464107 0.9654641
## 8        8 0.0317 0.858660385 0.9654641
## 9        9 0.0684 0.793731073 0.9654641
## 10      10 0.1113 0.738668115 0.9654641

By default, the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).

Box-and-whisker plots can be used to visualize the differences (if any) between the conditions.

plot(mtest, layout=c(2,5), ylab="intensity")

Use topFeatures() to extract the significant results.

topFeatures(mtest, p.adjust="fdr", AdjP < .05)

## Top-ranked tests: ~condition vs ~1 
##          mz   circleA  circleB diff feature     LR      PValue      AdjP
## 1  473.9206 0.9705134 1.970513 TRUE       1 6.8325 0.008951300 0.0447565
## 2 1023.7081 0.8512596 1.851260 TRUE       4 7.0631 0.007868697 0.0447565

5.2 Segmentation-based testing

Testing of SpatialDGMM objects is implemented by segmentationTest(). The key idea here is that spatial-DGMM segmentation captures within-sample heterogeneity, so testing between spatial-DGMM segments is more sensitive that simply summarizing a whole sample by its mean.

First, we must segment the data with spatialDGMM(), while making sure that each observational unit is segmented within a different group (as specified by groups).

The number of groups is the effective sample size.

dgmm2 <- spatialDGMM(mse3, r=1, k=5, groups=run(mse3))

Now use segmentationTest() to fit the models.

In order to fit the models, a representative spatial-DGMM segment must be selected for each group. There are two automated ways to do this via classControl: "Ymax" means use the segment with the highest mean, and "Mscore" means use the segment with the highest match score with the fixed effects.

(A list of character vectors giving the explicit mapping between group and representative spatial-DGMM segment can also be given to classControl.)

stest <- segmentationTest(dgmm2, ~ condition, classControl="Ymax")

summary(stest)

## Segmentation-based linear model testing:
##  
##  Fixed effects: ~condition 
## 
##  Summarized 6 groups: runA1 runA2 ... runB2 runB3 
## 
##  Likelihood ratio test for fixed effects:
##  
##    Radius (r) Init (k) Feature      LR       PValue          FDR
## 1           1        5       1 11.8886 5.647717e-04 1.882572e-03
## 2           1        5       2  2.4230 1.195682e-01 2.391363e-01
## 3           1        5       3 10.3180 1.317408e-03 3.293519e-03
## 4           1        5       4 24.1880 8.737726e-07 4.368863e-06
## 5           1        5       5 56.0346 7.120806e-14 7.120806e-13
## 6           1        5       6  0.0407 8.401315e-01 9.549968e-01
## 7           1        5       7  0.0268 8.700191e-01 9.549968e-01
## 8           1        5       8  1.1417 2.852973e-01 4.075675e-01
## 9           1        5       9  0.0032 9.549968e-01 9.549968e-01
## 10          1        5      10  1.3016 2.539213e-01 4.075675e-01

As with meansTest(), the models are summarized by performing likelihood ratio tests against the null model (with no fixed effects, retaining any random effects).

Box-and-whisker plots can be used to visually compare the conditions.

plot(stest, layout=c(2,5), ylab="intensity")

If an automated method for classControl was used, it can be helpful to plot the mapping to see what segments were used to represent each group.

image(stest, model=list(feature=3), values="mapping")

In this case, segmentationTest() finds two more significant mass features compared to meansTest().

topFeatures(stest, p.adjust="fdr", AdjP < .05) %>% subset(select=c(-diff, -r, -k))

## Top-ranked tests: ~condition vs ~1 
##          mz   circleA  circleB feature      LR       PValue         AdjP
## 1 1135.9335 1.2190690 2.219069       5 56.0346 7.120806e-14 7.120806e-13
## 2 1023.7081 0.8512596 1.851260       4 24.1880 8.737726e-07 4.368863e-06
## 3  473.9206 0.9705134 1.970513       1 11.8886 5.647717e-04 1.882572e-03
## 4  788.8633 0.8374991 1.837499       3 10.3180 1.317408e-03 3.293519e-03

6 Session information

sessionInfo()

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] Cardinal_2.12.0     ProtGenerics_1.26.0 S4Vectors_0.32.0   
## [4] EBImage_4.36.0      BiocParallel_1.28.0 BiocGenerics_0.40.0
## [7] BiocStyle_2.22.0   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7          mclust_5.4.7        locfit_1.5-9.4     
##  [4] lattice_0.20-45     fftwtools_0.9-11    png_0.1-7          
##  [7] assertthat_0.2.1    digest_0.6.28       utf8_1.2.2         
## [10] R6_2.5.1            tiff_0.1-8          signal_0.7-7       
## [13] evaluate_0.14       highr_0.9           pillar_1.6.4       
## [16] rlang_0.4.12        irlba_2.3.3         jquerylib_0.1.4    
## [19] magick_2.7.3        Matrix_1.3-4        rmarkdown_2.11     
## [22] matter_1.20.0       stringr_1.4.0       htmlwidgets_1.5.4  
## [25] RCurl_1.98-1.5      compiler_4.1.1      xfun_0.27          
## [28] pkgconfig_2.0.3     biglm_0.9-2.1       htmltools_0.5.2    
## [31] tidyselect_1.1.1    tibble_3.1.5        bookdown_0.24      
## [34] fansi_0.5.0         viridisLite_0.4.0   crayon_1.4.1       
## [37] dplyr_1.0.7         MASS_7.3-54         bitops_1.0-7       
## [40] grid_4.1.1          nlme_3.1-153        jsonlite_1.7.2     
## [43] lifecycle_1.0.1     DBI_1.1.1           magrittr_2.0.1     
## [46] stringi_1.7.5       sp_1.4-5            bslib_0.3.1        
## [49] ellipsis_0.3.2      generics_0.1.1      vctrs_0.3.8        
## [52] tools_4.1.1         Biobase_2.54.0      glue_1.4.2         
## [55] purrr_0.3.4         jpeg_0.1-9          abind_1.4-5        
## [58] parallel_4.1.1      fastmap_1.1.0       yaml_2.2.1         
## [61] BiocManager_1.30.16 knitr_1.36          sass_0.4.0

Cardinal 2: Statistical methods for mass spectrometry imaging

Revised: May 16, 2019

Contents