Using the SIGHTS R package
Elika Garg, Carl Murie and Robert Nadon
Abstract
Identifying rare biological events in high-throughput screens requires using the best available normalization and statistical inference procedures. It is not always clear, however, which algorithms are best suited for a particular screen. The Statistics and dIagnostic Graphs for High Throughput Screening (SIGHTS) R package is designed for statistical analysis and visualization of HTS assays. It provides graphical diagnostic tools to guide researchers in choosing the most appropriate normalization algorithm and statistical test for identifying active constructs.
The sights package provides numerous normalization methods that correct
the three types of bias that affect High-Throughput Screening (HTS)
measurements: overall plate bias, within-plate spatial bias, and
across-plate bias. Commonly used normalization methods such as Z-scores
(or methods such as percent inhibition/activation, which use
within-plate controls to normalize) correct only overall plate bias.
Methods included in this package attempt to correct all three sources of
bias and typically give better results.
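The claim that Z-scores correct only overall plate bias can be illustrated with a minimal base R sketch. The plate below is simulated, and its dimensions, offset, and row gradient are assumptions chosen for illustration; none of it comes from the package or its datasets.

```r
# Simulate one 8 x 12 plate with a plate-wide offset and a row gradient.
set.seed(1)
plate_rows <- 8
plate_cols <- 12
noise <- matrix(rnorm(plate_rows * plate_cols, sd = 0.1), plate_rows, plate_cols)
# Recycling an 8-value vector down each column gives every row its own constant bias.
row_bias <- matrix(seq(0, 2, length.out = plate_rows), plate_rows, plate_cols)
raw <- 100 + row_bias + noise  # 100 is the overall plate offset

# Plate-wise Z-score: subtract the plate mean, divide by the plate SD.
z <- (raw - mean(raw)) / sd(raw)

# The overall offset is removed (the plate mean is now essentially zero) ...
mean(z)
# ... but the first and last rows still differ systematically.
rowMeans(z)[c(1, plate_rows)]
```

The within-plate gradient survives the Z-score transformation, which is the kind of spatial bias the package's normalization methods are meant to address.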
Two statistical tests are also provided: the standard one-sample t-test and the recommended one-sample Random Variance Model (RVM) t-test, which has greater statistical power for the typically small number of replicates in HTS. Correction for the multiple statistical testing of the large number of constructs in HTS data is provided by False Discovery Rate (FDR) correction. The FDR can be described as the expected proportion of false positives among the statistical tests called significant.
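As a rough illustration of FDR correction, the sketch below applies the Benjamini-Hochberg adjustment from base R's p.adjust() to a handful of made-up p-values. This is a stand-in for illustration only: the p-values are hypothetical, and the package's own FDR procedure may use a different estimator.

```r
# Hypothetical p-values from eight one-sample tests (illustrative only).
p_values <- c(0.001, 0.004, 0.012, 0.030, 0.045, 0.200, 0.550, 0.900)

# Benjamini-Hochberg adjustment, which controls the FDR.
q_values <- p.adjust(p_values, method = "BH")

# Constructs called active at an FDR threshold of 5%.
active <- q_values <= 0.05
data.frame(p_values, q_values, active)
```

Note that five of the raw p-values fall below 0.05, but fewer survive the adjustment, which is the intended effect when many constructs are tested at once.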
Included graphical and statistical methods provide the means for evaluating data analysis choices for HTS assays on a screen-by-screen basis. These graphs can be used to check fundamental assumptions of both raw and normalized data at every step of the analysis process.
Citing Methods
Please cite the sights package and specific methods as appropriate.
References for the methods can be found in this vignette, on their
specific help pages, and in the manual. They can also be accessed with
help(sights_method_name) in R; for example, help("normSPAWN").
The package citation can be accessed in R by:
citation("sights")
>> To cite package 'sights' in publications use:
>>
>> Garg E, Murie C, Nadon R (2016). _sights: Statistics and dIagnostic
>> Graphs for HTS_. R package version 1.33.0.
>>
>> A BibTeX entry for LaTeX users is
>>
>> @Manual{,
>> title = {sights: Statistics and dIagnostic Graphs for HTS},
>> author = {Elika Garg and Carl Murie and Robert Nadon},
>> year = {2016},
>> note = {R package version 1.33.0},
>> }
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("sights")
library("sights")
All SIGHTS normalization functions require that the data be arranged
such that each plate is a column and each row is a well. The arrangement
within each plate should be by-row first, then by-column. For more
details and an example, see help("ex_dataMatrix"). This required
arrangement can be done in Microsoft Excel before importing the data
into R, although advanced users may prefer to do so in R as needed.
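For users who prefer to arrange the data in R, the following is a minimal sketch using two tiny hypothetical plates. It assumes that "by-row first" means wells are listed across the first plate row, then across the second, and so on; help("ex_dataMatrix") remains the authoritative description. The helper flatten_by_row() is illustrative and not part of the package.

```r
# Two hypothetical 2 x 3 plates (rows x columns); values chosen for illustration.
plate1 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
plate2 <- matrix(101:106, nrow = 2, ncol = 3, byrow = TRUE)

# Flatten a plate by-row first: A1, A2, A3, then B1, B2, B3.
# t() is needed because as.vector() reads a matrix column by column.
flatten_by_row <- function(plate) as.vector(t(plate))

# One column per plate, one row per well.
data_matrix <- cbind(plate1 = flatten_by_row(plate1),
                     plate2 = flatten_by_row(plate2))
data_matrix
```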
data("ex_dataMatrix")
help("ex_dataMatrix")
## Required data arrangement (by-row first) is explained.
data("inglese")
read.csv("~/yourfile.csv", header = TRUE, sep = ",")
## Replace '~' with the folder that contains 'yourfile.csv' (in R, a leading
## '~' expands to your home directory).
## Use header=TRUE if you have column headers (recommended); otherwise, use
## header=FALSE.
## N.B. Be sure to use a forward slash ('/') to separate folder names.
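The import step can be tried without an existing file. The sketch below writes a small made-up table to a temporary folder and reads it back; the object names (example_data, csv_path, my_data) and the values are assumptions for illustration.

```r
# Write a small illustrative table to a temporary file, then read it back.
example_data <- data.frame(plate1 = c(0.5, 1.2, 0.9), plate2 = c(0.7, 1.1, 1.0))
csv_path <- file.path(tempdir(), "yourfile.csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Assign the result to a name; otherwise read.csv() only prints the data.
my_data <- read.csv(csv_path, header = TRUE)
str(my_data)
```

Assigning the imported data to an object makes it available to later steps such as normalization.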
install.packages("xlsx")
## This installs the xlsx package which enables import/export of Excel files.
library("xlsx")
read.xlsx("~/yourfile.xlsx", sheetIndex = 1) # or
read.xlsx("~/yourfile.xlsx", sheetName = "one")
## sheetIndex is the sheet number where your data is stored in 'yourfile.xlsx';
## sheetName is the name of that sheet.
help("ex_dataMatrix")
help("inglese")
View(inglese)
## View the entire dataset
edit(inglese)
## Edit the dataset
head(inglese)
## View the top few rows of the dataset
str(inglese)
## Get information on the structure of the dataset
summary(inglese)
## Get a summary of variables in the dataset
names(inglese)
## Get the variable names of the dataset
See help("normSights"), help("statSights"), help("plotSights"), and the
help pages of individual methods for more information.
ls("package:sights")
## Lists all the functions and datasets available in the package
lsf.str("package:sights")
## Lists all the functions and their usage
args(plotSights)
## View the usage of a specific function
example(topic = plotSights, package = "sights")
## View examples of a specific function
Normalization - All normalization functions are accessible either via normSights() or their individual function names (e.g. normSPAWN()).
Statistical tests - All statistical testing functions are accessible either via statSights() or their individual function names (e.g. statRVM()).
Plots - All plotting functions are accessible either via plotSights() or their individual function names (e.g. plotAutoco()).
The results of these functions can be saved as objects and called by their assigned names. For example:
library(sights)
data("inglese")
# Normalize
spawn_results <- normSPAWN(dataMatrix = inglese, plateRows = 32, plateCols = 40,
dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE, biasMatrix = NULL,
biasCols = 1:18)
## Or
spawn_results <- normSights(normMethod = "SPAWN", dataMatrix = inglese, plateRows = 32,
plateCols = 40, dataRows = NULL, dataCols = 3:44, trimFactor = 0.2, wellCorrection = TRUE,
biasMatrix = NULL, biasCols = 1:18)
## Access
summary(spawn_results)
# Apply statistical test
rvm_results <- statRVM(normMatrix = spawn_results, repIndex = rep(1:3, each = 3),
normRows = NULL, normCols = 1:9, testSide = "two.sided")
## Or
rvm_results <- statSights(statMethod = "RVM", normMatrix = spawn_results, repIndex = c(1,
1, 1, 2, 2, 2, 3, 3, 3), normRows = NULL, normCols = 1:9, ctrlMethod = NULL,
testSide = "two.sided")
## Access
head(rvm_results)
# Plot
autoco_results <- plotAutoco(plotMatrix = spawn_results, plateRows = 32, plateCols = 40,
plotRows = NULL, plotCols = 1:9, plotName = "SPAWN_Inglese", plotSep = TRUE)
## Or
autoco_results <- plotSights(plotMethod = "Autoco", plotMatrix = spawn_results, plateRows = 32,
plateCols = 40, plotRows = NULL, plotCols = c(1, 2, 3, 4, 5, 6, 7, 8, 9), plotName = "SPAWN_Inglese",
plotSep = TRUE)
## Access
autoco_results
autoco_results[[1]]
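The two statistical-test calls above spell repIndex differently. The short base R check below shows that both spellings describe the same grouping, under the assumption that repIndex assigns each selected column to a sample so that equal indices mark replicates of that sample. The names rep_short and rep_long are illustrative.

```r
# The two spellings of repIndex used above.
rep_short <- rep(1:3, each = 3)
rep_long <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# Compare by value: one vector is integer and the other double,
# so identical() would report FALSE even though the values match.
all(rep_short == rep_long)

# Column positions that share an index, i.e. the replicates of each sample.
split(1:9, rep_short)
```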
All SIGHTS plotting functions that use the ggplot2 package (Wickham, 2009),
i.e., all except plot3d, which uses lattice graphics, have an ellipsis
argument ("...") which passes on additional parameters to the specific
ggplot geom being used in that function. For example, the
default plot title and the bar colors of the histogram can be modified
as follows:
sights::plotHist(plotMatrix = SPAWN.norm.inglese.09.rvm, plotCols = 5, plotAll = TRUE,
binwidth = 0.02, fill = "pink", color = "black", plotName = "RVM test Exp9")
>> Number of samples = 1
>> Number of plots = 1
>> Number of plate wells = 1280
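The ellipsis mechanism itself is plain R and can be sketched without any plotting package. The wrapper below is hypothetical (hist_counts() is not a SIGHTS function) and forwards its extra arguments to base R's hist(), in the same spirit as binwidth, fill, and color being forwarded above.

```r
# Hypothetical wrapper (not part of sights) that forwards extra arguments.
hist_counts <- function(values, ...) {
  # Everything captured by '...' is passed on unchanged to hist().
  hist(values, plot = FALSE, ...)
}

set.seed(1)
values <- runif(200)

# Without extra arguments, hist() chooses its own breaks.
h_default <- hist_counts(values)

# 'breaks' is not a named argument of hist_counts(); it travels through '...'.
h_custom <- hist_counts(values, breaks = seq(0, 1, by = 0.02))
length(h_custom$counts)
```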
All SIGHTS plotting functions that use ggplot2 produce ggplot objects that can be modified.
Other packages that provide more plotting options can be installed as well, such as ggthemes (Arnold and Arnold, 2015) and gridExtra (Auguie et al., 2015).
install.packages("ggthemes")
## This installs the ggthemes package, which has various themes that can be
## used with ggplot objects.
library("ggthemes")
install.packages("gridExtra")
## This installs the gridExtra package, which enables arrangement of plot
## objects.
library("gridExtra")
Below are some examples of the plotting modifications that can be achieved using ggplot2, ggthemes, and gridExtra (Auguie et al., 2015) functions:
b <- sights::plotBox(plotMatrix = inglese, plotCols = 33:35)
>> Number of plots = 1
>> Number of plates = 3
>> Number of plate wells = 1280
b + ggplot2::geom_boxplot(fill = c("rosybrown", "pink", "thistle")) + ggthemes::theme_igray() +
ggplot2::labs(x = "Sample_11 Replicates", y = "Raw Values")
Note: When plotSep = TRUE, a list of plot objects is produced, which can be called individually and modified, as in the example below.
s <- sights::plotScatter(plotMatrix = SPAWN.norm.inglese.09, repIndex = c(1, 1, 1))
>> Number of samples = 1
>> Number of plots = 3
>> Number of plates = 3
>> Number of plate wells = 1280
s[[2]] + ggplot2::labs(title = "Original Scatter Plot")
>> `geom_smooth()` using formula = 'y ~ x'
>> Warning: Removed 1 row containing missing values or values outside the scale range
>> (`geom_smooth()`).
s[[2]] + ggplot2::lims(x = c(-5, 5), y = c(-5, 5)) + ggplot2::labs(title = "Constrained Scatter Plot")
>> Scale for x is already present.
>> Adding another scale for x, which will replace the existing scale.
>> Scale for y is already present.
>> Adding another scale for y, which will replace the existing scale.
>> `geom_smooth()` using formula = 'y ~ x'
>> Warning: Removed 7 rows containing non-finite outside the scale range
>> (`stat_smooth()`).
>> Warning: Removed 7 rows containing missing values or values outside the scale range
>> (`geom_point()`).
s[[2]] + ggplot2::coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) + ggplot2::labs(title = "Zoomed-in Scatter Plot")
>> Coordinate system already present. Adding new coordinate system, which will
>> replace the existing one.
>> `geom_smooth()` using formula = 'y ~ x'
>> Warning: Removed 1 row containing missing values or values outside the scale range
>> (`geom_smooth()`).
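The "Constrained" and "Zoomed-in" plots above differ in more than appearance: in ggplot2, scale limits set with lims() discard out-of-range values before the smoother is fitted, whereas coord_cartesian() only changes the visible window. The base R sketch below mimics that difference with simulated data and a straight-line fit standing in for the smoother; the data and the single far-away point are assumptions for illustration.

```r
# Simulated replicate pair with one point far outside the [-5, 5] window.
set.seed(1)
x <- c(rnorm(50), 8)
y <- c(x[1:50] + rnorm(50, sd = 0.3), -8)
dat <- data.frame(x, y)

# Points that fall inside the window on both axes.
inside <- abs(dat$x) <= 5 & abs(dat$y) <= 5

# Analogue of lims(): out-of-range points are dropped before fitting.
fit_constrained <- lm(y ~ x, data = dat[inside, ])
# Analogue of coord_cartesian(): all points are kept for the fit.
fit_zoomed <- lm(y ~ x, data = dat)

coef(fit_constrained)[["x"]]
coef(fit_zoomed)[["x"]]
```

The two slopes differ noticeably because the far-away point influences only the second fit, which is why the choice between the two approaches should be deliberate.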
box <- sights::plotSights(plotMethod = "Box", plotMatrix = SPAWN.norm.inglese.09,
plotCols = 1:3) + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
>> Number of plots = 1
>> Number of plates = 3
>> Number of plate wells = 1280
autoco <- sights::plotSights(plotMethod = "Autoco", plotMatrix = SPAWN.norm.inglese.09,
plateRows = 32, plateCols = 40, plotRows = NULL, plotCols = 1:3, plotSep = FALSE) +
ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
>> Number of plots = 1
>> Number of plates = 3
>> Number of plate wells = 1280
scatter <- sights::plotSights(plotMethod = "Scatter", plotMatrix = SPAWN.norm.inglese.09,
repIndex = c(1, 1, 1), plotRows = NULL, plotCols = 1:3)
>> Number of samples = 1
>> Number of plots = 3
>> Number of plates = 3
>> Number of plate wells = 1280
sc1 <- scatter[[1]] + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
sc2 <- scatter[[2]] + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
sc3 <- scatter[[3]] + ggplot2::theme(plot.title = ggplot2::element_text(size = 12))
sc <- gridExtra::grid.arrange(sc1, sc2, sc3, ncol = 3)
>> `geom_smooth()` using formula = 'y ~ x'
>> Warning: Removed 1 row containing missing values or values outside the scale range
>> (`geom_smooth()`).
>> `geom_smooth()` using formula = 'y ~ x'
>> Warning: Removed 1 row containing missing values or values outside the scale range
>> (`geom_smooth()`).
>> `geom_smooth()` using formula = 'y ~ x'
>> Warning: Removed 2 rows containing missing values or values outside the scale range
>> (`geom_smooth()`).
ab <- gridExtra::grid.arrange(box, autoco, ncol = 2)