The package immunogenViewer is meant to support researchers in comparing and choosing suitable antibodies, provided that information on the immunogen used to raise the antibody is available. When the immunogen of an antibody is known, its binding site within the protein antigen is defined and can be examined in detail. As antibodies raised against peptide immunogens often do not function properly when used to detect natively folded proteins (Brown et al. 2011), examining the position of the immunogen within the full-length protein can provide insights. immunogenViewer provides an easy approach to visualize, evaluate and compare immunogens within the full-length sequence of a protein. Structural and functional annotations of the immunogen, and thus of the antibody binding site, can tell the user whether an antibody is potentially useful for native protein detection (Trier, Hansen, and Houen 2012; Waury et al. 2022).
Specifically, immunogenViewer can be used to retrieve protein features for a protein of interest using an API call to the UniProtKB (Bateman et al. 2022) and PredictProtein (Bernhofer et al. 2021) databases. The features are saved on a per-residue level in a dataframe. One or several immunogens can be associated with the protein. The immunogen(s) can then be visualized and evaluated regarding their structure and other annotations that can influence successful antibody recognition within the full-length protein. A summary report of the immunogen can be created to easily compare and select favorable immunogens and their respective antibodies. This package should be used as a pre-selection step to exclude unsuitable antibodies early on. It does not replace comprehensive antibody validation. For more information on validation, please refer to other excellent resources (Roncador et al. 2015; Voskuil et al. 2020).
The package can be installed directly from Bioconductor.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("immunogenViewer")
library(immunogenViewer)
To retrieve the features for the protein of interest, the correct UniProt ID (also known as accession number) is required. If the UniProt ID is not known yet, one can search the UniProtKB using the gene or protein name. Be sure to select the UniProt ID of the correct organism and preferably search within reviewed SwissProt entries instead of unreviewed TrEMBL entries. Our example protein is the human protein TREM2 (UniProt ID: Q9NZC2). Using getProteinFeatures(), relevant features from UniProt and PredictProtein are retrieved. Interaction with UniProt is done using the Bioconductor package UniProt.ws. To see how the result is structured, we will look at the returned dataframe.
protein <- getProteinFeatures("Q9NZC2")
# check protein dataframe
DT::datatable(protein, width = "80%", options = list(scrollX = TRUE))
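To make the per-residue layout concrete without querying UniProt or PredictProtein, the sketch below builds a small toy dataframe by hand and inspects it with base R. The column names are taken from the output shown in this vignette; the accession "TOY001", the residues and the single-letter feature codes are invented for illustration and are not real TREM2 annotations or the package's actual encoding.

```r
# Toy per-residue table: one row per amino acid position.
# All values are made up; only the column names follow the vignette output.
toy_protein <- data.frame(
  Uniprot = "TOY001",                                  # hypothetical accession
  Position = 1:6,
  Residue = c("M", "E", "P", "L", "R", "L"),
  SecondaryStructure = c("O", "O", "H", "H", "H", "O"),   # assumed codes
  SolventAccessibility = c("E", "E", "B", "B", "E", "E"), # assumed codes
  stringsAsFactors = FALSE
)

# Inspect column types and the first rows without extra packages
str(toy_protein)
head(toy_protein, 3)

# Because every row is one residue, the sequence can be rebuilt from the table
toy_sequence <- paste(toy_protein$Residue, collapse = "")
```

The same str() and head() calls can be applied to the dataframe returned by getProteinFeatures() when an interactive table is not needed.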
After creating the protein dataframe using getProteinFeatures(), immunogens to be visualized and evaluated can be added to the dataframe. For this purpose, we use addImmunogen(). With every call to the function one immunogen can be added to the protein dataframe. Besides the protein dataframe, we need to define the immunogen to be added by supplying the start and end position of the immunogen and a name.
Searching the antibody database Antibodypedia identifies three antibodies that were raised against known peptide immunogens. These immunogens are added to the dataframe by defining either their start and end positions or the immunogen peptide sequence within the full protein sequence, and by naming them after their catalog identifiers. Each immunogen is added as an additional column to the protein dataframe; the immunogen name is used as the column name.
protein <- addImmunogen(protein, start = 142, end = 192, name = "ABIN2783734_")
protein <- addImmunogen(protein, start = 196, end = 230, name = "HPA010917")
protein <- addImmunogen(protein, seq = "HGQKPGTHPPSELD", name = "EB07921")
# check that immunogens were added as columns
colnames(protein)
[1] "Uniprot" "Position" "Residue"
[4] "SecondaryStructure" "SolventAccessibility" "Membrane"
[7] "ProteinBinding" "Disorder" "PTM"
[10] "DisulfideBridge" "ABIN2783734_" "HPA010917"
[13] "EB07921"
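The third call above passes a peptide sequence instead of coordinates. One plausible way to translate a sequence into start and end positions is a fixed-string search within the full protein sequence. The sketch below shows that idea in base R; it is not taken from the package source, locate_peptide() is a hypothetical helper, and toy_seq is an invented sequence rather than the real TREM2 sequence.

```r
# Hypothetical helper: find the first exact occurrence of a peptide
# in a full-length sequence and return its 1-based start and end positions.
locate_peptide <- function(full_seq, peptide) {
  start <- regexpr(peptide, full_seq, fixed = TRUE)[1]
  if (start == -1L) {
    stop("Peptide not found in the full sequence.")
  }
  c(start = start, end = start + nchar(peptide) - 1L)
}

# Invented toy sequence that embeds the peptide after ten residues
toy_seq <- "MKTAYIAKQRHGQKPGTHPPSELDQISFVKSHFSRQ"
pos <- locate_peptide(toy_seq, "HGQKPGTHPPSELD")
pos
```

A search like this only reports the first match, so a peptide that occurs more than once in a protein would need extra handling.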
Already added immunogens can be renamed using renameImmunogen() if the provided start and end positions are correct but the name should be updated. This way a typo can be corrected or a more informative name added without re-adding the immunogen. The column name in the protein dataframe is then updated.
protein <- renameImmunogen(protein, oldName = "ABIN2783734_", newName = "ABIN2783734")
# check that immunogen name was updated
colnames(protein)
[1] "Uniprot" "Position" "Residue"
[4] "SecondaryStructure" "SolventAccessibility" "Membrane"
[7] "ProteinBinding" "Disorder" "PTM"
[10] "DisulfideBridge" "ABIN2783734" "HPA010917"
[13] "EB07921"
A previously added immunogen can be removed from the protein dataframe using removeImmunogen(). The corresponding column is dropped from the protein dataframe.
protein <- removeImmunogen(protein, name = "HPA010917")
# check that immunogen was removed
colnames(protein)
[1] "Uniprot" "Position" "Residue"
[4] "SecondaryStructure" "SolventAccessibility" "Membrane"
[7] "ProteinBinding" "Disorder" "PTM"
[10] "DisulfideBridge" "ABIN2783734" "EB07921"
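Since each immunogen is stored as one column of the protein dataframe, adding, renaming and removing an immunogen correspond to ordinary column operations. The base R sketch below mimics the three steps on a toy dataframe. It is an illustration only: the logical encoding of the immunogen column and the name "toyImmunogen" are assumptions, and the package functions may store and validate immunogens differently.

```r
# Toy dataframe with one row per residue (invented sequence)
toy <- data.frame(
  Position = 1:8,
  Residue = strsplit("MKTAYIAK", "")[[1]],
  stringsAsFactors = FALSE
)

# "Add": mark residues 3 to 6 as belonging to an immunogen
# (logical encoding is an assumption made for this sketch)
toy[["toyImmunogen_"]] <- toy$Position >= 3 & toy$Position <= 6

# "Rename": fix the trailing underscore in the column name
names(toy)[names(toy) == "toyImmunogen_"] <- "toyImmunogen"

# "Remove": drop the immunogen column from a copy of the dataframe
toy_removed <- toy
toy_removed[["toyImmunogen"]] <- NULL

colnames(toy)
colnames(toy_removed)
```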
After retrieving the protein features and adding the relevant immunogens, the full protein sequence can be plotted with the features and the immunogens annotated along the sequence. The plot shows the position of each immunogen peptide within the full-length sequence and helps to identify obstacles within the protein that might hinder or limit successful antibody binding.
plotProtein(protein)
If only one specific immunogen is of interest, the relevant part of the protein sequence can be visualized on its own. In this plot the amino acid sequence of the immunogen is shown along the x-axis, while the same features as in the protein plot are included.
plotImmunogen(protein, "ABIN2783734")
Apart from visualizing specific immunogens, it is also possible to summarize the protein features within a specific immunogen. This can be done either for one immunogen of interest or for all immunogens added to a protein dataframe at once. The output is a summary dataframe that can be sorted by the feature columns. After sorting, the most suitable immunogen, e.g., the one with the highest fraction of exposed residues, can be selected.
immunogens <- evaluateImmunogen(protein)
[1] "No immunogen specified, evaluating all immunogens."
[1] "Immunogen name: SecondaryStructure"
[1] "Immunogen sequence: (Residues Inf - -Inf)"
[1] "Immunogen name: SolventAccessibility"
[1] "Immunogen sequence: (Residues Inf - -Inf)"
[1] "Immunogen name: ProteinBinding"
[1] "Immunogen sequence: SMKTHNLWLLCQSLH (Residues 40 - 114)"
[1] "Immunogen name: DisulfideBridge"
[1] "Immunogen sequence: CCCC (Residues 36 - 110)"
[1] "Immunogen name: ABIN2783734"
[1] "Immunogen sequence: WFPGESESFEDAHVEHSISRSLLEGEIPFPPTSILLLLACIFLIKILAASA (Residues 142 - 192)"
[1] "Immunogen name: EB07921"
[1] "Immunogen sequence: HGQKPGTHPPSELD (Residues 199 - 212)"
# check summary dataframe
DT::datatable(immunogens, width = "80%", options = list(scrollX = TRUE))
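The selection step described above, ranking immunogens by a feature such as the fraction of exposed residues, can be sketched in base R on toy data. Everything in this sketch is hypothetical: the accessibility codes ("E" for exposed, "B" for buried), the helper fraction_exposed(), the immunogen names and the column name FractionExposed are invented, and the summary dataframe returned by evaluateImmunogen() may use different columns.

```r
# Toy per-residue accessibility annotation (invented values)
toy <- data.frame(
  Position = 1:10,
  SolventAccessibility = c("E", "E", "B", "B", "E", "E", "E", "B", "E", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of exposed residues between start and end
fraction_exposed <- function(df, start, end) {
  in_range <- df$Position >= start & df$Position <= end
  mean(df$SolventAccessibility[in_range] == "E")
}

# One summary row per toy immunogen
toy_summary <- data.frame(
  Immunogen = c("toyA", "toyB"),
  FractionExposed = c(fraction_exposed(toy, 1, 4),
                      fraction_exposed(toy, 5, 10)),
  stringsAsFactors = FALSE
)

# Sort so that the most exposed immunogen comes first
toy_summary <- toy_summary[order(toy_summary$FractionExposed,
                                 decreasing = TRUE), ]
toy_summary
```

The same order() pattern can be applied to any numeric column of a summary dataframe to rank candidate immunogens.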
When calling getProteinFeatures(), the taxonomy ID for the protein’s species has to be set. The default is human (ID: 9606). If the protein of interest is from a different species, the correct taxonomy ID must be set as a parameter.
sessionInfo()
R Under development (unstable) (2024-10-21 r87258)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] immunogenViewer_1.1.0 BiocStyle_2.35.0
loaded via a namespace (and not attached):
[1] KEGGREST_1.47.0 gtable_0.3.6 xfun_0.48
[4] bslib_0.8.0 ggplot2_3.5.1 htmlwidgets_1.6.4
[7] Biobase_2.67.0 crosstalk_1.2.1 rjsoncons_1.3.1
[10] vctrs_0.6.5 tools_4.5.0 generics_0.1.3
[13] stats4_4.5.0 curl_5.2.3 tibble_3.2.1
[16] fansi_1.0.6 AnnotationDbi_1.69.0 RSQLite_2.3.7
[19] highr_0.11 blob_1.2.4 pkgconfig_2.0.3
[22] BiocBaseUtils_1.9.0 dbplyr_2.5.0 S4Vectors_0.45.0
[25] lifecycle_1.0.4 GenomeInfoDbData_1.2.13 farver_2.1.2
[28] compiler_4.5.0 Biostrings_2.75.0 progress_1.2.3
[31] tinytex_0.53 munsell_0.5.1 GenomeInfoDb_1.43.0
[34] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
[37] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
[40] DT_0.33 cachem_1.1.0 magick_2.8.5
[43] tidyselect_1.2.1 digest_0.6.37 dplyr_1.1.4
[46] bookdown_0.41 labeling_0.4.3 UniProt.ws_2.47.0
[49] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
[52] cli_3.6.3 magrittr_2.0.3 patchwork_1.3.0
[55] utf8_1.2.4 httpcache_1.2.0 withr_3.0.2
[58] scales_1.3.0 prettyunits_1.2.0 filelock_1.0.3
[61] UCSC.utils_1.3.0 bit64_4.5.2 rmarkdown_2.28
[64] XVector_0.47.0 httr_1.4.7 bit_4.5.0
[67] png_0.1-8 hms_1.1.3 memoise_2.0.1
[70] evaluate_1.0.1 knitr_1.48 IRanges_2.41.0
[73] BiocFileCache_2.15.0 rlang_1.1.4 Rcpp_1.0.13
[76] glue_1.8.0 DBI_1.2.3 BiocManager_1.30.25
[79] BiocGenerics_0.53.0 jsonlite_1.8.9 R6_2.5.1
[82] zlibbioc_1.53.0