Introduction

MeSH (Medical Subject Headings) is the controlled vocabulary of the NLM (U.S. National Library of Medicine), used to manually index articles for MEDLINE/PubMed. It is a comprehensive life science vocabulary. MeSH has 19 categories, of which MeSH.db contains the following 16:

Abbreviation  Category
A             Anatomy
B             Organisms
C             Diseases
D             Chemicals and Drugs
E             Analytical, Diagnostic and Therapeutic Techniques and Equipment
F             Psychiatry and Psychology
G             Phenomena and Processes
H             Disciplines and Occupations
I             Anthropology, Education, Sociology and Social Phenomena
J             Technology and Food and Beverages
K             Humanities
L             Information Science
M             Persons
N             Health Care
V             Publication Type
Z             Geographical Locations

MeSH terms were associated with Entrez Gene IDs by three methods: Gendoo, gene2pubmed and RBBH (Reciprocal BLAST Best Hit).

Method        How Entrez Gene IDs are mapped to MeSH IDs
Gendoo        Text mining
gene2pubmed   Manual curation by NCBI teams
RBBH          Sequence homology by BLASTP search (E-value < 10^-50)

Enrichment Analysis

Please go to https://yulab-smu.github.io/clusterProfiler-book/chapter9.html for the full vignette on enrichment analysis using MeSH.

Semantic Similarity

meshes implements four IC-based (information content) methods: Resnik (Resnik 1999), Jiang (Jiang and Conrath 1997), Lin (Lin 1998) and Schlicker (Schlicker et al. 2006); and one graph-structure-based method, Wang (Wang et al. 2007). For algorithm details, please refer to the vignette of the GOSemSim package (Yu et al. 2010).
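
To make the IC-based measures more concrete, here is a minimal base R sketch on a toy taxonomy. The term names, the annotation counts and the helper functions (ancestors, mica, ic_sim) are hypothetical and are not part of meshes. The formulas follow the usual textbook definitions of the four measures ("Rel" stands for the Schlicker method); the package implementation may differ in details such as normalization.

```r
# Toy taxonomy: each term maps to its parent ("" marks the root).
parent <- c(root = "", A = "root", B = "root", A1 = "A", A2 = "A")

# Hypothetical number of annotations attached directly to each term.
direct <- c(root = 0, A = 2, B = 4, A1 = 1, A2 = 1)

# All ancestors of a term, including the term itself.
ancestors <- function(term) {
  out <- term
  while (parent[[term]] != "") {
    term <- parent[[term]]
    out <- c(out, term)
  }
  out
}

# A term's frequency counts its own annotations plus those of its descendants.
freq <- vapply(names(parent), function(term) {
  is_below <- vapply(names(parent), function(d) term %in% ancestors(d), logical(1))
  sum(direct[is_below])
}, numeric(1))

p  <- freq / freq[["root"]]  # probability of observing each term
ic <- -log(p)                # information content: rarer terms carry more

# Most informative common ancestor (MICA): shared ancestor with the highest IC.
mica <- function(t1, t2) {
  common <- intersect(ancestors(t1), ancestors(t2))
  common[which.max(ic[common])]
}

ic_sim <- function(t1, t2, measure = c("Resnik", "Lin", "Rel", "Jiang")) {
  measure <- match.arg(measure)
  m <- mica(t1, t2)
  switch(measure,
    Resnik = ic[[m]],
    Lin    = 2 * ic[[m]] / (ic[[t1]] + ic[[t2]]),
    Rel    = 2 * ic[[m]] / (ic[[t1]] + ic[[t2]]) * (1 - p[[m]]),
    Jiang  = 1 - min(1, ic[[t1]] + ic[[t2]] - 2 * ic[[m]])
  )
}

ic_sim("A1", "A2", "Lin")  # siblings under A: 1/3
ic_sim("A1", "B", "Lin")   # only the root in common: 0
```

The two sibling terms share the moderately informative ancestor A, so they score above zero, while terms whose only common ancestor is the root score zero under every measure in this sketch.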

The meshSim function measures semantic similarity between two vectors of MeSH terms.

## [1] 0.2910261
## [1] 0.521396
## [1] 0.4914785
## [1] 0.5557103
##           D017629   D002890   D008928
## D001369 0.2886598 0.1923711 0.2193326
## D002462 0.6521739 0.2381925 0.2809552
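
The calls that produced the output above are not shown in this text. The sketch below illustrates the general shape such calls may take. The MeSH IDs are taken from the row and column names of the matrix above. The use of the hsamd example dataset and of measure = "Wang" are assumptions, so the printed values need not match the output above.

```r
# Sketch only: requires the Bioconductor package 'meshes'; skipped if missing.
mesh_rows <- c("D001369", "D002462")             # row IDs of the matrix above
mesh_cols <- c("D017629", "D002890", "D008928")  # column IDs of the matrix above

if (requireNamespace("meshes", quietly = TRUE)) {
  library(meshes)
  data(hsamd)  # assumption: example semantic data shipped with meshes

  # One term against one term gives a single score.
  # measure = "Wang" is an assumption; the text does not say which was used.
  print(meshSim(mesh_rows[1], mesh_cols[1], semData = hsamd, measure = "Wang"))

  # Two term vectors give a matrix of pairwise scores,
  # one row per term in the first vector and one column per term in the second.
  print(meshSim(mesh_rows, mesh_cols, semData = hsamd, measure = "Wang"))
}
```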

The geneSim function measures semantic similarity between two vectors of genes.

## [1] 0.487
##       835  5261   241   994
## 241 0.732 0.337 1.000 0.438
## 251 0.526 0.588 0.487 0.597
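
A gene is usually annotated with several MeSH terms, so a gene-level score has to summarize a matrix of term-level scores. The GOSemSim vignette, which this document refers to for algorithm details, describes several ways to do this; the base R sketch below shows one of them, the best-match average (BMA). The matrix values and the function name bma are hypothetical, and whether geneSim applies exactly this rule to the output above is an assumption.

```r
# Hypothetical term-level similarities between the MeSH terms of gene 1 (rows)
# and the MeSH terms of gene 2 (columns).
term_sim <- matrix(
  c(0.2, 0.9,   # column u1
    0.6, 0.1,   # column u2
    0.3, 0.4),  # column u3
  nrow = 2,
  dimnames = list(c("t1", "t2"), c("u1", "u2", "u3"))
)

# Best-match average: every term is paired with its best match on the
# other side, and the best-match scores from both directions are averaged.
bma <- function(sim) {
  row_best <- apply(sim, 1, max, na.rm = TRUE)
  col_best <- apply(sim, 2, max, na.rm = TRUE)
  (sum(row_best) + sum(col_best)) / (nrow(sim) + ncol(sim))
}

bma(term_sim)  # (0.6 + 0.9 + 0.9 + 0.6 + 0.4) / 5 = 0.68
```

Averaging over best matches in both directions keeps the score symmetric and prevents a gene with many unrelated annotations from diluting one strong match.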

Need help?

If you have questions or issues, please visit the meshes homepage first; most common problems are documented there. If you think you have found a bug, please follow the guide and post a reproducible example on the GitHub issue tracker. For questions, please post to the Bioconductor support site and tag your post with meshes.

Chinese users can follow me on WeChat (微信).

Session Information

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] meshes_1.12.0         DOSE_3.12.0           MeSH.db_1.13.0       
## [4] MeSH.Hsa.eg.db_1.13.0 MeSHDbi_1.22.0        BiocGenerics_0.32.0  
## 
## loaded via a namespace (and not attached):
##  [1] enrichplot_1.6.0     bit64_0.9-7          RColorBrewer_1.1-2  
##  [4] progress_1.2.2       httr_1.4.1           tools_3.6.1         
##  [7] backports_1.1.5      R6_2.4.0             DBI_1.0.0           
## [10] lazyeval_0.2.2       colorspace_1.4-1     tidyselect_0.2.5    
## [13] gridExtra_2.3        prettyunits_1.0.2    bit_1.1-14          
## [16] compiler_3.6.1       Biobase_2.46.0       xml2_1.2.2          
## [19] triebeard_0.3.0      scales_1.0.0         ggridges_0.5.1      
## [22] stringr_1.4.0        digest_0.6.22        rmarkdown_1.16      
## [25] pkgconfig_2.0.3      htmltools_0.4.0      rlang_0.4.1         
## [28] RSQLite_2.1.2        prettydoc_0.3.0      gridGraphics_0.4-1  
## [31] farver_1.1.0         jsonlite_1.6         BiocParallel_1.20.0 
## [34] GOSemSim_2.12.0      dplyr_0.8.3          magrittr_1.5        
## [37] ggplotify_0.0.4      GO.db_3.10.0         Matrix_1.2-17       
## [40] Rcpp_1.0.2           munsell_0.5.0        S4Vectors_0.24.0    
## [43] viridis_0.5.1        lifecycle_0.1.0      stringi_1.4.3       
## [46] yaml_2.2.0           ggraph_2.0.0         MASS_7.3-51.4       
## [49] plyr_1.8.4           qvalue_2.18.0        grid_3.6.1          
## [52] blob_1.2.0           ggrepel_0.8.1        DO.db_2.9           
## [55] crayon_1.3.4         lattice_0.20-38      graphlayouts_0.5.0  
## [58] cowplot_1.0.0        splines_3.6.1        hms_0.5.1           
## [61] zeallot_0.1.0        knitr_1.25           pillar_1.4.2        
## [64] fgsea_1.12.0         igraph_1.2.4.1       reshape2_1.4.3      
## [67] stats4_3.6.1         fastmatch_1.1-0      glue_1.3.1          
## [70] evaluate_0.14        BiocManager_1.30.9   data.table_1.12.6   
## [73] vctrs_0.2.0          tweenr_1.0.1         urltools_1.7.3      
## [76] gtable_0.3.0         purrr_0.3.3          polyclip_1.10-0     
## [79] tidyr_1.0.0          assertthat_0.2.1     ggplot2_3.2.1       
## [82] xfun_0.10            ggforce_0.3.1        europepmc_0.3       
## [85] tidygraph_1.1.2      viridisLite_0.3.0    tibble_2.1.3        
## [88] rvcheck_0.1.5        AnnotationDbi_1.48.0 memoise_1.1.0       
## [91] IRanges_2.20.0

References

Jiang, Jay J., and David W. Conrath. 1997. “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.” Proceedings of 10th International Conference on Research in Computational Linguistics. http://www.citebase.org/abstract?id=oai:arXiv.org:cmp-lg/9709008.

Lin, Dekang. 1998. “An Information-Theoretic Definition of Similarity.” In Proceedings of the 15th International Conference on Machine Learning, 296–304. https://doi.org/10.1.1.55.1832.

Resnik, Philip. 1999. “Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language.” Journal of Artificial Intelligence Research 11:95–130.

Schlicker, Andreas, Francisco S Domingues, Jörg Rahnenführer, and Thomas Lengauer. 2006. “A New Measure for Functional Similarity of Gene Products Based on Gene Ontology.” BMC Bioinformatics 7:302. https://doi.org/1471-2105-7-302.

Wang, James Z, Zhidian Du, Rapeeporn Payattakool, Philip S Yu, and Chin-Fu Chen. 2007. “A New Method to Measure the Semantic Similarity of GO Terms.” Bioinformatics (Oxford, England) 23 (May):1274–81. https://doi.org/btm087.

Yu, Guangchuang, and Qing-Yu He. 2016. “ReactomePA: An R/Bioconductor Package for Reactome Pathway Analysis and Visualization.” Molecular BioSystems 12 (2):477–79. https://doi.org/10.1039/C5MB00663E.

Yu, Guangchuang, Fei Li, Yide Qin, Xiaochen Bo, Yibo Wu, and Shengqi Wang. 2010. “GOSemSim: An R Package for Measuring Semantic Similarity Among GO Terms and Gene Products.” Bioinformatics 26 (April):976–78. https://doi.org/10.1093/bioinformatics/btq064.

Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: An R Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16 (5):284–87. https://doi.org/10.1089/omi.2011.0118.

Yu, Guangchuang, Li-Gen Wang, Guang-Rong Yan, and Qing-Yu He. 2015. “DOSE: An R/Bioconductor Package for Disease Ontology Semantic and Enrichment Analysis.” Bioinformatics 31 (4):608–9. https://doi.org/10.1093/bioinformatics/btu684.