tripr User Guide

Maria Th. Kotouza1, Katerina Gemenetzi2, Chrysi Galigalidou2, Elisavet Vlachonikola2, Nikolaos Pechlivanis2, Andreas Agathangelidis2, Raphael Sandaltzopoulos3, Pericles A. Mitkas1, Kostas Stamatopoulos2, Anastasia Chatzidimitriou2 and Fotis E. Psomopoulos2*

1Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, GR
2Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, GR
3Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, GR

*fpsom@certh.gr

26 October 2021

Package

tripr 1.0.0

1 Introduction

tripr is a Bioconductor package, built with shiny, that provides analytics services on antigen receptor (B cell receptor immunoglobulin, BcR IG, or T cell receptor, TR) gene sequence data. Every step of the analysis can be performed interactively, so no programming skills are required. It takes as input the output files of the IMGT/HighV-Quest tool. Users can analyze the data from each input sample separately, or the combined data from all samples, and visualize the results accordingly. Functions for use from the R command line are also available.

1.1 Installation

tripr is distributed as a Bioconductor package and requires R (version “4.1”), which can be installed on any operating system from CRAN, and Bioconductor (version “3.14”).

To install the tripr package, enter the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("tripr")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

1.2 Launching the app

Once tripr is successfully installed, it can be loaded as follows:

library(tripr)

2 Running `tripr` as a `shiny` application

In order to start the shiny app, please run the following command:

tripr::run_app()

tripr should open in a browser (ideally Chrome, Firefox or Opera). If this does not happen automatically, please open a browser and navigate to the address shown in the R console (for example, Listening on http://127.0.0.1:6134).

2.1 Home

In this tab users can import their data by pressing the Choose directory button and selecting the directory where the data is stored. The tool takes as input the 10 output files of the IMGT/HighV-Quest tool in text format (.txt). Users can also choose only some of the files, depending on the type of the downstream analysis.

Note that every sample of the dataset must have its own individual folder and every sample folder must be in one root folder (See example below). For the dataset to be selected for upload, this root folder must be selected and then the button Load Data has to be pressed.
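The folder layout described above can be checked before loading the data. The following is a minimal sketch in base R, not part of tripr: the helper `check_dataset_layout()`, the sample folder names and the choice of expected files are all hypothetical and only illustrate the "one root folder, one folder per sample" rule.

```r
# Hypothetical helper (not part of tripr): for each sample folder directly
# under `root`, report which of the expected files are missing.
check_dataset_layout <- function(root, expected_files) {
    sample_dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
    missing <- lapply(sample_dirs, function(d) {
        setdiff(expected_files, list.files(d))
    })
    names(missing) <- basename(sample_dirs)
    missing
}

# Toy dataset: one root folder containing one folder per sample.
root <- file.path(tempdir(), "dataset_layout_demo")
files <- c("1_Summary.txt", "6_Junction.txt")
for (s in c("sample_A", "sample_B")) {
    dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
    file.create(file.path(root, s, files))
}
# Simulate an incomplete sample folder.
unlink(file.path(root, "sample_B", "6_Junction.txt"))

layout_report <- check_dataset_layout(root, files)
```

An empty entry in `layout_report` means that the sample folder contains every expected file.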

Previous sessions can also be loaded with the Restore Previous Sessions button.

There are 2 options regarding the cell type (T cell and B cell) as well as 2 options based on the amount of available data (High- or Low-Throughput). The main difference between the latter two lies in how the preselection and selection steps are applied. For High-Throughput data, all filters are applied sequentially (i.e. if a sequence fails more than one selection criterion, only the first unsatisfied criterion is reported), whereas for Low-Throughput data all criteria are applied at the same time.
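The difference between the two reporting modes can be illustrated with a toy pass/fail matrix. This is only a sketch in base R under assumed criterion names; it does not reproduce tripr's internal implementation.

```r
# Toy pass/fail matrix: rows are sequences, columns are criteria in the
# order they are applied. TRUE means the sequence satisfies the criterion.
passed <- rbind(
    seq1 = c(functional = TRUE,  productive = TRUE,  landmarks = TRUE),
    seq2 = c(functional = FALSE, productive = TRUE,  landmarks = FALSE),
    seq3 = c(functional = TRUE,  productive = FALSE, landmarks = FALSE)
)

# High-Throughput style: only the first failed criterion is recorded.
first_failed <- apply(passed, 1, function(p) {
    failed <- names(p)[!p]
    if (length(failed) == 0) NA_character_ else failed[1]
})

# Low-Throughput style: every failed criterion is recorded.
all_failed <- apply(passed, 1, function(p) {
    paste(names(p)[!p], collapse = ",")
})
```

Here `seq2` fails two criteria: `first_failed` records only the first one, whereas `all_failed` lists both.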

2.2 Preprocessing

tripr offers 2 steps of preprocessing:

Preselection: Refers to the cleaning process of the input dataset.
Selection: Refers to the filtering process applied to the data resulting from the Preselection process.

2.2.1 Preselection

The Preselection process comprises 4 different criteria:

Only take into account Functional V-Gene:
Only sequences utilizing a functional V gene are included in the downstream analysis. Sequences with pseudogenes (P) or open reading frame (ORF) genes are excluded from further analysis.
Only take into account CDR3 with no Special Characters (X,*,#,.):
Only sequences without ambiguities (i.e. characters other than those of the 20 amino acids) are included in the analysis.
Only take into account Productive Sequences:
Only productive sequences (without stop codons and frameshifts) are included in the analysis.
Only take into account CDR3 with valid start/end landmarks:
Start/End CDR3 landmarks (anchors) can be customized by the user based on the type of data (BcR/TR, heavy/light chain). More than one valid landmark can be used. The different letters should be separated with a vertical bar (e.g. F|D). Sequences with landmarks other than the chosen ones are excluded from the analysis.

The execution starts when the Apply button is pressed.
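As an illustration of the special-character and landmark criteria, the sketch below applies them to a toy table with regular expressions. The column name `cdr3`, the sequences and the chosen landmarks are assumptions made for the example; tripr's own implementation may differ.

```r
# Toy input; the column name is hypothetical, not tripr's internal name.
seqs <- data.frame(
    cdr3 = c("CASSLGTDTQYF", "CASSX*GTDTQYF", "AASSLGTDTQYW", "CASSPGQGNEQFD"),
    stringsAsFactors = FALSE
)

# No special characters (X, *, #, .) in the CDR3. Inside a character
# class these characters are matched literally.
no_special <- !grepl("[X*#.]", seqs$cdr3)

# Valid start/end landmarks, written with a vertical bar as in the app.
start_landmarks <- "C"
end_landmarks <- "F|D"
valid_start <- grepl(paste0("^(", start_landmarks, ")"), seqs$cdr3)
valid_end <- grepl(paste0("(", end_landmarks, ")$"), seqs$cdr3)

# A sequence is kept only if it satisfies all criteria shown here.
seqs$keep <- no_special & valid_start & valid_end
```

In this toy input the second sequence is dropped because of its special characters and the third because neither of its landmarks is among the chosen ones.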

Users can visualize the results of the preselection (first cleaning) process in the Preselection tab. In the case of multi-sample datasets, results are provided for each individual sample separately, or for the combined dataset by scrolling through the Select Dataset option.

The output consists of 4 table files:

Summary: a summary table with both the included and excluded sequences for each different criterion
All Data table: the entire set of data
Clean table: the sequences that meet the preselection criteria and are included in the analysis and
Clean out table: the excluded sequences. The last column of the “Clean out table” refers to the unsatisfied criteria.

The figure below shows an example Clean table from this Tab.

All 4 tables can be downloaded as text files.

2.2.2 Selection

The sequences that passed through the Preselection process (“Clean table”) are used as input for the data Selection (filtering) process.

This step comprises 6 different filters:

V-REGION identity %: Sequences whose percent identity to the germline falls outside the range set by the user are excluded from the analysis.
Select Specific V Gene
Select Specific J Gene
Select Specific D Gene

Using the above 3 filters the user can select for sequences that carry one or more particular V, J and D genes or gene alleles, respectively. Different genes/gene alleles should be separated with a vertical line (|), e.g. TRBV11-2|TRBV29-1*03.

Select CDR3 length range: Only sequences with the selected CDR3 lengths are included in the analysis.
Only select CDR3 containing specific amino-acid sequence: Sequences with the specific CDR3 amino acid motif provided by the user are included in the analysis.

The execution starts when the Execute button is pressed.
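The gene, identity and CDR3 length filters can be sketched on a toy table as follows. The column names and values are hypothetical, and the matching rule (a gene name without an allele matches all alleles of that gene) is an assumption made for this example rather than documented tripr behaviour.

```r
# Toy "Clean table"; column names are hypothetical.
clean <- data.frame(
    v_gene = c("TRBV11-2*01", "TRBV29-1*03", "TRBV29-1*01", "TRBV5-1*01"),
    identity = c(100, 97.5, 85, 99),
    cdr3 = c("CASSLGTDTQYF", "CASSPGQGNEQF", "CASSF", "CASSLGQGNYF"),
    stringsAsFactors = FALSE
)

# Split the user string on the literal "|" so that the "*" in allele names
# is never interpreted as a regular-expression operator.
wanted <- strsplit("TRBV11-2|TRBV29-1*03", "|", fixed = TRUE)[[1]]

# Assumption: "TRBV11-2" matches every allele of that gene, whereas
# "TRBV29-1*03" matches only that allele.
gene_only <- sub("\\*.*$", "", clean$v_gene)
matches_gene <- clean$v_gene %in% wanted | gene_only %in% wanted

in_identity_range <- clean$identity >= 88 & clean$identity <= 100
in_length_range <- nchar(clean$cdr3) >= 10 & nchar(clean$cdr3) <= 14

filter_in <- clean[matches_gene & in_identity_range & in_length_range, ]
filter_out <- clean[!(matches_gene & in_identity_range & in_length_range), ]
```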

The results of the Selection (filtering) process are presented in the Selection tab.

This process provides 4 output files:

Summary: a summary table with both the included and excluded sequences for each filter
All Data table: the data used as input after the Preselection process
Filter in table: the sequences that passed through the selection filters and
Filter out table: the excluded sequences. The last column of the “Filter out table” refers to the filters that were not passed by each individual sequence.

All the tables can be downloaded as text files.

2.3 Pipelines & Step dependencies

Users can select the workflow that they want to apply to their dataset(s).

There are 11 different tools in the pipeline tab. 7 of them can be applied for both T- and B-cells, while the remaining 4 can be applied only for B-cells.

Step Dependencies in Pipeline

In order to apply Highly Similar Clonotypes computation, Clonotypes computation should have been selected previously.
In order to apply Repertoires Extraction, Clonotypes computation should have run previously. If Highly Similar Clonotypes computation has been selected, repertoires will be extracted for both total clonotypes and highly similar clonotypes.
The Somatic hypermutation status is applied using the groups that have been selected at Insert Identity groups.
If both Alignment and Clonotypes computation have been selected, the cluster ID in the alignment table corresponds to the cluster ID in the clonotype table. Otherwise, all elements in the “cluster_ID” column of the alignment table are assigned to zero.
In order to apply Alignment using the Select top N clonotypes option, Clonotypes computation should have run previously.
In order to apply Mutations, Alignment should have run previously, using the corresponding “AA or Nt” option. The Mutation table is computed based on the grouped alignment table.
In order to apply Mutations using the Select top N clonotypes or the Select clonotypes separately option, Clonotypes computation should have previously run.
In order to apply Logo using the Select top N clonotypes option, Clonotypes computation should have run previously.
In order to run the Shared Clonotype computation and the Repertoire comparison steps, the user must have loaded more than one dataset.

For both T- and B-cells:

2.3.1 Clonotype computation

The frequencies for all unique clonotypes of each sample are computed. There are 10 different options for clonotype definition.

The results are presented in the Clonotypes tab in the form of a table, where the clonotype, the count, the frequency and the convergent evolution (if feasible) are given. Each clonotype is also a link that provides a table with all relevant immunogenetic data for that particular clonotype, based on the uploaded files. This table consists of all reads/sequences assigned to that clonotype and all relevant information. Each clonotype is given a unique cluster identifier (cluster ID).
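To make the computation concrete, the sketch below derives counts, frequencies and cluster IDs for one clonotype definition (V gene plus CDR3 amino acid sequence) from a toy table. The column names are hypothetical and the code is an illustration in base R, not tripr's implementation.

```r
# Toy filtered table; column names are hypothetical.
reads <- data.frame(
    v_gene = c("IGHV1-69", "IGHV1-69", "IGHV3-23", "IGHV1-69", "IGHV3-23"),
    cdr3_aa = c("CARDGYW", "CARDGYW", "CAKGGYW", "CARDGYW", "CAKSSYW"),
    stringsAsFactors = FALSE
)

# Reads sharing both the V gene and the CDR3 amino acid sequence belong
# to the same clonotype.
key <- paste(reads$v_gene, reads$cdr3_aa, sep = " - ")
counts <- sort(table(key), decreasing = TRUE)

clonotypes <- data.frame(
    cluster_id = seq_along(counts),
    clonotype = names(counts),
    count = as.integer(counts),
    frequency = as.numeric(counts) / sum(counts),
    stringsAsFactors = FALSE
)
```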

2.3.2 Highly similar clonotypes computation

Frequencies for all highly similar clonotypes are computed. The user can set the number of mismatches allowed for each CDR3 length found in the dataset and a clonotype frequency threshold (range: 0-1). Only clonotypes with a frequency above the applied threshold will be used in the subsequent grouping. The whole process can be performed with or without taking into account the rearranged V-gene.

The results are presented in the Highly Similar Clonotypes tab as a table. A second table is also provided containing information regarding the clonotype grouping.
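One way to picture the grouping is a greedy pass in which each ungrouped clonotype seeds a group and absorbs the remaining clonotypes of the same CDR3 length that differ by at most the allowed number of mismatches. The sketch below implements this idea in base R; the greedy strategy, the variable names and the toy data are assumptions, and the algorithm used by tripr may differ.

```r
# Number of positions at which two equal-length strings differ.
hamming <- function(a, b) {
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Toy clonotypes, already sorted by decreasing frequency.
cdr3 <- c("CARDGYW", "CARDGFW", "CAKGGYW", "CARDGYWW")
freq <- c(0.50, 0.20, 0.20, 0.10)
max_mismatches <- 1    # allowed mismatches for CDR3 length 7
min_frequency <- 0.15  # only clonotypes above this threshold are grouped

eligible <- which(freq > min_frequency)
group <- rep(NA_integer_, length(cdr3))
for (i in eligible) {
    if (!is.na(group[i])) next
    group[i] <- i  # clonotype i seeds a new group
    for (j in eligible) {
        if (is.na(group[j]) && nchar(cdr3[j]) == nchar(cdr3[i]) &&
            hamming(cdr3[i], cdr3[j]) <= max_mismatches) {
            group[j] <- i
        }
    }
}

# Frequency of each group of highly similar clonotypes.
grouped_freq <- tapply(freq[eligible], group[eligible], sum)
```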

2.3.3 Repertoires extraction

The number of clonotypes using each V, J or D gene/allele is computed over the total number of clonotypes based on the clonotype definition given in the previous Clonotype computation step. If multiple samples are analyzed together the tool provides a total repertoire as well as the repertoire for each individual sample.

Results are provided in the Repertoires tab as tables. Each table includes the gene/allele and information concerning the absolute count and frequency of sequences expressing that particular gene/allele.
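The repertoire computation amounts to counting clonotypes per gene and dividing by the total number of clonotypes. A minimal sketch on a toy clonotype table, with hypothetical column names and a hypothetical `repertoire()` helper:

```r
# Toy clonotype table (one row per clonotype); column names are hypothetical.
clonotypes <- data.frame(
    v_gene = c("IGHV1-69", "IGHV3-23", "IGHV1-69", "IGHV4-34"),
    sample = c("sample_A", "sample_A", "sample_B", "sample_B"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: count and frequency of clonotypes per gene.
repertoire <- function(genes) {
    n <- table(genes)
    data.frame(
        gene = names(n),
        count = as.integer(n),
        frequency = as.numeric(n) / length(genes),
        stringsAsFactors = FALSE
    )
}

# Total repertoire and one repertoire per sample.
total_repertoire <- repertoire(clonotypes$v_gene)
per_sample <- lapply(split(clonotypes$v_gene, clonotypes$sample), repertoire)
```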

2.3.4 Highly Similar Repertoires extraction

Same as above except for the fact that the tool uses as input the clonotypes as computed in the Highly Similar Clonotypes computation.

2.3.5 Multiple value comparison

The tool performs cross-tabulation analysis between 2 selected variables. Many different variables can be selected by the user for this type of analysis depending on the selected input files from the Home tab.

The results are presented in the Multiple value comparison tab as tables. Each table contains the values that were found to be associated and the relevant frequency.
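A cross-tabulation of two variables can be reproduced in base R with `table()`. The sketch below uses two hypothetical variables (V gene and J gene) on toy data and is not taken from tripr's code.

```r
# Toy observations for two hypothetical variables.
obs <- data.frame(
    v_gene = c("IGHV1-69", "IGHV1-69", "IGHV3-23", "IGHV3-23", "IGHV3-23"),
    j_gene = c("IGHJ4", "IGHJ6", "IGHJ4", "IGHJ4", "IGHJ6"),
    stringsAsFactors = FALSE
)

# Count every (V gene, J gene) pair, then express counts as frequencies.
cross_tab <- as.data.frame(
    table(v_gene = obs$v_gene, j_gene = obs$j_gene),
    responseName = "count"
)
cross_tab$frequency <- cross_tab$count / sum(cross_tab$count)

# Keep only the pairs that were actually observed together.
cross_tab <- cross_tab[cross_tab$count > 0, ]
```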

2.3.6 CDR3 with 1 amino acid length difference

This tool can be applied for datasets that consist of sequences with highly similar CDR3. The tool is able to align and create sequence logos for sequences with the same length as well as for sequences that differ by a single amino acid in terms of length.

2.3.7 Logo

This tool creates an amino acid frequency table for the selected sequence region (CDR3, VDJ REGION, VJ REGION) of a given length. The frequency table is computed by counting the frequency of appearance of each of the 20 different amino acids at any given position of the sequence. The users have the option to select over the total frequency table or the table of the top clusters according to the clonotype frequencies.

A logo is created using the above frequency table. The color code of the amino acids is created based on the 11 IMGT amino acid physicochemical classes.
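The frequency table behind a logo can be sketched in a few lines of base R. The example assumes a toy set of CDR3 sequences of equal length; it illustrates the counting step only and does not draw the logo or apply the IMGT colour classes.

```r
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
cdr3 <- c("CARDGYW", "CARDGFW", "CAKDGYW", "CARGGYW")  # all of length 7

# One row per sequence, one column per position.
residues <- do.call(rbind, strsplit(cdr3, ""))

# Rows: amino acids; columns: positions; cells: relative frequency.
freq_table <- apply(residues, 2, function(column) {
    as.numeric(table(factor(column, levels = amino_acids))) / length(column)
})
rownames(freq_table) <- amino_acids
colnames(freq_table) <- seq_len(ncol(freq_table))
```

Each column of `freq_table` sums to 1, so the entry for a given amino acid and position can be read directly as its share at that position.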

Only for B cells:

2.3.8 Insert identity groups

Input sequences are grouped into different categories based on the V-region identity percent. The user can determine the number and the identity percent range of the mutational groups. For each group, the low limit is inclusive (≥) and the high limit is exclusive (<).
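Groups whose low limit is inclusive and whose high limit is exclusive correspond to `cut()` with `right = FALSE` in base R. In the sketch below the group limits are hypothetical, and placing the last break above 100 so that 100% identity forms its own group is an assumption of the example.

```r
identity <- c(100, 99.2, 98.6, 97, 95, 88)

# Hypothetical group limits: [85, 97), [97, 99), [99, 100), [100, 101).
breaks <- c(85, 97, 99, 100, 101)

# right = FALSE makes every interval closed on the left and open on the
# right, i.e. low limit inclusive and high limit exclusive.
groups <- cut(identity, breaks = breaks, right = FALSE)
group_sizes <- table(groups)
```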

2.3.9 Somatic hypermutation status

The relative frequency of each germline identity group is computed. If the user has not defined any groups based on the somatic hypermutation (SHM) status using the Insert identity groups tool, the tool will group together only sequences that display the exact SHM status (e.g. sequences with an identity percent of 98.6% will be grouped together whereas sequences with 98.7% identity will form a distinct group). Relative frequencies for each SHM group will be computed based on the total number of sequences.
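When no identity groups are defined, every distinct identity value forms its own group. A minimal sketch of that case on toy values, in base R:

```r
identity <- c(98.6, 98.6, 98.7, 100, 100, 100, 97.2, 98.6)

# Each distinct identity percent is its own group; relative frequencies
# are computed over the total number of sequences.
shm_counts <- table(identity)
shm_status <- data.frame(
    identity = as.numeric(names(shm_counts)),
    count = as.integer(shm_counts),
    frequency = as.numeric(shm_counts) / length(identity)
)
```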

2.3.10 Alignment

An alignment table is created for the user-selected region (VDJ REGION, VJ REGION). Sequences that are identical at the amino acid or nucleotide level are grouped together in order to create the grouped alignment table. Alignments for the selected region can be provided at the nucleotide level, the amino acid level, or both. Default reference sequences are extracted from the IMGT reference directory. Reference sequences can be used either at the gene or gene allele level. At the gene level, allele *01 is considered as reference. Users can also submit their own reference sequence. It is also possible to align only a number of selected clonotypes through the Select topN clonotypes option, or to select those clonotypes that have an individual frequency above a given percent cutoff.

Results are presented in the Alignment tab as tables.

Each table can be downloaded in txt format.

2.3.11 Somatic hypermutations

A table with all somatic hypermutations for all samples together as well as for each individual sample is computed based on the alignment table provided by the previous tool.

The output table includes:

the mutation type,
the position of the change,
the region where the change occurs,
the number of sequences carrying each change and
the frequency of the change for every gene or allele based on the grouped alignment table, regardless of the clonotype.
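The idea of tabulating changes against a reference can be sketched as follows. The sketch handles substitutions only, uses a made-up reference and made-up aligned sequences, and labels each change as reference residue, position, new residue; this notation and the helper `find_changes()` are hypothetical and do not reflect the layout of tripr's output table.

```r
reference <- "CARDGYW"
sequences <- c("CARDGYW", "CAKDGYW", "CAKDGFW", "CAKDGYW")

# Hypothetical helper: substitutions of one aligned sequence relative to
# the reference, written as <reference residue><position><new residue>.
find_changes <- function(s, ref) {
    s_chars <- strsplit(s, "")[[1]]
    ref_chars <- strsplit(ref, "")[[1]]
    pos <- which(s_chars != ref_chars)
    paste0(ref_chars[pos], pos, s_chars[pos])
}

changes <- lapply(sequences, find_changes, ref = reference)

# Number and share of sequences carrying each change.
n_carrying <- table(unlist(changes))
frequency <- n_carrying / length(sequences)
```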

There is the possibility to analyze only a number of clonotypes by choosing the Select topN clonotypes or the Select threshold for clonotypes option or even some clonotypes separately by choosing the Select clonotypes separately option. Different clonotype/cluster identifiers (cluster IDs) should be separated by comma (e.g. 1,3,7).

Results are given in the Mutations tab as tables. When different clonotypes are selected separately, different tables are created for each given clonotype.

Each table can be downloaded in text format.

2.4 Visualization

In the Visualization tab different types of charts (scatter plots, bar charts, etc.) are available for the visualization of the analysis results. Clonotypes are presented as bars and the user can select the frequency above which the clonotypes will be presented.

The convergent evolution is also available for visualization, with more than one chart type option.

The computed repertoires are presented as pie-charts and the user can again select the minimum frequency of the gene/allele that will be presented.

Regarding the Multiple value comparison tool, a plot of the 2 selected variables is presented.

All the tables that are presented to the user can be downloaded in text format, whereas the plots and the graphics can be downloaded in .png format.

2.5 Overview

This section provides an overview of the user’s total options for the analysis.

3 Running `tripr` via `R` command line

As mentioned before, tripr can also be used from the R command line with the run_TRIP() function.

3.1 Usage

run_TRIP() works as a wrapper function for the analysis that tripr provides. To see its detailed documentation, run:

    ?tripr::run_TRIP

Some of its most important arguments:

datapath : The path to the directory where data is located. Note that every sample of the dataset must have its own individual folder and every sample folder must be in one root folder. Note that every file in the root folder will be used in the analysis.

Supposing the dataset is in the user’s Documents folder, one could use fs::path_home("Documents", "dataset"), with the help of the fs package.

The default value is
```
    fs::path_package("extdata", "dataset", package = "tripr")
```
which uses the example dataset of 2 B-cell samples.
output_path : The directory where the output data will be stored. Please provide a valid path, ideally the same way as datapath by using the fs package.

The default value points to the Documents/tripr_output directory.
filelist : The character vector of files of the IMGT/HighV-Quest tool output that will be used through the analysis.

The default value is
```
    c("1_Summary.txt", "2_IMGT-gapped-nt-sequences.txt", 
        "4_IMGT-gapped-AA-sequences.txt", "6_Junction.txt")
```
which uses only 4 of the 10 .txt files that the IMGT/HighV-Quest tool provides as output.
preselection : Preselection Options (1:4). See Preselection
selection : Selection Options (5:10). See Selection
pipeline : Pipeline Options (1:19). The user can select multiple pipelines by separating them with a comma ‘,’.

See Pipelines and run ?tripr::run_TRIP for more details.

3.2 Output of Command Line tool

Every output of a tripr analysis run with the run_TRIP() function is stored in the output_path directory, as mentioned before. Therefore, no table or plot is displayed in RStudio or any other graphics device when the analysis runs, in contrast to the shiny app, where the user has access to output tables and plots via the User Interface.

Output Directory contains two folders:

output : Where data tables are stored.
Analysis : Where plots are stored.

The output directory has a unique name for every analysis, derived from the system time at which the analysis was run.
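Because every analysis gets its own directory, it can be convenient to locate the most recent one programmatically. The helper below is a hypothetical sketch in base R, not part of tripr; it assumes that each analysis creates one sub-directory directly under `output_path` and simply picks the one modified last.

```r
# Hypothetical helper (not part of tripr): return the most recently
# modified sub-directory of `output_path`, or NA if there is none.
latest_run_dir <- function(output_path) {
    dirs <- list.dirs(output_path, full.names = TRUE, recursive = FALSE)
    if (length(dirs) == 0) {
        return(NA_character_)
    }
    modified <- as.numeric(file.info(dirs)$mtime)
    dirs[which.max(modified)]
}

# Typical use after an analysis (commented out because it needs real output):
# run_dir <- latest_run_dir(output_path)
# list.files(run_dir, recursive = TRUE)
```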

3.3 Example with `run_TRIP()`

An example of a run_TRIP() analysis, using the provided example dataset of 2 B-cell samples, is presented below.

    datapath <- fs::path_package("extdata/dataset", package="tripr")
    output_path <- file.path(tempdir(), "myoutput")
    cell <- "Bcell"
    filelist <- c("1_Summary.txt", 
                  "2_IMGT-gapped-nt-sequences.txt", 
                  "4_IMGT-gapped-AA-sequences.txt", 
                  "6_Junction.txt")
    throughput <- "High Throughput"
    preselection <- "1,2,3,4C:W"
    selection <- "5"
    identity_range <- "88:100"
    pipeline <- "1"
    select_clonotype <- "V Gene + CDR3 Amino Acids"

    run_TRIP(
        datapath=datapath,
        output_path=output_path,
        filelist=filelist,
        cell=cell, 
        throughput=throughput, 
        preselection=preselection, 
        selection=selection, 
        identity_range=identity_range,
        pipeline=pipeline, 
        select_clonotype=select_clonotype)
#> png 
#>   2

4 Tool dependencies

The tripr package was made possible thanks to:

R (R Core Team, 2021)
BiocStyle (Oleś, 2021)
DT (Xie, Cheng, and Tan, 2021)
ggplot2 (Wickham, 2016)
golem (Fay, Guyader, Rochette, and Girard, 2021)
knitr (Xie, 2014)
plotly (Sievert, 2020)
RColorBrewer (Neuwirth, 2014)
rmarkdown (Allaire, Xie, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2021)
shiny (Chang, Cheng, Allaire, Sievert, Schloerke, Xie, Allen, McPherson, Dipert, and Borges, 2021)
shinyBS (Bailey, 2015)
testthat (Wickham, 2011)
shinyjs (Attali, 2020)
shinyFiles (Pedersen, Nijs, Schaffner, and Nantz, 2020)
plyr (Wickham, 2011)
data.table (Dowle and Srinivasan, 2021)
stringr (Wickham, 2019)
stringdist (van der Loo, 2014)
plot3D (Soetaert, 2021)
gridExtra (Auguie, 2017)
dplyr (Wickham, François, Henry, and Müller, 2021)
pryr (Wickham, 2021)
fs (Hester and Wickham, 2020)
biocthis (Collado-Torres, 2021)

5 Citation

We hope that tripr will be useful for your research. Please use the following information to cite the package and the research article. Thank you!

## Citation info
citation("tripr")
#> 
#> To cite tripr in publications use:
#> 
#>   Kotouza, M.T., Gemenetzi, K., Galigalidou, C. et al. TRIP - T cell
#>   receptor/immunoglobulin profiler. BMC Bioinformatics 21, 422 (2020).
#>   https://doi.org/10.1186/s12859-020-03669-1
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {T-cell Receptor/Immunoglobulin Profiler (TRIP)},
#>     author = {Maria Th. Kotouza and Katerina Gemenetzi and Chrysi Galigalidou and Elisavet Vlachonikola and Nikolaos Pechlivanis and Andreas Agathangelidis and Raphael Sandaltzopoulos and Pericles A. Mitkas and Kostas Stamatopoulos and Anastasia Chatzidimitriou and Fotis E. Psomopoulos},
#>     journal = {BMC Bioinformatics},
#>     year = {2020},
#>     volume = {21},
#>     number = {422},
#>     pages = {-},
#>     url = {https://doi.org/10.1186/s12859-020-03669-1},
#>   }

Session info

Here is the output of sessionInfo() on the system on which this document was compiled running pandoc 2.5:

#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tripr_1.0.0      shinyBS_0.61     shiny_1.7.1      RefManageR_1.3.0
#> [5] BiocStyle_2.22.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] httr_1.4.2          shinyFiles_0.9.0    tidyr_1.1.4        
#>  [4] sass_0.4.0          pkgload_1.2.3       viridisLite_0.4.0  
#>  [7] jsonlite_1.7.2      bslib_0.3.1         assertthat_0.2.1   
#> [10] BiocManager_1.30.16 highr_0.9           yaml_2.2.1         
#> [13] remotes_2.4.1       pillar_1.6.4        glue_1.4.2         
#> [16] digest_0.6.28       pryr_0.1.5          RColorBrewer_1.1-2 
#> [19] promises_1.2.0.1    colorspace_2.0-2    htmltools_0.5.2    
#> [22] httpuv_1.6.3        plyr_1.8.6          pkgconfig_2.0.3    
#> [25] misc3d_0.9-1        bookdown_0.24       config_0.3.1       
#> [28] purrr_0.3.4         xtable_1.8-4        scales_1.1.1       
#> [31] processx_3.5.2      later_1.3.0         tibble_3.1.5       
#> [34] generics_0.1.1      ggplot2_3.3.5       usethis_2.1.2      
#> [37] ellipsis_0.3.2      shinyjs_2.0.0       withr_2.4.2        
#> [40] lazyeval_0.2.2      cli_3.0.1           magrittr_2.0.1     
#> [43] crayon_1.4.1        mime_0.12           evaluate_0.14      
#> [46] ps_1.6.0            golem_0.3.1         fs_1.5.0           
#> [49] dockerfiler_0.1.4   fansi_0.5.0         xml2_1.3.2         
#> [52] pkgbuild_1.2.0      tools_4.1.1         data.table_1.14.2  
#> [55] prettyunits_1.1.1   lifecycle_1.0.1     stringr_1.4.0      
#> [58] plotly_4.10.0       munsell_0.5.0       callr_3.7.0        
#> [61] compiler_4.1.1      jquerylib_0.1.4     rlang_0.4.12       
#> [64] plot3D_1.4          grid_4.1.1          attempt_0.3.1      
#> [67] rstudioapi_0.13     htmlwidgets_1.5.4   tcltk_4.1.1        
#> [70] rmarkdown_2.11      testthat_3.1.0      codetools_0.2-18   
#> [73] gtable_0.3.0        DBI_1.1.1           roxygen2_7.1.2     
#> [76] R6_2.5.1            gridExtra_2.3       lubridate_1.8.0    
#> [79] knitr_1.36          dplyr_1.0.7         fastmap_1.1.0      
#> [82] utf8_1.2.2          rprojroot_2.0.2     desc_1.4.0         
#> [85] stringi_1.7.5       parallel_4.1.1      Rcpp_1.0.7         
#> [88] vctrs_0.3.8         tidyselect_1.1.1    xfun_0.27

Bibliography

This vignette was generated using BiocStyle (Oleś, 2021), knitr (Xie, 2014) and rmarkdown (Allaire, Xie, McPherson, et al., 2021) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.11. 2021. URL: https://github.com/rstudio/rmarkdown.

[2] D. Attali. shinyjs: Easily Improve the User Experience of Your Shiny Apps in Seconds. R package version 2.0.0. 2020. URL: https://CRAN.R-project.org/package=shinyjs.

[3] B. Auguie. gridExtra: Miscellaneous Functions for “Grid” Graphics. R package version 2.3. 2017. URL: https://CRAN.R-project.org/package=gridExtra.

[4] E. Bailey. shinyBS: Twitter Bootstrap Components for Shiny. R package version 0.61. 2015. URL: https://CRAN.R-project.org/package=shinyBS.

[5] W. Chang, J. Cheng, J. Allaire, et al. shiny: Web Application Framework for R. R package version 1.7.1. 2021. URL: https://CRAN.R-project.org/package=shiny.

[6] L. Collado-Torres. Automate package and project setup for Bioconductor packages. biocthis - R package version 1.4.0. https://github.com/lcolladotor/biocthis. 2021. DOI: 10.18129/B9.bioc.biocthis. URL: http://www.bioconductor.org/packages/biocthis.

[7] M. Dowle and A. Srinivasan. data.table: Extension of ‘data.frame’. R package version 1.14.2. 2021. URL: https://CRAN.R-project.org/package=data.table.

[8] C. Fay, V. Guyader, S. Rochette, et al. golem: A Framework for Robust Shiny Applications. R package version 0.3.1. 2021. URL: https://CRAN.R-project.org/package=golem.

[9] J. Hester and H. Wickham. fs: Cross-Platform File System Operations Based on ‘libuv’. R package version 1.5.0. 2020. URL: https://CRAN.R-project.org/package=fs.

[10] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[11] E. Neuwirth. RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. 2014. URL: https://CRAN.R-project.org/package=RColorBrewer.

[12] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.22.0. 2021. URL: https://github.com/Bioconductor/BiocStyle.

[13] T. Pedersen, V. Nijs, T. Schaffner, et al. shinyFiles: A Server-Side File System Viewer for Shiny. R package version 0.9.0. 2020. URL: https://CRAN.R-project.org/package=shinyFiles.

[14] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2021. URL: https://www.R-project.org/.

[15] C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC, 2020. ISBN: 9781138331457. URL: https://plotly-r.com.

[16] K. Soetaert. plot3D: Plotting Multi-Dimensional Data. R package version 1.4. 2021. URL: https://CRAN.R-project.org/package=plot3D.

[17] H. Wickham. “The Split-Apply-Combine Strategy for Data Analysis”. In: Journal of Statistical Software 40.1 (2011), pp. 1–29. URL: http://www.jstatsoft.org/v40/i01/.

[18] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. URL: https://ggplot2.tidyverse.org.

[19] H. Wickham. pryr: Tools for Computing on the Language. R package version 0.1.5. 2021. URL: https://CRAN.R-project.org/package=pryr.

[20] H. Wickham. stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. 2019. URL: https://CRAN.R-project.org/package=stringr.

[21] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[22] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.0.7. 2021. URL: https://CRAN.R-project.org/package=dplyr.

[23] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.

[24] Y. Xie, J. Cheng, and X. Tan. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.19. 2021. URL: https://CRAN.R-project.org/package=DT.

[25] M. van der Loo. “The stringdist package for approximate string matching”. In: The R Journal 6 (1 2014), pp. 111-122. URL: https://CRAN.R-project.org/package=stringdist.