Rfastp 1.17.0
The Rfastp package provides an R interface to fastp (Chen et al. 2018), an all-in-one preprocessing toolkit for FASTQ files.
Use the BiocManager package to download and install the package from Bioconductor as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rfastp")
If required, the latest development version of the package can also be installed from GitHub.
BiocManager::install("remotes")
BiocManager::install("RockefellerUniversity/Rfastp")
Once the package is installed, load it into your R session:
library(Rfastp)
The package includes three example FASTQ files: one single-end file and a pair of paired-end files (read 1 and read 2).
se_read1 <- system.file("extdata","Fox3_Std_small.fq.gz",package="Rfastp")
pe_read1 <- system.file("extdata","reads1.fastq.gz",package="Rfastp")
pe_read2 <- system.file("extdata","reads2.fastq.gz",package="Rfastp")
# Prefix for all output files, placed in R's temporary directory
outputPrefix <- tempfile(tmpdir = tempdir())
Rfastp supports multithreading; set the number of threads with the thread parameter.
# Single-end reads, using 4 threads
se_json_report <- rfastp(read1 = se_read1,
    outputFastq = paste0(outputPrefix, "_se"), thread = 4)
# Paired-end reads
pe_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2,
    outputFastq = paste0(outputPrefix, "_pe"))
# Paired-end reads with merging of overlapping pairs; merged reads go to mergeOut
pe_merge_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2, merge = TRUE,
    outputFastq = paste0(outputPrefix, "_unpaired"),
    mergeOut = paste0(outputPrefix, "_merged.fastq.gz"))
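With merge = TRUE, read pairs whose ends overlap are combined into a single longer read. The base-R sketch below illustrates only the basic idea and is not how fastp implements it: the helpers revcomp and merge_pair are hypothetical, the overlap must match exactly (fastp's overlap analysis tolerates some mismatches), and only the case where the fragment is longer than each read is handled.

```r
# Hypothetical helpers (not part of Rfastp): a simplified, exact-match
# illustration of merging an overlapping read pair.

# Reverse complement of a DNA string (A/C/G/T/N only).
revcomp <- function(s) {
  chartr("ACGTN", "TGCAN", paste(rev(strsplit(s, "")[[1]]), collapse = ""))
}

# Merge read1 with the reverse complement of read2 if the 3' end of read1
# exactly matches the 5' end of revcomp(read2) over at least min_overlap bases.
# Returns NA when no such overlap exists.
merge_pair <- function(r1, r2, min_overlap = 10) {
  r2rc <- revcomp(r2)
  max_k <- min(nchar(r1), nchar(r2rc))
  if (max_k < min_overlap) return(NA_character_)
  # Try the longest overlap first, then shorter ones.
  for (k in seq(max_k, min_overlap)) {
    r1_tail <- substr(r1, nchar(r1) - k + 1, nchar(r1))
    if (r1_tail == substr(r2rc, 1, k)) {
      # Append the part of revcomp(read2) that extends past read1.
      return(paste0(r1, substr(r2rc, k + 1, nchar(r2rc))))
    }
  }
  NA_character_
}

# Usage: a 20 bp fragment sequenced from both ends with 14 bp reads.
fragment <- "AAAACCCCGGGGTTTTACGT"
r1 <- substr(fragment, 1, 14)
r2 <- revcomp(substr(fragment, 7, 20))
merged <- merge_pair(r1, r2, min_overlap = 5)
merged  # "AAAACCCCGGGGTTTTACGT"
```

Trying the longest overlap first means that, when repetitive sequence lets several overlap lengths match exactly, the sketch picks the longest one.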
# Paired-end reads with a 16 bp UMI at the start of read1
umi_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2,
    outputFastq = paste0(outputPrefix, "_umi1"), umi = TRUE, umiLoc = "read1",
    umiLength = 16)
The following example adds a prefix string ("#") before the UMI sequence in each read name. By default, an "_" is placed between the prefix string and the UMI sequence; setting umiNoConnection = TRUE, as here, omits it. By default, the UMI sequence is inserted into the read name before the first space.
umi_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2,
    outputFastq = paste0(outputPrefix, "_umi2"), umi = TRUE, umiLoc = "read1",
    umiLength = 16, umiPrefix = "#", umiNoConnection = TRUE,
    umiIgnoreSeqNameSpace = TRUE)
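With umiLoc = "read1", the first umiLength bases of read1 are moved out of the sequence and into the read name. The base-R sketch below shows that rewrite for a single record. It rests on assumptions: move_umi_to_name is a hypothetical helper, the ":" between the original read ID and the UMI tag follows fastp's documented default naming, and umiIgnoreSeqNameSpace is not modelled. In a paired-end run the same tag would also be added to the read2 name.

```r
# Hypothetical helper (not part of Rfastp) sketching the read-name rewrite
# performed when the UMI is taken from the start of read1.
move_umi_to_name <- function(name, seq, qual, umi_len = 16,
                             prefix = "", connect = TRUE) {
  umi <- substr(seq, 1, umi_len)
  # Tag is "<prefix>_<UMI>", "<prefix><UMI>" (no connection), or just "<UMI>".
  tag <- if (nzchar(prefix)) {
    paste0(prefix, if (connect) "_" else "", umi)
  } else {
    umi
  }
  # Split the header at the first space; the tag is inserted before it.
  sp <- as.integer(regexpr(" ", name, fixed = TRUE))
  if (sp > 0) {
    id <- substr(name, 1, sp - 1)
    comment <- substr(name, sp, nchar(name))
  } else {
    id <- name
    comment <- ""
  }
  list(
    name = paste0(id, ":", tag, comment),
    # The UMI bases are removed from the read and its quality string.
    seq = substr(seq, umi_len + 1, nchar(seq)),
    qual = substr(qual, umi_len + 1, nchar(qual))
  )
}

# Usage: 4 bp UMI, prefix "#", no "_" connector (as with umiNoConnection = TRUE).
rec <- move_umi_to_name("@read1 1:N:0:GATCAG", "AAAACCCCGGGG", "IIIIJJJJKKKK",
                        umi_len = 4, prefix = "#", connect = FALSE)
rec$name  # "@read1:#AAAA 1:N:0:GATCAG"
```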
The following example trims the 3’ end base by base (window size 1), removing bases while their quality is below 5; trims the 5’ end with a 29 bp sliding window, removing bases while the window's mean quality is below 20; disables polyG trimming; discards reads shorter than 29 bp; and specifies the adapter sequence for read1.
clipr_json_report <- rfastp(read1 = se_read1,
    outputFastq = paste0(outputPrefix, "_clipr"),
    disableTrimPolyG = TRUE,
    cutLowQualFront = TRUE,
    cutFrontWindowSize = 29,
    cutFrontMeanQual = 20,
    cutLowQualTail = TRUE,
    cutTailWindowSize = 1,
    cutTailMeanQual = 5,
    minReadLength = 29,
    adapterSequenceRead1 = "GTGTCAGTCACTTCCAGCGG"
)
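The window-based trimming options act on Phred scores decoded from the quality string. The base-R sketch below illustrates the sliding-window idea, assuming Phred+33 encoding and using hypothetical helper names (phred, sliding_trim): a window moves inward from each end and bases are dropped one at a time while the window's mean quality is below the threshold. It is a simplification; details such as exactly how many bases fastp removes once a passing window is found may differ from its implementation.

```r
# Hypothetical helpers (not part of Rfastp): a simplified sliding-window
# quality trimmer, assuming Phred+33 quality encoding.

# Decode a Phred+33 quality string into integer scores.
phred <- function(qual) utf8ToInt(qual) - 33L

sliding_trim <- function(seq, qual, front_w = 29, front_q = 20,
                         tail_w = 1, tail_q = 5, min_len = 29) {
  q <- phred(qual)
  n <- length(q)
  # 5' end: drop one base at a time while the window's mean quality < front_q.
  start <- 1
  while (start + front_w - 1 <= n &&
         mean(q[start:(start + front_w - 1)]) < front_q) {
    start <- start + 1
  }
  # 3' end: same idea, moving from the tail towards the front.
  end <- n
  while (end - tail_w + 1 >= start &&
         mean(q[(end - tail_w + 1):end]) < tail_q) {
    end <- end - 1
  }
  # Discard the read entirely if too little sequence is left.
  if (end - start + 1 < min_len) return(NULL)
  list(seq = substr(seq, start, end), qual = substr(qual, start, end))
}

# Usage with small windows so the effect is visible on a 12 bp read:
# "!" is Q0, "I" is Q40 and "%" is Q4 in Phred+33.
res <- sliding_trim("AAACCCGGGTTT", "!!!IIIIIII%%",
                    front_w = 2, front_q = 21, tail_w = 1, tail_q = 5,
                    min_len = 3)
res$seq  # "CCCGGGT"
```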
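rfastp can also accept multiple input files: it concatenates them into one file and then runs fastp on the result. The catfastq function performs only the concatenation step, as done for the read 2 files below.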
pe001_read1 <- system.file("extdata", "splited_001_R1.fastq.gz",
    package = "Rfastp")
pe002_read1 <- system.file("extdata", "splited_002_R1.fastq.gz",
    package = "Rfastp")
pe003_read1 <- system.file("extdata", "splited_003_R1.fastq.gz",
    package = "Rfastp")
pe004_read1 <- system.file("extdata", "splited_004_R1.fastq.gz",
    package = "Rfastp")
inputfiles <- c(pe001_read1, pe002_read1, pe003_read1, pe004_read1)
cat_rjson_report <- rfastp(read1 = inputfiles,
    outputFastq = paste0(outputPrefix, "_merged1"))
pe001_read2 <- system.file("extdata", "splited_001_R2.fastq.gz",
    package = "Rfastp")
pe002_read2 <- system.file("extdata", "splited_002_R2.fastq.gz",
    package = "Rfastp")
pe003_read2 <- system.file("extdata", "splited_003_R2.fastq.gz",
    package = "Rfastp")
pe004_read2 <- system.file("extdata", "splited_004_R2.fastq.gz",
    package = "Rfastp")
inputR2files <- c(pe001_read2, pe002_read2, pe003_read2, pe004_read2)
catfastq(output = paste0(outputPrefix, "_merged2_R2.fastq.gz"),
    inputFiles = inputR2files)
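Concatenating FASTQ files preserves records because each record occupies exactly four lines. The base-R sketch below is a hypothetical stand-in (cat_fastq_sketch, not Rfastp's catfastq) that reads each plain or gzip-compressed input in turn and appends its lines to one gzip-compressed output. It loads each file fully into memory, so it only suits small files like the examples here. For paired-end data, the read 1 and read 2 files must be concatenated in the same order so that mates stay paired.

```r
# Hypothetical helper (not Rfastp's catfastq): concatenate FASTQ files,
# plain or gzip-compressed, into one gzip-compressed file.
cat_fastq_sketch <- function(inputs, output) {
  out <- gzfile(output, open = "w")
  on.exit(close(out))
  for (f in inputs) {
    con <- gzfile(f, open = "r")  # gzfile() also reads uncompressed files
    lines <- readLines(con)
    close(con)
    # Every FASTQ record has exactly four lines.
    if (length(lines) %% 4 != 0) stop("incomplete FASTQ record in ", f)
    writeLines(lines, out)
  }
  invisible(output)
}

# Usage: build two tiny gzip-compressed FASTQ files and concatenate them.
make_fastq <- function(path, id) {
  con <- gzfile(path, open = "w")
  writeLines(c(paste0("@", id), "ACGT", "+", "IIII"), con)
  close(con)
  path
}
f1 <- make_fastq(tempfile(fileext = ".fastq.gz"), "r1")
f2 <- make_fastq(tempfile(fileext = ".fastq.gz"), "r2")
combined <- cat_fastq_sketch(c(f1, f2), tempfile(fileext = ".fastq.gz"))
```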
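# Summarize QC metrics of the paired-end run
dfsummary <- qcSummary(pe_json_report)
# Plot per-position curves for the single-end run (default curve type)
p1 <- curvePlot(se_json_report)
p1
# Plot base content curves instead
p2 <- curvePlot(se_json_report, curve = "content_curves")
p2
# Summarize trimming of the paired-end run
dfTrim <- trimSummary(pe_json_report)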
Detailed usage of each function is documented in its help page:
?rfastp
?catfastq
?qcSummary
?trimSummary
?curvePlot
Thank you to Ji-Dung Luo for testing, vignette review, and critical feedback, to Doug Barrows for critical feedback and vignette review, and to Ziwei Liang for their support.
Session info
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Rfastp_1.17.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_1.8.9 highr_0.11 rjson_0.2.23
## [5] dplyr_1.1.4 compiler_4.5.0 BiocManager_1.30.25 tinytex_0.53
## [9] tidyselect_1.2.1 Rcpp_1.0.13 stringr_1.5.1 magick_2.8.5
## [13] jquerylib_0.1.4 scales_1.3.0 yaml_2.3.10 fastmap_1.2.0
## [17] ggplot2_3.5.1 R6_2.5.1 plyr_1.8.9 labeling_0.4.3
## [21] generics_0.1.3 knitr_1.48 tibble_3.2.1 bookdown_0.41
## [25] munsell_0.5.1 bslib_0.8.0 pillar_1.9.0 rlang_1.1.4
## [29] utf8_1.2.4 cachem_1.1.0 stringi_1.8.4 xfun_0.48
## [33] sass_0.4.9 cli_3.6.3 withr_3.0.2 magrittr_2.0.3
## [37] digest_0.6.37 grid_4.5.0 lifecycle_1.0.4 vctrs_0.6.5
## [41] evaluate_1.0.1 glue_1.8.0 farver_2.1.2 fansi_1.0.6
## [45] colorspace_2.1-1 reshape2_1.4.4 rmarkdown_2.28 tools_4.5.0
## [49] pkgconfig_2.0.3 htmltools_0.5.8.1