1 How could I generate a manifest file with filtering of Race and Ethnicity?

From https://support.bioconductor.org/p/9138939/.

library(GenomicDataCommons,quietly = TRUE)

I made a small change to the filtering-expression approach to follow current lazy-evaluation best practices: there is no longer any need to include the `~` in the filter expression. So:

q = files() %>%
  GenomicDataCommons::filter(
    cases.project.project_id == 'TCGA-COAD' &
      data_type == 'Aligned Reads' &
      experimental_strategy == 'RNA-Seq' &
      data_format == 'BAM')
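The bare expression is translated into a GDC API filter behind the scenes. To check that translation, the sketch below rebuilds `q` and prints its filter. It assumes that `get_filter()` is the GenomicDataCommons accessor for a query's filter and that the jsonlite package is installed; `jsonlite::toJSON()` renders the nested list as JSON.

```r
library(GenomicDataCommons)
library(jsonlite)

# Rebuild the query from above; no `~` is needed in the filter expression.
q <- files() %>%
  GenomicDataCommons::filter(
    cases.project.project_id == 'TCGA-COAD' &
      data_type == 'Aligned Reads' &
      experimental_strategy == 'RNA-Seq' &
      data_format == 'BAM')

# Assumption: get_filter() returns the query's filter as a nested R list.
q_filter <- get_filter(q)

# Render the nested list as JSON, the form in which the GDC API consumes filters.
toJSON(q_filter, auto_unbox = TRUE, pretty = TRUE)
```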

And get a count of the results:

count(q)

## [1] 1183

And the manifest:

manifest(q)


	id <chr>
1	6ec20f1d-4ce9-46d2-be21-6a991949b181
2	2296a4aa-06fb-4670-b753-b491fa2b5e24
3	971a0df1-0bc9-40fb-ad16-67202ffc8c77
4	970013f9-fa15-4b4a-b439-2983abe5d4ed
5	d582a5e8-a358-480f-9c58-86159a656da7
6	78765bfe-0f31-431e-b6eb-b5ce210e84df
7	7c887fb8-1133-4fda-a4a0-e1c735fce80e
8	9101eabf-2060-4881-95a3-105a4758d4d3
9	1701a5f9-9e25-4968-9f98-291cf6a03e59
10	5e9a188c-008f-4736-8eb2-a8d1525bb47a

Your question about race and ethnicity is a good one. To find the relevant fields, start by listing all fields available on the `files` endpoint:

all_fields = available_fields(files())

Then we can `grep` for "race" or "ethnic" to find candidate fields to look at:

grep('race|ethnic',all_fields,value=TRUE)

## [1] "cases.demographic.ethnicity"                 
## [2] "cases.demographic.race"                      
## [3] "cases.follow_ups.hormonal_contraceptive_type"
## [4] "cases.follow_ups.hormonal_contraceptive_use" 
## [5] "cases.follow_ups.scan_tracer_used"
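As the output shows, the pattern `race|ethnic` also matches substrings such as "contraceptive" and "tracer". One way to drop those false hits is to anchor the pattern to the last dot-separated component of the field name. The sketch below runs on the five field names printed above, so it needs no connection to the GDC.

```r
# The five field names returned by the grep above.
candidate_fields <- c(
  "cases.demographic.ethnicity",
  "cases.demographic.race",
  "cases.follow_ups.hormonal_contraceptive_type",
  "cases.follow_ups.hormonal_contraceptive_use",
  "cases.follow_ups.scan_tracer_used"
)

# Keep only fields whose final component is exactly "race" or "ethnicity":
# "\\." matches a literal dot and "$" anchors the match at the end of the name.
demographic_fields <- grep('\\.(race|ethnicity)$', candidate_fields,
                           value = TRUE)
demographic_fields
## [1] "cases.demographic.ethnicity" "cases.demographic.race"
```

Against the live field list, the same pattern can be applied directly to `all_fields`.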

Now we can check the available values for each field to determine how to complete our filter expressions:

available_values('files',"cases.demographic.ethnicity")

## [1] "not hispanic or latino" "not reported"           "hispanic or latino"    
## [4] "unknown"                "not allowed to collect" "_missing"

available_values('files',"cases.demographic.race")

##  [1] "white"                                    
##  [2] "not reported"                             
##  [3] "black or african american"                
##  [4] "asian"                                    
##  [5] "unknown"                                  
##  [6] "other"                                    
##  [7] "not allowed to collect"                   
##  [8] "american indian or alaska native"         
##  [9] "native hawaiian or other pacific islander"
## [10] "_missing"
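Before picking a value, it can help to see how the files in the TCGA-COAD query are distributed across these categories. The sketch below reuses the `facet()`/`aggregations()` pattern from the second question. Note that on the `files()` endpoint each `doc_count` counts files, not cases.

```r
library(GenomicDataCommons)

q <- files() %>%
  GenomicDataCommons::filter(
    cases.project.project_id == 'TCGA-COAD' &
      data_type == 'Aligned Reads' &
      experimental_strategy == 'RNA-Seq' &
      data_format == 'BAM')

# Bucket the matching files by race; aggregations() returns a list with one
# data frame (columns doc_count and key) per faceted field.
race_counts <- q %>%
  facet("cases.demographic.race") %>%
  aggregations() %>%
  .[[1]]

race_counts
```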

Now we can complete our filter expression to limit the results to files from cases whose race is recorded as "white":

q_white_only = q %>%
  GenomicDataCommons::filter(cases.demographic.race=='white')
count(q_white_only)

## [1] 691
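The question asks about race *and* ethnicity, so both conditions can go into a single filter expression joined with `&`. The value `'not hispanic or latino'` comes from the `available_values()` output above; this particular combination of values is only an illustration. `GenomicDataCommons::count()` is named explicitly in case dplyr is also attached and masks `count()`.

```r
library(GenomicDataCommons)

q <- files() %>%
  GenomicDataCommons::filter(
    cases.project.project_id == 'TCGA-COAD' &
      data_type == 'Aligned Reads' &
      experimental_strategy == 'RNA-Seq' &
      data_format == 'BAM')

# Both demographic conditions are combined with & in one filter expression.
q_white_non_hispanic <- q %>%
  GenomicDataCommons::filter(
    cases.demographic.race == 'white' &
      cases.demographic.ethnicity == 'not hispanic or latino')

GenomicDataCommons::count(q_white_non_hispanic)
```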

manifest(q_white_only)


	id <chr>
1	6ec20f1d-4ce9-46d2-be21-6a991949b181
2	2296a4aa-06fb-4670-b753-b491fa2b5e24
3	971a0df1-0bc9-40fb-ad16-67202ffc8c77
4	970013f9-fa15-4b4a-b439-2983abe5d4ed
5	9101eabf-2060-4881-95a3-105a4758d4d3
6	d0bcfdbe-512d-4bec-a212-6da156e49562
7	5a9403cd-27b8-4421-92cd-ea1bb4a21ec3
8	5bc05b98-c981-40ca-b337-3e38d7e0f3a9
9	702c13ed-c5d2-4f29-a45d-dffe97719567
10	0741acb2-1bd0-4644-aaf9-cf4c141e405f
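The question asks for a manifest *file*. Because `manifest()` returns a data frame, one way to produce a file is to write it out as tab-separated text with base R's `write.table()`. The output path below is a hypothetical example. The sketch also assumes that the columns returned by `manifest()` are the ones the GDC Data Transfer Tool (`gdc-client`) expects.

```r
library(GenomicDataCommons)

# Same restrictions as q_white_only above, written as one expression.
q_white_only <- files() %>%
  GenomicDataCommons::filter(
    cases.project.project_id == 'TCGA-COAD' &
      data_type == 'Aligned Reads' &
      experimental_strategy == 'RNA-Seq' &
      data_format == 'BAM' &
      cases.demographic.race == 'white')

m <- manifest(q_white_only)

# Hypothetical output location; change it to wherever the manifest should live.
manifest_file <- file.path(tempdir(), "gdc_manifest_white_only.txt")

# Tab-separated, with a header row, no quoting and no row names.
write.table(m, file = manifest_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The resulting file can then be passed to the transfer tool, for example with `gdc-client download -m gdc_manifest_white_only.txt`.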

2 How can I get the number of cases with RNA-Seq data added by date to TCGA project with `GenomicDataCommons`?

From https://support.bioconductor.org/p/9135791/

I would like to get the number of cases added (created; any logical datetime would suffice here) to the TCGA project by experiment type. I tried to get this data with the GenomicDataCommons package, but I believe it is giving me the number of files for a given experiment type rather than the number of cases. How can I get the number of cases for which there is RNA-Seq data?

library(tibble)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:GenomicDataCommons':
## 
##     count, filter, select

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(GenomicDataCommons)

cases() %>% 
  GenomicDataCommons::filter(~ project.program.name=='TCGA' & 
                               files.experimental_strategy=='RNA-Seq') %>% 
  facet(c("files.created_datetime")) %>% 
  aggregations() %>% 
  .[[1]] %>% 
  as_tibble() %>%
  dplyr::arrange(dplyr::desc(key))


doc_count <int>	key <chr>
362	2021-04-05t12:48:23.926301-05:00
438	2021-04-05t08:30:00.775501-05:00
374	2021-04-05t08:29:15.674486-05:00
305	2021-04-05t08:26:08.920845-05:00
349	2021-04-05t08:22:48.913195-05:00
351	2021-04-05t08:21:59.240799-05:00
269	2021-04-05t08:21:27.725962-05:00
325	2021-04-05t08:20:56.849817-05:00
428	2021-04-05t08:20:25.746896-05:00
214	2021-04-05t08:19:48.312949-05:00
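The facet above breaks cases down by file creation timestamp. For the other parts of the question, the sketch below runs queries against the `cases()` endpoint, so the results refer to cases rather than files. First, it passes the same filter (without the `~`) to `count()` to get the total number of TCGA cases that have at least one RNA-Seq file. Second, it facets on `files.experimental_strategy` to get case counts per experiment type. `GenomicDataCommons::count()` is named explicitly because attaching dplyr masks `count()`.

```r
library(GenomicDataCommons)
library(dplyr)

# TCGA cases that have at least one RNA-Seq file.
rnaseq_cases <- cases() %>%
  GenomicDataCommons::filter(project.program.name == 'TCGA' &
                               files.experimental_strategy == 'RNA-Seq')

# dplyr::count() masks the GenomicDataCommons generic, so call it explicitly.
n_rnaseq_cases <- GenomicDataCommons::count(rnaseq_cases)
n_rnaseq_cases

# Cases per experimental strategy across TCGA; since the query runs against
# the cases endpoint, each doc_count is a number of cases.
cases_by_strategy <- cases() %>%
  GenomicDataCommons::filter(project.program.name == 'TCGA') %>%
  facet("files.experimental_strategy") %>%
  aggregations() %>%
  .[[1]] %>%
  as_tibble() %>%
  arrange(desc(doc_count))

cases_by_strategy
```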

Questions and answers from over the years

Sunday, October 09, 2022

1 How could I generate a manifest file with filtering of Race and Ethnicity?

2 How can I get the number of cases with RNA-Seq data added by date to TCGA project with `GenomicDataCommons`?
