Find subjects using a file as search input
I have been working with data from IDC for a cohort of 100 individuals, and I'd like to see whether any other data is available about my subjects, and where. I want to submit a file that has all of my individuals' IDs so I can search them all at once. My file looks like this:
So, I want to match the subject column in mydatafile.tsv to the CDA column subject_id. And since I want to see where data exists about my subjects, I'm adding provenance = True, which will make the data return with one row per subject, per data center:
fetch_rows(table = 'subject', provenance= True, match_from_file = {'input_column': 'subject', 'input_file': 'mydatafile.tsv', 'cda_column_to_match':'subject_id'})
subject_id | cause_of_death | days_to_birth | days_to_death | ethnicity | race | sex | species | vital_status | subject_data_source | subject_data_source_id |
---|---|---|---|---|---|---|---|---|---|---|
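If the match returns fewer subjects than I expect, the input file is the first thing I'd check. Here is a minimal sketch, using only the Python standard library, that reads the column I named in input_column and reports how many IDs it holds and whether any repeat. It assumes a tab-separated file with a header row; read_id_column and summarize_ids are hypothetical helpers made up for illustration and are not part of the CDA library.

```python
import csv
from collections import Counter


def read_id_column(path, column, delimiter="\t"):
    """Return the non-empty values of `column` from a delimited file with a header row.

    Hypothetical helper for illustration; not part of the CDA library.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # fieldnames is None for an empty file, so check that case too.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found; header is {reader.fieldnames}")
        # Skip blank cells and short rows (DictReader fills missing cells with None).
        return [row[column].strip() for row in reader
                if row[column] and row[column].strip()]


def summarize_ids(ids):
    """Report how many IDs were read, how many are distinct, and which repeat."""
    counts = Counter(ids)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return {"rows": len(ids), "unique": len(counts), "duplicates": duplicates}
```

For my file, I would call summarize_ids(read_id_column('mydatafile.tsv', 'subject')) and I'd expect both rows and unique to come back as 100, with an empty duplicates list.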
My 100 subjects have over 400 rows of data, and it looks like they have data spread across CDS, GDC, PDC, and IDC. Using the subject_data_source_id, I can look each subject up at the relevant data center, or I can run more CDA searches to get a better idea of what is shared. Let's see a summary of what files there are:
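To see which data centers hold each subject, the provenance rows can be grouped by subject_id and subject_data_source. This is a rough sketch under assumptions: it works on a plain list of dicts (which, if fetch_rows gave me a pandas DataFrame, I could get with to_dict('records')), the two helper functions are hypothetical, and the example rows use placeholder IDs rather than real results.

```python
from collections import defaultdict


def sources_per_subject(rows, id_key="subject_id", source_key="subject_data_source"):
    """Map each subject ID to the sorted list of data centers that have rows for it."""
    found = defaultdict(set)
    for row in rows:
        found[row[id_key]].add(row[source_key])
    return {subject: sorted(sources) for subject, sources in found.items()}


def subjects_per_source(rows, id_key="subject_id", source_key="subject_data_source"):
    """Count the distinct subjects seen at each data center."""
    found = defaultdict(set)
    for row in rows:
        found[row[source_key]].add(row[id_key])
    return {source: len(subjects) for source, subjects in sorted(found.items())}


# Hypothetical rows standing in for provenance output; the subject IDs are
# placeholders, not real CDA identifiers.
example_rows = [
    {"subject_id": "subject_A", "subject_data_source": "IDC"},
    {"subject_id": "subject_A", "subject_data_source": "GDC"},
    {"subject_id": "subject_A", "subject_data_source": "PDC"},
    {"subject_id": "subject_B", "subject_data_source": "IDC"},
]

print(sources_per_subject(example_rows))
print(subjects_per_source(example_rows))
```

Any subject that maps to more than one data center is one where it may be worth looking beyond IDC.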
summary_counts(table = 'file', match_from_file = {'input_column': 'subject', 'input_file': 'mydatafile.tsv', 'cda_column_to_match':'subject_id'})
╔══════════════════════╗
║   total_file_matches ║
╠══════════════════════╣
║                22326 ║
╚══════════════════════╝

╔═════════╦══════════════════════════════╗
║   count ║ data_category                ║
╠═════════╬══════════════════════════════╣
║   12487 ║ Imaging                      ║
║    2464 ║ Simple Nucleotide Variation  ║
║    1348 ║ Peptide Spectral Matches     ║
║    1330 ║ Sequencing Reads             ║
║    1033 ║ Copy Number Variation        ║
║     674 ║ Raw Mass Spectra             ║
║     674 ║ Processed Mass Spectra       ║
║     448 ║ Structural Variation         ║
║     398 ║ Biospecimen                  ║
║     386 ║ Transcriptome Profiling      ║
║     286 ║ Somatic Structural Variation ║
║     285 ║ DNA Methylation              ║
║     274 ║ <NA>                         ║
║     118 ║ RNA-Seq                      ║
║      66 ║ Proteome Profiling           ║
║      52 ║ WXS                          ║
║       2 ║ WGS                          ║
║       1 ║ miRNA-Seq                    ║
╚═════════╩══════════════════════════════╝

╔═════════╦══════════════════════════════════════════════╗
║   count ║ data_type                                    ║
╠═════════╬══════════════════════════════════════════════╣
║   10078 ║ MR Image Storage                             ║
║    1872 ║ VL Whole Slide Microscopy Image Storage      ║
║    1348 ║ Open Standard                                ║
║    1006 ║ Somatic Mutation Index                       ║
║     802 ║ Annotated Somatic Mutation                   ║
║     715 ║ Aligned Reads                                ║
║     674 ║ Text                                         ║
║     674 ║ Proprietary                                  ║
║     615 ║ Aligned Reads Index                          ║
║     450 ║ Raw Simple Somatic Mutation                  ║
║     395 ║ Segmentation Storage                         ║
║     392 ║ Transcript Fusion                            ║
║     266 ║ <NA>                                         ║
║     242 ║ Gene Level Copy Number                       ║
║     235 ║ Copy Number Segment                          ║
║     228 ║ Structural Rearrangement                     ║
║     224 ║ Slide Image                                  ║
║     190 ║ Masked Intensities                           ║
║     174 ║ Biospecimen Supplement                       ║
║     166 ║ Raw Intensities                              ║
║     166 ║ Simple Germline Variation                    ║
║     163 ║ Masked Copy Number Segment                   ║
║     159 ║ Allele-specific Copy Number Segment          ║
║      98 ║ Gene Expression Quantification               ║
║      98 ║ Splice Junction Quantification               ║
║      98 ║ Clinical Supplement                          ║
║      95 ║ Methylation Beta Value                       ║
║      95 ║ Isoform Expression Quantification            ║
║      95 ║ miRNA Expression Quantification              ║
║      83 ║ Pathology Report                             ║
║      80 ║ Microscopy Bulk Simple Annotations Storage   ║
║      77 ║ Aggregated Somatic Mutation                  ║
║      77 ║ Masked Somatic Mutation                      ║
║      68 ║ Intermediate Analysis Archive                ║
║      66 ║ Protein Expression Quantification            ║
║      48 ║ Advanced Blending Presentation State Storage ║
║      11 ║ Comprehensive SR Storage                     ║
║       3 ║ Secondary Capture Image Storage              ║
╚═════════╩══════════════════════════════════════════════╝

╔═════════╦════════════════════╗
║   count ║ file_data_source   ║
╠═════════╬════════════════════╣
║   12487 ║ IDC                ║
║    6877 ║ GDC                ║
║    2696 ║ PDC                ║
║     266 ║ CDS                ║
╚═════════╩════════════════════╝

╔═════════╦═══════════════╗
║   count ║ file_format   ║
╠═════════╬═══════════════╣
║   12487 ║ DICOM         ║
║    1562 ║ TSV           ║
║    1011 ║ VCF           ║
║    1006 ║ TBI           ║
║     840 ║ TXT           ║
║     746 ║ BAM           ║
║     674 ║ <NA>          ║
║     674 ║ mzIdentML     ║
║     674 ║ mzML          ║
║     615 ║ BAI           ║
║     523 ║ MAF           ║
║     310 ║ BEDPE         ║
║     240 ║ SVS           ║
║     190 ║ IDAT          ║
║     166 ║ CEL           ║
║     164 ║ BCR XML       ║
║     144 ║ FASTQ         ║
║      83 ║ PDF           ║
║      82 ║ BCR SSF XML   ║
║      68 ║ TAR           ║
║      32 ║ TIFF          ║
║      19 ║ BCR Biotab    ║
║       8 ║ GCT           ║
║       7 ║ BCR OMF XML   ║
║       1 ║ CSV           ║
╚═════════╩═══════════════╝
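As a quick sanity check on the summary, the file_data_source counts should add up to total_file_matches, and dividing each by the total shows how the files split across data centers. The numbers in this sketch are copied from the summary output; the variable names are my own and nothing here calls the CDA library.

```python
# Counts copied from the file_data_source table in the summary output.
file_counts_by_source = {"IDC": 12487, "GDC": 6877, "PDC": 2696, "CDS": 266}
total_file_matches = 22326

# The per-source counts should account for every matched file.
counted = sum(file_counts_by_source.values())
if counted != total_file_matches:
    raise ValueError(f"per-source counts sum to {counted}, expected {total_file_matches}")

# Percentage of matched files held at each data center.
share_by_source = {
    source: 100 * count / total_file_matches
    for source, count in file_counts_by_source.items()
}

for source, share in share_by_source.items():
    print(f"{source}: {share:.1f}% of matched files")
```

Here the IDC count (12487) is also the Imaging and DICOM count, so a little over half of the files for my subjects are the imaging data I started with, and the rest come from GDC, PDC, and CDS.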