Search the entire CDA with keywords
I'm a grad student, and I don't have a lot of money available to run new experiments. I want to see what data is available that I can use for the bulk of my research to minimize costs. I'm interested in kidney cancer, but haven't locked in a topic yet, so I'd just like to explore.
Since I don't really know what I'm looking for, I want to just put in some keywords and see what data pops up. I'm going to start with summaries, because those are easy to browse.
summarize_subjects( 'kidney' )
╔═══════════════════════════════╗
║ number_of_matching_subjects   ║
╠═══════════════════════════════╣
║ 3499                          ║
╚═══════════════════════════════╝
╔════════════════════════════════════════════════╗
║ number_of_files_related_to_matching_subjects   ║
╠════════════════════════════════════════════════╣
║ 171777                                         ║
╚════════════════════════════════════════════════╝
╔════════════╦══════════════════════╗
║ subjects   ║ data_source          ║
╠════════════╬══════════════════════╣
║ 1285       ║ GDC only             ║
║ 966        ║ GDC + IDC            ║
║ 791        ║ IDC only             ║
║ 121        ║ GDC + IDC + PDC      ║
║ 112        ║ GDC + IDC + PDC + GC ║
║ 110        ║ GC only              ║
║ 44         ║ GDC + IDC + GC       ║
║ 29         ║ GDC + PDC            ║
║ 25         ║ PDC only             ║
║ 11         ║ IDC + GC             ║
║ 3          ║ IDC + ICDC           ║
║ 1          ║ ICDC only            ║
║ 1          ║ IDC + PDC            ║
╚════════════╩══════════════════════╝
╔════════════════╦═══════════════════════════════════════════╗
║ count_result   ║ race                                      ║
╠════════════════╬═══════════════════════════════════════════╣
║ 1715           ║ White                                     ║
║ 1447           ║ <NA>                                      ║
║ 290            ║ Black or African American                 ║
║ 44             ║ Asian                                     ║
║ 2              ║ American Indian or Alaska Native          ║
║ 1              ║ Native Hawaiian or Other Pacific Islander ║
╚════════════════╩═══════════════════════════════════════════╝
╔════════════════╦═══════════╗
║ count_result   ║ species   ║
╠════════════════╬═══════════╣
║ 2691           ║ human     ║
║ 804            ║ <NA>      ║
║ 4              ║ dog       ║
╚════════════════╩═══════════╝
╔════════════════╦════════════════════╗
║ count_result   ║ ethnicity          ║
╠════════════════╬════════════════════╣
║ 1926           ║ <NA>               ║
║ 1438           ║ Non-Hispanic       ║
║ 135            ║ Hispanic or Latino ║
╚════════════════╩════════════════════╝
╔════════════════╦══════════════════════════╗
║ count_result   ║ cause_of_death           ║
╠════════════════╬══════════════════════════╣
║ 3320           ║ <NA>                     ║
║ 158            ║ Cancer-Related Death     ║
║ 7              ║ Infection                ║
║ 7              ║ Toxicity                 ║
║ 5              ║ Non-Cancer Related Death ║
║ 1              ║ Cardiovascular Disorder  ║
║ 1              ║ Surgical Complication    ║
╚════════════════╩══════════════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_death   ║
╠════════════════╬═════════════════╣
║ mean           ║ 2017            ║
║ min            ║ 2010            ║
║ lower quartile ║ 2016            ║
║ median         ║ 2018            ║
║ upper quartile ║ 2020            ║
║ max            ║ 2021            ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_birth   ║
╠════════════════╬═════════════════╣
║ mean           ║ 1964            ║
║ min            ║ 1926            ║
║ lower quartile ║ 1949            ║
║ median         ║ 1958            ║
║ upper quartile ║ 1970            ║
║ max            ║ 2017            ║
╚════════════════╩═════════════════╝
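The data_source table in this summary lists combinations of sources rather than one row per source, so it takes a little arithmetic to see how many subjects each source contributes. Here is a small sketch of that tally in plain Python. The counts are typed in by hand from the table above, nothing here calls the CDA, and the variable names are my own:

```python
from collections import Counter

# Counts copied by hand from the data_source table in the summary above.
data_source_counts = {
    "GDC only": 1285,
    "GDC + IDC": 966,
    "IDC only": 791,
    "GDC + IDC + PDC": 121,
    "GDC + IDC + PDC + GC": 112,
    "GC only": 110,
    "GDC + IDC + GC": 44,
    "GDC + PDC": 29,
    "PDC only": 25,
    "IDC + GC": 11,
    "IDC + ICDC": 3,
    "ICDC only": 1,
    "IDC + PDC": 1,
}

per_source = Counter()   # subjects that have any data in a given source
multi_source = 0         # subjects that appear in more than one source

for label, n_subjects in data_source_counts.items():
    # "GDC only" -> ["GDC"], "GDC + IDC" -> ["GDC", "IDC"]
    sources = label.replace(" only", "").split(" + ")
    for source in sources:
        per_source[source] += n_subjects
    if len(sources) > 1:
        multi_source += n_subjects

total = sum(data_source_counts.values())

print("total subjects:", total)
print("subjects in more than one source:", multi_source)
for source, n_subjects in per_source.most_common():
    print(f"{source}: {n_subjects}")
```

The total comes out to 3499, which matches number_of_matching_subjects, so the rows in that table do not overlap. By this tally, 1287 of those subjects have data in more than one source.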
About 3500 subjects have clinical data or file data tagged with kidney. The search looked in every column, but the summary only displays some of them. I'm going to run the columns() command to see what columns are available, and add the interesting-sounding ones to my search:
columns()
summarize_subjects( 'kidney', add_columns=['anatomic_site', 'observed_anatomic_site', 'resection_anatomic_site', 'diagnosis', 'morphology', 'format'])
╔═══════════════════════════════╗ ║ number_of_matching_subjects ║ ╠═══════════════════════════════╣ ║ 3499 ║ ╚═══════════════════════════════╝ ╔════════════════════════════════════════════════╗ ║ number_of_files_related_to_matching_subjects ║ ╠════════════════════════════════════════════════╣ ║ 171777 ║ ╚════════════════════════════════════════════════╝ ╔════════════╦══════════════════════╗ ║ subjects ║ data_source ║ ╠════════════╬══════════════════════╣ ║ 1285 ║ GDC only ║ ║ 966 ║ GDC + IDC ║ ║ 791 ║ IDC only ║ ║ 121 ║ GDC + PDC + IDC ║ ║ 112 ║ GDC + PDC + GC + IDC ║ ║ 110 ║ GC only ║ ║ 44 ║ GDC + GC + IDC ║ ║ 29 ║ GDC + PDC ║ ║ 25 ║ PDC only ║ ║ 11 ║ GC + IDC ║ ║ 3 ║ ICDC + IDC ║ ║ 1 ║ ICDC only ║ ║ 1 ║ PDC + IDC ║ ╚════════════╩══════════════════════╝ ╔════════════════╦══════════════════════════╗ ║ count_result ║ cause_of_death ║ ╠════════════════╬══════════════════════════╣ ║ 3320 ║ <NA> ║ ║ 158 ║ Cancer-Related Death ║ ║ 7 ║ Infection ║ ║ 7 ║ Toxicity ║ ║ 5 ║ Non-Cancer Related Death ║ ║ 1 ║ Cardiovascular Disorder ║ ║ 1 ║ Surgical Complication ║ ╚════════════════╩══════════════════════════╝ ╔════════════════╦═══════════════════════════════════════════╗ ║ count_result ║ race ║ ╠════════════════╬═══════════════════════════════════════════╣ ║ 1715 ║ White ║ ║ 1447 ║ <NA> ║ ║ 290 ║ Black or African American ║ ║ 44 ║ Asian ║ ║ 2 ║ American Indian or Alaska Native ║ ║ 1 ║ Native Hawaiian or Other Pacific Islander ║ ╚════════════════╩═══════════════════════════════════════════╝ ╔════════════════╦════════════════════╗ ║ count_result ║ ethnicity ║ ╠════════════════╬════════════════════╣ ║ 1926 ║ <NA> ║ ║ 1438 ║ Non-Hispanic ║ ║ 135 ║ Hispanic or Latino ║ ╚════════════════╩════════════════════╝ ╔════════════════╦═══════════╗ ║ count_result ║ species ║ ╠════════════════╬═══════════╣ ║ 2691 ║ human ║ ║ 804 ║ <NA> ║ ║ 4 ║ dog ║ ╚════════════════╩═══════════╝ ╔════════════════╦════════════════════════════╗ ║ count_result ║ anatomic_site ║ 
╠════════════════╬════════════════════════════╣ ║ 99418 ║ <NA> ║ ║ 47411 ║ kidney ║ ║ 18616 ║ blood ║ ║ 3717 ║ lung ║ ║ 2312 ║ abdomen ║ ║ 2123 ║ chest ║ ║ 1587 ║ pancreas ║ ║ 1565 ║ stomach ║ ║ 1020 ║ pyloric antrum ║ ║ 816 ║ pylorus ║ ║ 703 ║ liver ║ ║ 608 ║ tongue ║ ║ 558 ║ breast ║ ║ 549 ║ cortex of kidney ║ ║ 547 ║ hindlimb skin ║ ║ 544 ║ skeletal muscle tissue ║ ║ 543 ║ adrenal gland ║ ║ 540 ║ adipose tissue ║ ║ 540 ║ tibial artery ║ ║ 540 ║ tibial nerve ║ ║ 539 ║ esophagus mucosa ║ ║ 539 ║ esophagus muscle ║ ║ 531 ║ transverse colon ║ ║ 528 ║ spleen ║ ║ 524 ║ aorta ║ ║ 519 ║ thyroid gland ║ ║ 510 ║ heart ║ ║ 474 ║ omentum ║ ║ 473 ║ suprapubic skin ║ ║ 468 ║ ilium ║ ║ 465 ║ sigmoid colon ║ ║ 465 ║ uterus ║ ║ 460 ║ esophagogastric junction ║ ║ 441 ║ coronary artery ║ ║ 429 ║ atrium auricular region ║ ║ 408 ║ fundus of stomach ║ ║ 406 ║ cerebral cortex ║ ║ 405 ║ pituitary gland ║ ║ 404 ║ cerebellum ║ ║ 371 ║ prostate gland ║ ║ 370 ║ testis ║ ║ 224 ║ right kidney ║ ║ 204 ║ cardia of stomach ║ ║ 172 ║ urinary bladder ║ ║ 160 ║ ovary ║ ║ 148 ║ left kidney ║ ║ 143 ║ vagina ║ ║ 108 ║ minor salivary gland ║ ║ 96 ║ upper urinary tract ║ ║ 61 ║ retroperitoneal lymph node ║ ║ 58 ║ buccal mucosa ║ ║ 53 ║ trunk ║ ║ 52 ║ abdominopelvic cavity ║ ║ 50 ║ left adrenal gland ║ ║ 49 ║ renal medulla ║ ║ 42 ║ skin of body ║ ║ 41 ║ brain ║ ║ 37 ║ vein ║ ║ 30 ║ right adrenal gland ║ ║ 27 ║ colon ║ ║ 27 ║ head ║ ║ 20 ║ right lung ║ ║ 18 ║ adrenal cortex ║ ║ 18 ║ mediastinal lymph node ║ ║ 16 ║ fallopian tube ║ ║ 16 ║ paraaortic lymph node ║ ║ 15 ║ ectocervix ║ ║ 15 ║ endocervix ║ ║ 14 ║ craniocervical region ║ ║ 12 ║ lymph node ║ ║ 12 ║ peritoneum ║ ║ 10 ║ inferior vena cava ║ ║ 8 ║ left lung ║ ║ 6 ║ hepatic lymph node ║ ║ 6 ║ mesothelium ║ ║ 6 ║ pelvic region of trunk ║ ║ 5 ║ thymus ║ ║ 4 ║ abdominal wall ║ ║ 4 ║ appendage ║ ║ 4 ║ bone tissue ║ ║ 4 ║ humerus ║ ║ 4 ║ left renal vein ║ ║ 4 ║ paratracheal lymph node ║ ║ 3 ║ renal pelvis ║ ║ 2 ║ abdominal lymph node ║ ║ 2 ║ axillary 
lymph node ║ ║ 2 ║ bile duct ║ ╚════════════════╩════════════════════════════╝ ╔════════════════╦═══════════════════╗ ║ count_result ║ format ║ ╠════════════════╬═══════════════════╣ ║ 29451 ║ DICOM ║ ║ 21729 ║ VCF ║ ║ 21465 ║ TBI ║ ║ 16932 ║ TSV ║ ║ 14351 ║ BAM ║ ║ 12740 ║ TXT ║ ║ 12368 ║ BAI ║ ║ 10210 ║ MAF ║ ║ 6627 ║ BEDPE ║ ║ 4284 ║ IDAT ║ ║ 4185 ║ <NA> ║ ║ 3495 ║ SVS ║ ║ 2416 ║ mzML ║ ║ 2046 ║ CEL ║ ║ 2013 ║ BCR XML ║ ║ 1901 ║ mzIdentML ║ ║ 1371 ║ TAR ║ ║ 1029 ║ PDF ║ ║ 1000 ║ BCR SSF XML ║ ║ 945 ║ FASTQ ║ ║ 387 ║ BCR Biotab ║ ║ 219 ║ CRAI ║ ║ 219 ║ CRAM ║ ║ 196 ║ BCR OMF XML ║ ║ 44 ║ JSON ║ ║ 39 ║ MEX ║ ║ 24 ║ XLSX ║ ║ 23 ║ CDC JSON ║ ║ 19 ║ HDF5 ║ ║ 14 ║ GCT ║ ║ 10 ║ BCR Auxiliary XML ║ ║ 7 ║ BW ║ ║ 5 ║ BCR PPS XML ║ ║ 5 ║ TIFF ║ ║ 4 ║ CSV ║ ║ 4 ║ HTML ║ ╚════════════════╩═══════════════════╝ ╔════════════════╦═══════════════════════════╗ ║ count_result ║ resection_anatomic_site ║ ╠════════════════╬═══════════════════════════╣ ║ 1633 ║ kidney ║ ║ 1631 ║ <NA> ║ ║ 48 ║ blood ║ ║ 39 ║ lung ║ ║ 29 ║ lymph node ║ ║ 24 ║ liver ║ ║ 20 ║ hypodermis ║ ║ 13 ║ bone tissue ║ ║ 13 ║ peritoneum ║ ║ 11 ║ buccal mucosa ║ ║ 10 ║ pleura ║ ║ 10 ║ urinary bladder ║ ║ 7 ║ adrenal gland ║ ║ 6 ║ breast ║ ║ 5 ║ abdomen ║ ║ 5 ║ axillary lymph node ║ ║ 5 ║ brain ║ ║ 4 ║ thoracic segment of trunk ║ ║ 3 ║ craniocervical region ║ ║ 3 ║ endometrium ║ ║ 3 ║ pancreas ║ ║ 3 ║ prostate gland ║ ║ 3 ║ skin of body ║ ║ 3 ║ stomach ║ ║ 2 ║ vertebral column ║ ║ 1 ║ ascending colon ║ ║ 1 ║ biliary system ║ ║ 1 ║ body of uterus ║ ║ 1 ║ bone marrow ║ ║ 1 ║ caecum ║ ║ 1 ║ fallopian tube ║ ║ 1 ║ inguinal lymph node ║ ║ 1 ║ mediastinum ║ ║ 1 ║ mouth ║ ║ 1 ║ ovary ║ ║ 1 ║ small intestine ║ ║ 1 ║ thymus ║ ║ 1 ║ tonsil ║ ║ 1 ║ transverse colon ║ ║ 1 ║ ureter ║ ║ 1 ║ uterus ║ ║ 1 ║ vagina ║ ║ 1 ║ vermiform appendix ║ ╚════════════════╩═══════════════════════════╝ ╔════════════════╦══════════════════════════════════════════════════════════════════════════════════╗ ║ count_result ║ diagnosis ║ 
╠════════════════╬══════════════════════════════════════════════════════════════════════════════════╣ ║ 1127 ║ Renal cell carcinoma ║ ║ 886 ║ Clear cell adenocarcinoma ║ ║ 714 ║ <NA> ║ ║ 711 ║ Nephroblastoma ║ ║ 344 ║ Papillary adenocarcinoma ║ ║ 120 ║ Neoplasm, malignant ║ ║ 119 ║ Renal cell carcinoma, chromophobe type ║ ║ 90 ║ Transitional cell carcinoma ║ ║ 62 ║ Rhabdoid tumor ║ ║ 55 ║ Adenocarcinoma ║ ║ 36 ║ Carcinoma ║ ║ 30 ║ Epithelial tumor, benign ║ ║ 30 ║ Neuroblastoma ║ ║ 24 ║ Adenoma ║ ║ 18 ║ Pseudosarcomatous carcinoma ║ ║ 16 ║ Malignant melanoma ║ ║ 16 ║ Squamous cell carcinoma ║ ║ 15 ║ Oxyphilic adenoma ║ ║ 14 ║ Clear cell sarcoma of kidney ║ ║ 10 ║ Medullary carcinoma ║ ║ 8 ║ Collecting duct carcinoma ║ ║ 8 ║ Infiltrating ductular carcinoma ║ ║ 7 ║ Infiltrating duct carcinoma ║ ║ 7 ║ Sarcoma ║ ║ 6 ║ Basal cell carcinoma ║ ║ 6 ║ Hepatocellular carcinoma ║ ║ 6 ║ Neuroendocrine carcinoma ║ ║ 5 ║ Adrenal cortical carcinoma ║ ║ 4 ║ Endometrioid adenocarcinoma ║ ║ 3 ║ Acinar cell carcinoma ║ ║ 3 ║ Adenocarcinoma, intestinal type ║ ║ 3 ║ Angioimmunoblastic T-cell lymphoma ║ ║ 3 ║ Burkitt lymphoma ║ ║ 3 ║ Follicular lymphoma ║ ║ 3 ║ Glioma, malignant ║ ║ 3 ║ Hodgkin lymphoma ║ ║ 3 ║ Leiomyosarcoma ║ ║ 3 ║ Malignant lymphoma, non-Hodgkin ║ ║ 3 ║ Neuroendocrine tumor ║ ║ 3 ║ Non-small cell carcinoma ║ ║ 3 ║ Pheochromocytoma ║ ║ 3 ║ Transitional cell papillomas and carcinomas ║ ║ 2 ║ Adenosquamous carcinoma ║ ║ 2 ║ Angiomyolipoma ║ ║ 2 ║ B-cell chronic lymphocytic leukemia/small lymphocytic lymphoma ║ ║ 2 ║ Basaloid squamous cell carcinoma ║ ║ 2 ║ Combined small cell carcinoma ║ ║ 2 ║ Diffuse large B-cell lymphoma ║ ║ 2 ║ Intraductal carcinoma, noninfiltrating ║ ║ 2 ║ Myomatous neoplasms ║ ║ 2 ║ Neoplasm, uncertain whether benign or malignant ║ ║ 2 ║ Nevi and melanomas ║ ║ 2 ║ Papillary carcinoma ║ ║ 2 ║ Papillary transitional cell carcinoma ║ ║ 1 ║ Adult granulosa cell tumor of testis ║ ║ 1 ║ Aggressive fibromatosis ║ ║ 1 ║ Basal cell carcinoma, nodular ║ ║ 1 
║ Benign cystic nephroma ║ ║ 1 ║ Cholangiocarcinoma ║ ║ 1 ║ Chronic myeloid leukemia ║ ║ 1 ║ Complex epithelial neoplasms ║ ║ 1 ║ Dedifferentiated liposarcoma ║ ║ 1 ║ Desmoplastic small round cell tumor ║ ║ 1 ║ Epithelioid mesothelioma, malignant ║ ║ 1 ║ Ewing sarcoma ║ ║ 1 ║ Follicular carcinoma ║ ║ 1 ║ Ganglioneuroblastoma ║ ║ 1 ║ Giant cell sarcoma ║ ║ 1 ║ Hairy cell leukemia ║ ║ 1 ║ Hemangioma ║ ║ 1 ║ Hemangiosarcoma ║ ║ 1 ║ Hereditary leiomyomatosis and renal cell carcinoma (HRCC)-associated renal ce... ║ ║ 1 ║ Lentigo maligna melanoma ║ ║ 1 ║ Lobular carcinoma ║ ║ 1 ║ Malignant fibrous histiocytoma ║ ║ 1 ║ Malignant lymphoma ║ ║ 1 ║ Malignant lymphoma, small B lymphocytic ║ ║ 1 ║ Malignant peripheral nerve sheath tumor with rhabdomyoblastic differentiation ║ ║ 1 ║ Meningioma ║ ║ 1 ║ Mesothelioma, malignant ║ ║ 1 ║ Mucinous adenocarcinoma ║ ║ 1 ║ Papillary carcinoma, follicular variant ║ ║ 1 ║ Papillary microcarcinoma ║ ║ 1 ║ Papillary squamous cell carcinoma ║ ║ 1 ║ Papillary urothelial carcinoma, non-invasive ║ ║ 1 ║ Paraganglioma, malignant ║ ║ 1 ║ Pituitary adenoma ║ ║ 1 ║ Serous carcinoma ║ ║ 1 ║ Signet ring cell carcinoma ║ ║ 1 ║ Squamous cell carcinoma, clear cell type ║ ║ 1 ║ Synovial sarcoma, spindle cell ║ ║ 1 ║ T-cell large granular lymphocytic leukemia ║ ║ 1 ║ Thymoma ║ ║ 1 ║ Thymoma, type A ║ ║ 1 ║ Undifferentiated sarcoma ║ ╚════════════════╩══════════════════════════════════════════════════════════════════════════════════╝ ╔════════════════╦═══════════════════════════════════╗ ║ count_result ║ observed_anatomic_site ║ ╠════════════════╬═══════════════════════════════════╣ ║ 2820 ║ kidney ║ ║ 624 ║ <NA> ║ ║ 275 ║ abdomen ║ ║ 68 ║ chest ║ ║ 58 ║ lung ║ ║ 41 ║ prostate gland ║ ║ 25 ║ urinary bladder ║ ║ 16 ║ breast ║ ║ 16 ║ liver ║ ║ 12 ║ colon ║ ║ 12 ║ skin of body ║ ║ 11 ║ abdominopelvic cavity ║ ║ 11 ║ craniocervical region ║ ║ 10 ║ adrenal gland ║ ║ 10 ║ pancreas ║ ║ 10 ║ renal pelvis ║ ║ 9 ║ uterus ║ ║ 8 ║ brain ║ ║ 8 ║ trunk ║ ║ 7 ║ 
hypodermis ║ ║ 7 ║ lymph node ║ ║ 7 ║ thyroid gland ║ ║ 6 ║ bone tissue ║ ║ 5 ║ hematopoietic system ║ ║ 5 ║ pelvic region of trunk ║ ║ 4 ║ axillary lymph node ║ ║ 4 ║ stomach ║ ║ 4 ║ ureter ║ ║ 3 ║ ascending colon ║ ║ 3 ║ endometrium ║ ║ 3 ║ eye ║ ║ 3 ║ head ║ ║ 3 ║ hindlimb ║ ║ 3 ║ ovary ║ ║ 3 ║ peritoneum ║ ║ 3 ║ spleen ║ ║ 3 ║ upper limb segment ║ ║ 2 ║ blood ║ ║ 2 ║ bone marrow ║ ║ 2 ║ ear ║ ║ 2 ║ inguinal lymph node ║ ║ 2 ║ intestine ║ ║ 2 ║ lower lobe of lung ║ ║ 2 ║ pleura ║ ║ 2 ║ posterior wall of urinary bladder ║ ║ 2 ║ rectum ║ ║ 2 ║ thoracic segment of trunk ║ ║ 2 ║ upper lobe of lung ║ ║ 1 ║ adrenal cortex ║ ║ 1 ║ anus ║ ║ 1 ║ appendage ║ ║ 1 ║ autonomic nervous system ║ ║ 1 ║ bile duct ║ ║ 1 ║ bone of pelvis ║ ║ 1 ║ cardia of stomach ║ ║ 1 ║ head of pancreas ║ ║ 1 ║ head or neck skin ║ ║ 1 ║ ilium ║ ║ 1 ║ intrahepatic bile duct ║ ║ 1 ║ lateral wall of urinary bladder ║ ║ 1 ║ mediastinum ║ ║ 1 ║ mesothelium ║ ║ 1 ║ nasal cavity ║ ║ 1 ║ nervous system ║ ║ 1 ║ pituitary gland ║ ║ 1 ║ posterior part of tongue ║ ║ 1 ║ rectosigmoid junction ║ ║ 1 ║ renal system ║ ║ 1 ║ skin of face ║ ║ 1 ║ small intestine ║ ║ 1 ║ thymus ║ ║ 1 ║ transverse colon ║ ║ 1 ║ uterine cervix ║ ║ 1 ║ vagina ║ ║ 1 ║ vertebral column ║ ╚════════════════╩═══════════════════════════════════╝ ╔════════════════╦══════════════════════════════════════════════════════════════════════════════════╗ ║ count_result ║ morphology ║ ╠════════════════╬══════════════════════════════════════════════════════════════════════════════════╣ ║ 949 ║ <NA> ║ ║ 711 ║ Nephroblastoma ║ ║ 647 ║ Clear cell adenocarcinoma ║ ║ 394 ║ Renal cell carcinoma ║ ║ 342 ║ Papillary adenocarcinoma ║ ║ 118 ║ Renal cell carcinoma, chromophobe type ║ ║ 89 ║ Transitional cell carcinoma ║ ║ 62 ║ Rhabdoid tumor ║ ║ 51 ║ Adenocarcinoma ║ ║ 33 ║ Carcinoma ║ ║ 30 ║ Neuroblastoma ║ ║ 18 ║ Pseudosarcomatous carcinoma ║ ║ 16 ║ Malignant melanoma ║ ║ 15 ║ Oxyphilic adenoma ║ ║ 15 ║ Squamous cell carcinoma ║ ║ 14 ║ Clear cell sarcoma of 
kidney ║ ║ 10 ║ Medullary carcinoma ║ ║ 9 ║ Infiltrating duct carcinoma ║ ║ 8 ║ Collecting duct carcinoma ║ ║ 6 ║ Basal cell carcinoma ║ ║ 6 ║ Hepatocellular carcinoma ║ ║ 6 ║ Neuroendocrine carcinoma ║ ║ 4 ║ Adrenal cortical carcinoma ║ ║ 4 ║ Endometrioid adenocarcinoma ║ ║ 3 ║ Acinar cell carcinoma ║ ║ 3 ║ Angioimmunoblastic T-cell lymphoma ║ ║ 3 ║ Burkitt lymphoma ║ ║ 3 ║ Follicular lymphoma ║ ║ 3 ║ Hodgkin lymphoma ║ ║ 3 ║ Leiomyosarcoma ║ ║ 3 ║ Malignant lymphoma, non-Hodgkin ║ ║ 3 ║ Neuroendocrine tumor ║ ║ 3 ║ Pheochromocytoma ║ ║ 2 ║ Adenosquamous carcinoma ║ ║ 2 ║ Angiomyolipoma ║ ║ 2 ║ B-cell chronic lymphocytic leukemia/small lymphocytic lymphoma ║ ║ 2 ║ Basaloid squamous cell carcinoma ║ ║ 2 ║ Combined small cell carcinoma ║ ║ 2 ║ Diffuse large B-cell lymphoma ║ ║ 2 ║ Intraductal carcinoma, noninfiltrating ║ ║ 2 ║ Non-small cell carcinoma ║ ║ 2 ║ Oligodendroglioma ║ ║ 2 ║ Papillary carcinoma ║ ║ 2 ║ Papillary transitional cell carcinoma ║ ║ 1 ║ Adult granulosa cell tumor of testis ║ ║ 1 ║ Aggressive fibromatosis ║ ║ 1 ║ Astrocytoma ║ ║ 1 ║ Basal cell carcinoma, nodular ║ ║ 1 ║ Benign cystic nephroma ║ ║ 1 ║ Cholangiocarcinoma ║ ║ 1 ║ Chronic myeloid leukemia ║ ║ 1 ║ Dedifferentiated liposarcoma ║ ║ 1 ║ Desmoplastic small round cell tumor ║ ║ 1 ║ Epithelioid mesothelioma, malignant ║ ║ 1 ║ Ewing sarcoma ║ ║ 1 ║ Follicular carcinoma ║ ║ 1 ║ Ganglioneuroblastoma ║ ║ 1 ║ Giant cell sarcoma ║ ║ 1 ║ Hairy cell leukemia ║ ║ 1 ║ Hemangioma ║ ║ 1 ║ Hemangiosarcoma ║ ║ 1 ║ Hereditary leiomyomatosis and renal cell carcinoma (HRCC)-associated renal ce... 
║ ║ 1 ║ Lentigo maligna melanoma ║ ║ 1 ║ Lobular carcinoma ║ ║ 1 ║ Malignant fibrous histiocytoma ║ ║ 1 ║ Malignant lymphoma ║ ║ 1 ║ Malignant lymphoma, small B lymphocytic ║ ║ 1 ║ Malignant peripheral nerve sheath tumor with rhabdomyoblastic differentiation ║ ║ 1 ║ Marginal zone B-cell lymphoma ║ ║ 1 ║ Meningioma ║ ║ 1 ║ Mucinous adenocarcinoma ║ ║ 1 ║ Papillary carcinoma, follicular variant ║ ║ 1 ║ Papillary microcarcinoma ║ ║ 1 ║ Papillary squamous cell carcinoma ║ ║ 1 ║ Papillary urothelial carcinoma, non-invasive ║ ║ 1 ║ Paraganglioma, malignant ║ ║ 1 ║ Pituitary adenoma ║ ║ 1 ║ Serous carcinoma ║ ║ 1 ║ Signet ring cell carcinoma ║ ║ 1 ║ Squamous cell carcinoma, clear cell type ║ ║ 1 ║ Synovial sarcoma, spindle cell ║ ║ 1 ║ T-cell large granular lymphocytic leukemia ║ ║ 1 ║ Thymoma, type A ║ ║ 1 ║ Undifferentiated sarcoma ║ ╚════════════════╩══════════════════════════════════════════════════════════════════════════════════╝ ╔════════════════╦═════════════════╗ ║ ║ year_of_birth ║ ╠════════════════╬═════════════════╣ ║ mean ║ 1964 ║ ║ min ║ 1926 ║ ║ lower quartile ║ 1949 ║ ║ median ║ 1958 ║ ║ upper quartile ║ 1970 ║ ║ max ║ 2017 ║ ╚════════════════╩═════════════════╝ ╔════════════════╦═════════════════╗ ║ ║ year_of_death ║ ╠════════════════╬═════════════════╣ ║ mean ║ 2017 ║ ║ min ║ 2010 ║ ║ lower quartile ║ 2016 ║ ║ median ║ 2018 ║ ║ upper quartile ║ 2020 ║ ║ max ║ 2021 ║ ╚════════════════╩═════════════════╝
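To get a feel for why a keyword search can return rows whose anatomic_site is not kidney, it helps to picture a search that checks every column. The sketch below is a toy illustration only. The rows are invented, the case-insensitive substring matching is my assumption, and matching_columns is a hypothetical helper, not part of the CDA or a description of how it works internally:

```python
# Toy illustration only: invented rows, NOT real CDA data or CDA internals.
ROWS = [
    {"subject": "S1", "anatomic_site": "kidney",
     "diagnosis": "Renal cell carcinoma", "format": "VCF"},
    {"subject": "S2", "anatomic_site": "abdomen",
     "diagnosis": "Clear cell sarcoma of kidney", "format": "DICOM"},
    {"subject": "S3", "anatomic_site": "lung",
     "diagnosis": "Adenocarcinoma", "format": "VCF"},
]


def matching_columns(row, keyword):
    """Return the names of the columns whose value contains the keyword.

    Matching is a case-insensitive substring test (an assumption made
    for this sketch).
    """
    needle = keyword.lower()
    return [col for col, value in row.items() if needle in str(value).lower()]


# For each row, record which columns matched, and keep rows with any match.
hits = {}
for row in ROWS:
    cols = matching_columns(row, "kidney")
    if cols:
        hits[row["subject"]] = cols

print(hits)
```

In this toy data, S1 matches through anatomic_site while S2 matches only through diagnosis, so a summary of anatomic_site alone would show abdomen for a row that was still a legitimate keyword hit.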
In the format results, it looks like quite a lot of the data is also tagged with VCF. That could be interesting: if there is some kind of sequencing data, maybe I could compare mutations or markers. I'll add vcf to my search list:
summarize_subjects( 'kidney', 'vcf', add_columns=['anatomic_site', 'observed_anatomic_site', 'resection_anatomic_site', 'diagnosis', 'morphology', 'format'])
╔═══════════════════════════════╗ ║ number_of_matching_subjects ║ ╠═══════════════════════════════╣ ║ 2451 ║ ╚═══════════════════════════════╝ ╔════════════════════════════════════════════════╗ ║ number_of_files_related_to_matching_subjects ║ ╠════════════════════════════════════════════════╣ ║ 146456 ║ ╚════════════════════════════════════════════════╝ ╔════════════╦══════════════════════╗ ║ subjects ║ data_source ║ ╠════════════╬══════════════════════╣ ║ 1195 ║ GDC only ║ ║ 932 ║ GDC + IDC ║ ║ 121 ║ PDC + GDC + IDC ║ ║ 112 ║ PDC + GDC + GC + IDC ║ ║ 42 ║ GDC + GC + IDC ║ ║ 29 ║ PDC + GDC ║ ║ 17 ║ GC only ║ ║ 3 ║ ICDC + IDC ║ ╚════════════╩══════════════════════╝ ╔════════════════╦═══════════╗ ║ count_result ║ species ║ ╠════════════════╬═══════════╣ ║ 2444 ║ human ║ ║ 4 ║ <NA> ║ ║ 3 ║ dog ║ ╚════════════════╩═══════════╝ ╔════════════════╦═══════════════════════════════════════════╗ ║ count_result ║ race ║ ╠════════════════╬═══════════════════════════════════════════╣ ║ 1562 ║ White ║ ║ 588 ║ <NA> ║ ║ 257 ║ Black or African American ║ ║ 41 ║ Asian ║ ║ 2 ║ American Indian or Alaska Native ║ ║ 1 ║ Native Hawaiian or Other Pacific Islander ║ ╚════════════════╩═══════════════════════════════════════════╝ ╔════════════════╦════════════════════╗ ║ count_result ║ ethnicity ║ ╠════════════════╬════════════════════╣ ║ 1273 ║ Non-Hispanic ║ ║ 1062 ║ <NA> ║ ║ 116 ║ Hispanic or Latino ║ ╚════════════════╩════════════════════╝ ╔════════════════╦══════════════════════════╗ ║ count_result ║ cause_of_death ║ ╠════════════════╬══════════════════════════╣ ║ 2308 ║ <NA> ║ ║ 126 ║ Cancer-Related Death ║ ║ 6 ║ Infection ║ ║ 5 ║ Non-Cancer Related Death ║ ║ 4 ║ Toxicity ║ ║ 1 ║ Cardiovascular Disorder ║ ║ 1 ║ Surgical Complication ║ ╚════════════════╩══════════════════════════╝ ╔════════════════╦════════════════════════════╗ ║ count_result ║ anatomic_site ║ ╠════════════════╬════════════════════════════╣ ║ 96853 ║ <NA> ║ ║ 45732 ║ kidney ║ ║ 18506 ║ blood ║ ║ 1670 ║ abdomen ║ ║ 1027 ║ 
lung ║ ║ 350 ║ uterus ║ ║ 220 ║ right kidney ║ ║ 177 ║ liver ║ ║ 148 ║ left kidney ║ ║ 140 ║ chest ║ ║ 110 ║ urinary bladder ║ ║ 61 ║ retroperitoneal lymph node ║ ║ 50 ║ left adrenal gland ║ ║ 42 ║ skin of body ║ ║ 41 ║ brain ║ ║ 41 ║ breast ║ ║ 37 ║ vein ║ ║ 35 ║ abdominopelvic cavity ║ ║ 30 ║ right adrenal gland ║ ║ 27 ║ colon ║ ║ 27 ║ head ║ ║ 25 ║ adrenal gland ║ ║ 20 ║ right lung ║ ║ 19 ║ trunk ║ ║ 18 ║ adrenal cortex ║ ║ 18 ║ mediastinal lymph node ║ ║ 16 ║ paraaortic lymph node ║ ║ 14 ║ craniocervical region ║ ║ 11 ║ prostate gland ║ ║ 11 ║ stomach ║ ║ 10 ║ inferior vena cava ║ ║ 8 ║ left lung ║ ║ 7 ║ pancreas ║ ║ 6 ║ buccal mucosa ║ ║ 6 ║ hepatic lymph node ║ ║ 6 ║ mesothelium ║ ║ 6 ║ pelvic region of trunk ║ ║ 5 ║ thymus ║ ║ 4 ║ abdominal wall ║ ║ 4 ║ appendage ║ ║ 4 ║ humerus ║ ║ 4 ║ left renal vein ║ ║ 4 ║ paratracheal lymph node ║ ║ 4 ║ thyroid gland ║ ║ 2 ║ abdominal lymph node ║ ║ 2 ║ axillary lymph node ║ ║ 2 ║ bile duct ║ ║ 2 ║ renal pelvis ║ ║ 2 ║ spleen ║ ╚════════════════╩════════════════════════════╝ ╔════════════════╦═══════════════════╗ ║ count_result ║ format ║ ╠════════════════╬═══════════════════╣ ║ 21729 ║ VCF ║ ║ 21465 ║ TBI ║ ║ 15928 ║ TSV ║ ║ 13844 ║ BAM ║ ║ 12384 ║ TXT ║ ║ 11990 ║ BAI ║ ║ 10400 ║ DICOM ║ ║ 10210 ║ MAF ║ ║ 6481 ║ BEDPE ║ ║ 4130 ║ IDAT ║ ║ 3433 ║ SVS ║ ║ 3270 ║ <NA> ║ ║ 2038 ║ CEL ║ ║ 1943 ║ BCR XML ║ ║ 1571 ║ mzML ║ ║ 1358 ║ TAR ║ ║ 1188 ║ mzIdentML ║ ║ 982 ║ PDF ║ ║ 968 ║ BCR SSF XML ║ ║ 387 ║ BCR Biotab ║ ║ 347 ║ FASTQ ║ ║ 188 ║ BCR OMF XML ║ ║ 44 ║ JSON ║ ║ 39 ║ MEX ║ ║ 19 ║ CDC JSON ║ ║ 19 ║ HDF5 ║ ║ 18 ║ CRAI ║ ║ 18 ║ CRAM ║ ║ 17 ║ XLSX ║ ║ 14 ║ GCT ║ ║ 10 ║ BCR Auxiliary XML ║ ║ 7 ║ BW ║ ║ 5 ║ BCR PPS XML ║ ║ 4 ║ CSV ║ ║ 4 ║ HTML ║ ║ 4 ║ TIFF ║ ╚════════════════╩═══════════════════╝ ╔════════════════╦═══════════════════════════════════╗ ║ count_result ║ observed_anatomic_site ║ ╠════════════════╬═══════════════════════════════════╣ ║ 2381 ║ kidney ║ ║ 67 ║ abdomen ║ ║ 41 ║ prostate gland ║ ║ 33 ║ lung ║ ║ 22 ║ 
urinary bladder ║ ║ 21 ║ chest ║ ║ 16 ║ <NA> ║ ║ 16 ║ liver ║ ║ 15 ║ breast ║ ║ 12 ║ colon ║ ║ 12 ║ skin of body ║ ║ 10 ║ adrenal gland ║ ║ 10 ║ craniocervical region ║ ║ 10 ║ renal pelvis ║ ║ 9 ║ pancreas ║ ║ 8 ║ brain ║ ║ 8 ║ uterus ║ ║ 7 ║ hypodermis ║ ║ 7 ║ lymph node ║ ║ 7 ║ thyroid gland ║ ║ 6 ║ bone tissue ║ ║ 5 ║ hematopoietic system ║ ║ 5 ║ pelvic region of trunk ║ ║ 4 ║ axillary lymph node ║ ║ 4 ║ stomach ║ ║ 4 ║ ureter ║ ║ 3 ║ abdominopelvic cavity ║ ║ 3 ║ ascending colon ║ ║ 3 ║ endometrium ║ ║ 3 ║ eye ║ ║ 3 ║ head ║ ║ 3 ║ hindlimb ║ ║ 3 ║ peritoneum ║ ║ 3 ║ spleen ║ ║ 2 ║ blood ║ ║ 2 ║ bone marrow ║ ║ 2 ║ ear ║ ║ 2 ║ inguinal lymph node ║ ║ 2 ║ intestine ║ ║ 2 ║ lower lobe of lung ║ ║ 2 ║ ovary ║ ║ 2 ║ pleura ║ ║ 2 ║ posterior wall of urinary bladder ║ ║ 2 ║ rectum ║ ║ 2 ║ thoracic segment of trunk ║ ║ 2 ║ trunk ║ ║ 2 ║ upper limb segment ║ ║ 2 ║ upper lobe of lung ║ ║ 1 ║ adrenal cortex ║ ║ 1 ║ appendage ║ ║ 1 ║ autonomic nervous system ║ ║ 1 ║ bile duct ║ ║ 1 ║ bone of pelvis ║ ║ 1 ║ cardia of stomach ║ ║ 1 ║ head of pancreas ║ ║ 1 ║ head or neck skin ║ ║ 1 ║ ilium ║ ║ 1 ║ intrahepatic bile duct ║ ║ 1 ║ lateral wall of urinary bladder ║ ║ 1 ║ mediastinum ║ ║ 1 ║ mesothelium ║ ║ 1 ║ nasal cavity ║ ║ 1 ║ nervous system ║ ║ 1 ║ pituitary gland ║ ║ 1 ║ posterior part of tongue ║ ║ 1 ║ rectosigmoid junction ║ ║ 1 ║ renal system ║ ║ 1 ║ skin of face ║ ║ 1 ║ small intestine ║ ║ 1 ║ thymus ║ ║ 1 ║ transverse colon ║ ║ 1 ║ uterine cervix ║ ║ 1 ║ vagina ║ ║ 1 ║ vertebral column ║ ╚════════════════╩═══════════════════════════════════╝ ╔════════════════╦══════════════════════════════════════════════════════════════════════════════════╗ ║ count_result ║ morphology ║ ╠════════════════╬══════════════════════════════════════════════════════════════════════════════════╣ ║ 690 ║ Nephroblastoma ║ ║ 639 ║ Clear cell adenocarcinoma ║ ║ 392 ║ Renal cell carcinoma ║ ║ 341 ║ Papillary adenocarcinoma ║ ║ 91 ║ Renal cell carcinoma, chromophobe type ║ ║ 89 ║ Transitional cell 
carcinoma ║ ║ 51 ║ Adenocarcinoma ║ ║ 32 ║ Carcinoma ║ ║ 28 ║ <NA> ║ ║ 26 ║ Neuroblastoma ║ ║ 18 ║ Pseudosarcomatous carcinoma ║ ║ 15 ║ Malignant melanoma ║ ║ 15 ║ Oxyphilic adenoma ║ ║ 14 ║ Clear cell sarcoma of kidney ║ ║ 14 ║ Squamous cell carcinoma ║ ║ 10 ║ Medullary carcinoma ║ ║ 8 ║ Collecting duct carcinoma ║ ║ 8 ║ Infiltrating duct carcinoma ║ ║ 6 ║ Basal cell carcinoma ║ ║ 6 ║ Hepatocellular carcinoma ║ ║ 6 ║ Neuroendocrine carcinoma ║ ║ 4 ║ Adrenal cortical carcinoma ║ ║ 4 ║ Endometrioid adenocarcinoma ║ ║ 3 ║ Acinar cell carcinoma ║ ║ 3 ║ Angioimmunoblastic T-cell lymphoma ║ ║ 3 ║ Burkitt lymphoma ║ ║ 3 ║ Follicular lymphoma ║ ║ 3 ║ Hodgkin lymphoma ║ ║ 3 ║ Leiomyosarcoma ║ ║ 3 ║ Malignant lymphoma, non-Hodgkin ║ ║ 3 ║ Neuroendocrine tumor ║ ║ 3 ║ Pheochromocytoma ║ ║ 2 ║ Adenosquamous carcinoma ║ ║ 2 ║ Angiomyolipoma ║ ║ 2 ║ B-cell chronic lymphocytic leukemia/small lymphocytic lymphoma ║ ║ 2 ║ Basaloid squamous cell carcinoma ║ ║ 2 ║ Combined small cell carcinoma ║ ║ 2 ║ Diffuse large B-cell lymphoma ║ ║ 2 ║ Intraductal carcinoma, noninfiltrating ║ ║ 2 ║ Non-small cell carcinoma ║ ║ 2 ║ Oligodendroglioma ║ ║ 2 ║ Papillary carcinoma ║ ║ 2 ║ Papillary transitional cell carcinoma ║ ║ 1 ║ Adult granulosa cell tumor of testis ║ ║ 1 ║ Aggressive fibromatosis ║ ║ 1 ║ Astrocytoma ║ ║ 1 ║ Basal cell carcinoma, nodular ║ ║ 1 ║ Benign cystic nephroma ║ ║ 1 ║ Cholangiocarcinoma ║ ║ 1 ║ Chronic myeloid leukemia ║ ║ 1 ║ Dedifferentiated liposarcoma ║ ║ 1 ║ Desmoplastic small round cell tumor ║ ║ 1 ║ Epithelioid mesothelioma, malignant ║ ║ 1 ║ Ewing sarcoma ║ ║ 1 ║ Follicular carcinoma ║ ║ 1 ║ Ganglioneuroblastoma ║ ║ 1 ║ Giant cell sarcoma ║ ║ 1 ║ Hairy cell leukemia ║ ║ 1 ║ Hemangioma ║ ║ 1 ║ Hereditary leiomyomatosis and renal cell carcinoma (HRCC)-associated renal ce... 
║ ║ 1 ║ Lentigo maligna melanoma ║ ║ 1 ║ Lobular carcinoma ║ ║ 1 ║ Malignant fibrous histiocytoma ║ ║ 1 ║ Malignant lymphoma ║ ║ 1 ║ Malignant lymphoma, small B lymphocytic ║ ║ 1 ║ Marginal zone B-cell lymphoma ║ ║ 1 ║ Meningioma ║ ║ 1 ║ Mucinous adenocarcinoma ║ ║ 1 ║ Papillary carcinoma, follicular variant ║ ║ 1 ║ Papillary microcarcinoma ║ ║ 1 ║ Papillary squamous cell carcinoma ║ ║ 1 ║ Papillary urothelial carcinoma, non-invasive ║ ║ 1 ║ Pituitary adenoma ║ ║ 1 ║ Rhabdoid tumor ║ ║ 1 ║ Serous carcinoma ║ ║ 1 ║ Signet ring cell carcinoma ║ ║ 1 ║ Squamous cell carcinoma, clear cell type ║ ║ 1 ║ Synovial sarcoma, spindle cell ║ ║ 1 ║ T-cell large granular lymphocytic leukemia ║ ║ 1 ║ Thymoma, type A ║ ║ 1 ║ Undifferentiated sarcoma ║ ╚════════════════╩══════════════════════════════════════════════════════════════════════════════════╝ ╔════════════════╦═══════════════════════════╗ ║ count_result ║ resection_anatomic_site ║ ╠════════════════╬═══════════════════════════╣ ║ 1541 ║ kidney ║ ║ 681 ║ <NA> ║ ║ 37 ║ lung ║ ║ 27 ║ lymph node ║ ║ 23 ║ blood ║ ║ 23 ║ liver ║ ║ 20 ║ hypodermis ║ ║ 12 ║ bone tissue ║ ║ 12 ║ peritoneum ║ ║ 10 ║ pleura ║ ║ 10 ║ urinary bladder ║ ║ 7 ║ adrenal gland ║ ║ 6 ║ breast ║ ║ 5 ║ abdomen ║ ║ 5 ║ axillary lymph node ║ ║ 5 ║ brain ║ ║ 4 ║ thoracic segment of trunk ║ ║ 3 ║ craniocervical region ║ ║ 3 ║ endometrium ║ ║ 3 ║ prostate gland ║ ║ 3 ║ skin of body ║ ║ 3 ║ stomach ║ ║ 2 ║ pancreas ║ ║ 2 ║ vertebral column ║ ║ 1 ║ ascending colon ║ ║ 1 ║ biliary system ║ ║ 1 ║ body of uterus ║ ║ 1 ║ bone marrow ║ ║ 1 ║ buccal mucosa ║ ║ 1 ║ caecum ║ ║ 1 ║ fallopian tube ║ ║ 1 ║ inguinal lymph node ║ ║ 1 ║ mediastinum ║ ║ 1 ║ mouth ║ ║ 1 ║ ovary ║ ║ 1 ║ small intestine ║ ║ 1 ║ thymus ║ ║ 1 ║ tonsil ║ ║ 1 ║ transverse colon ║ ║ 1 ║ ureter ║ ║ 1 ║ uterus ║ ║ 1 ║ vagina ║ ║ 1 ║ vermiform appendix ║ ╚════════════════╩═══════════════════════════╝ ╔════════════════╦══════════════════════════════════════════════════════════════════════════════════╗ ║ 
count_result ║ diagnosis ║ ╠════════════════╬══════════════════════════════════════════════════════════════════════════════════╣ ║ 910 ║ Renal cell carcinoma ║ ║ 871 ║ Clear cell adenocarcinoma ║ ║ 690 ║ Nephroblastoma ║ ║ 341 ║ Papillary adenocarcinoma ║ ║ 119 ║ Neoplasm, malignant ║ ║ 91 ║ Renal cell carcinoma, chromophobe type ║ ║ 90 ║ Transitional cell carcinoma ║ ║ 55 ║ Adenocarcinoma ║ ║ 35 ║ Carcinoma ║ ║ 26 ║ Neuroblastoma ║ ║ 24 ║ Adenoma ║ ║ 23 ║ <NA> ║ ║ 23 ║ Epithelial tumor, benign ║ ║ 18 ║ Pseudosarcomatous carcinoma ║ ║ 15 ║ Malignant melanoma ║ ║ 15 ║ Oxyphilic adenoma ║ ║ 15 ║ Squamous cell carcinoma ║ ║ 14 ║ Clear cell sarcoma of kidney ║ ║ 10 ║ Medullary carcinoma ║ ║ 8 ║ Collecting duct carcinoma ║ ║ 7 ║ Infiltrating ductular carcinoma ║ ║ 7 ║ Sarcoma ║ ║ 6 ║ Basal cell carcinoma ║ ║ 6 ║ Hepatocellular carcinoma ║ ║ 6 ║ Infiltrating duct carcinoma ║ ║ 6 ║ Neuroendocrine carcinoma ║ ║ 4 ║ Adrenal cortical carcinoma ║ ║ 4 ║ Endometrioid adenocarcinoma ║ ║ 3 ║ Acinar cell carcinoma ║ ║ 3 ║ Adenocarcinoma, intestinal type ║ ║ 3 ║ Angioimmunoblastic T-cell lymphoma ║ ║ 3 ║ Burkitt lymphoma ║ ║ 3 ║ Follicular lymphoma ║ ║ 3 ║ Glioma, malignant ║ ║ 3 ║ Hodgkin lymphoma ║ ║ 3 ║ Leiomyosarcoma ║ ║ 3 ║ Malignant lymphoma, non-Hodgkin ║ ║ 3 ║ Neuroendocrine tumor ║ ║ 3 ║ Non-small cell carcinoma ║ ║ 3 ║ Pheochromocytoma ║ ║ 3 ║ Transitional cell papillomas and carcinomas ║ ║ 2 ║ Adenosquamous carcinoma ║ ║ 2 ║ Angiomyolipoma ║ ║ 2 ║ B-cell chronic lymphocytic leukemia/small lymphocytic lymphoma ║ ║ 2 ║ Basaloid squamous cell carcinoma ║ ║ 2 ║ Combined small cell carcinoma ║ ║ 2 ║ Diffuse large B-cell lymphoma ║ ║ 2 ║ Intraductal carcinoma, noninfiltrating ║ ║ 2 ║ Myomatous neoplasms ║ ║ 2 ║ Nevi and melanomas ║ ║ 2 ║ Papillary carcinoma ║ ║ 2 ║ Papillary transitional cell carcinoma ║ ║ 1 ║ Adult granulosa cell tumor of testis ║ ║ 1 ║ Aggressive fibromatosis ║ ║ 1 ║ Basal cell carcinoma, nodular ║ ║ 1 ║ Benign cystic nephroma ║ ║ 1 ║ Cholangiocarcinoma ║ ║ 
1 ║ Chronic myeloid leukemia ║ ║ 1 ║ Complex epithelial neoplasms ║ ║ 1 ║ Dedifferentiated liposarcoma ║ ║ 1 ║ Desmoplastic small round cell tumor ║ ║ 1 ║ Epithelioid mesothelioma, malignant ║ ║ 1 ║ Ewing sarcoma ║ ║ 1 ║ Follicular carcinoma ║ ║ 1 ║ Ganglioneuroblastoma ║ ║ 1 ║ Giant cell sarcoma ║ ║ 1 ║ Hairy cell leukemia ║ ║ 1 ║ Hemangioma ║ ║ 1 ║ Hereditary leiomyomatosis and renal cell carcinoma (HRCC)-associated renal ce... ║ ║ 1 ║ Lentigo maligna melanoma ║ ║ 1 ║ Lobular carcinoma ║ ║ 1 ║ Malignant fibrous histiocytoma ║ ║ 1 ║ Malignant lymphoma ║ ║ 1 ║ Malignant lymphoma, small B lymphocytic ║ ║ 1 ║ Meningioma ║ ║ 1 ║ Mesothelioma, malignant ║ ║ 1 ║ Mucinous adenocarcinoma ║ ║ 1 ║ Neoplasm, uncertain whether benign or malignant ║ ║ 1 ║ Papillary carcinoma, follicular variant ║ ║ 1 ║ Papillary microcarcinoma ║ ║ 1 ║ Papillary squamous cell carcinoma ║ ║ 1 ║ Papillary urothelial carcinoma, non-invasive ║ ║ 1 ║ Pituitary adenoma ║ ║ 1 ║ Rhabdoid tumor ║ ║ 1 ║ Serous carcinoma ║ ║ 1 ║ Signet ring cell carcinoma ║ ║ 1 ║ Squamous cell carcinoma, clear cell type ║ ║ 1 ║ Synovial sarcoma, spindle cell ║ ║ 1 ║ T-cell large granular lymphocytic leukemia ║ ║ 1 ║ Thymoma ║ ║ 1 ║ Thymoma, type A ║ ║ 1 ║ Undifferentiated sarcoma ║ ╚════════════════╩══════════════════════════════════════════════════════════════════════════════════╝ ╔════════════════╦═════════════════╗ ║ ║ year_of_birth ║ ╠════════════════╬═════════════════╣ ║ mean ║ 1964 ║ ║ min ║ 1926 ║ ║ lower quartile ║ 1949 ║ ║ median ║ 1958 ║ ║ upper quartile ║ 1970 ║ ║ max ║ 2017 ║ ╚════════════════╩═════════════════╝ ╔════════════════╦═════════════════╗ ║ ║ year_of_death ║ ╠════════════════╬═════════════════╣ ║ mean ║ 2018 ║ ║ min ║ 2010 ║ ║ lower quartile ║ 2016 ║ ║ median ║ 2018 ║ ║ upper quartile ║ 2020 ║ ║ max ║ 2021 ║ ╚════════════════╩═════════════════╝
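A quick way to see how much the second keyword narrowed things is to compare the headline counts from the two summaries. This is just arithmetic on numbers copied by hand from the outputs above; the dictionary layout and variable names are my own:

```python
# Headline counts copied by hand from the two summaries above.
searches = {
    "kidney": {"subjects": 3499, "files": 171777},
    "kidney + vcf": {"subjects": 2451, "files": 146456},
}

base = searches["kidney"]
narrowed = searches["kidney + vcf"]

# Fraction of the 'kidney' results that survive once 'vcf' is added.
subject_share = narrowed["subjects"] / base["subjects"]
file_share = narrowed["files"] / base["files"]

print(f"subjects kept: {narrowed['subjects']} of {base['subjects']} ({subject_share:.1%})")
print(f"files kept: {narrowed['files']} of {base['files']} ({file_share:.1%})")
```

So roughly 70% of the kidney subjects, and about 85% of their related files, are still there after adding vcf.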
At this point, I can keep adding keywords as I go, or I can use the information I've gathered to make a targeted search for just the data I want. So now, instead of a global keyword search, I'm going to specify that I only want subjects where anatomic_site is kidney and format is vcf:
summarize_subjects(match_all=['format = vcf', 'anatomic_site = kidney'])
╔═══════════════════════════════╗
║   number_of_matching_subjects ║
╠═══════════════════════════════╣
║                           328 ║
╚═══════════════════════════════╝
╔════════════════════════════════════════════════╗
║   number_of_files_related_to_matching_subjects ║
╠════════════════════════════════════════════════╣
║                                           6035 ║
╚════════════════════════════════════════════════╝
╔════════════╦══════════════════════╗
║   subjects ║ data_source          ║
╠════════════╬══════════════════════╣
║        121 ║ IDC + PDC + GDC      ║
║        110 ║ IDC + PDC + GC + GDC ║
║         50 ║ GDC only             ║
║         29 ║ PDC + GDC            ║
║         17 ║ GC only              ║
║          1 ║ IDC + GDC            ║
╚════════════╩══════════════════════╝
╔════════════════╦═══════════╗
║   count_result ║ species   ║
╠════════════════╬═══════════╣
║            324 ║ human     ║
║              4 ║ <NA>      ║
╚════════════════╩═══════════╝
╔════════════════╦══════════════════════════╗
║   count_result ║ cause_of_death           ║
╠════════════════╬══════════════════════════╣
║            296 ║ <NA>                     ║
║             22 ║ Cancer-Related Death     ║
║              5 ║ Non-Cancer Related Death ║
║              3 ║ Infection                ║
║              1 ║ Cardiovascular Disorder  ║
║              1 ║ Surgical Complication    ║
╚════════════════╩══════════════════════════╝
╔════════════════╦════════════════════╗
║   count_result ║ ethnicity          ║
╠════════════════╬════════════════════╣
║            184 ║ <NA>               ║
║            128 ║ Non-Hispanic       ║
║             16 ║ Hispanic or Latino ║
╚════════════════╩════════════════════╝
╔════════════════╦═══════════════════════════╗
║   count_result ║ race                      ║
╠════════════════╬═══════════════════════════╣
║            275 ║ White                     ║
║             23 ║ Asian                     ║
║             16 ║ Black or African American ║
║             14 ║ <NA>                      ║
╚════════════════╩═══════════════════════════╝
╔════════════════╦══════════╗
║   count_result ║ format   ║
╠════════════════╬══════════╣
║           6035 ║ VCF      ║
╚════════════════╩══════════╝
╔════════════════╦═════════════════╗
║   count_result ║ anatomic_site   ║
╠════════════════╬═════════════════╣
║           6035 ║ kidney          ║
║           5804 ║ blood           ║
║             35 ║ lung            ║
║             14 ║ abdomen         ║
║              7 ║ liver           ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║   year_of_birth ║
╠════════════════╬═════════════════╣
║ mean           ║            1963 ║
║ min            ║            1926 ║
║ lower quartile ║            1949 ║
║ median         ║            1958 ║
║ upper quartile ║            1969 ║
║ max            ║            2017 ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║   year_of_death ║
╠════════════════╬═════════════════╣
║ mean           ║            2018 ║
║ min            ║            2010 ║
║ lower quartile ║            2016 ║
║ median         ║            2018 ║
║ upper quartile ║            2020 ║
║ max            ║            2021 ║
╚════════════════╩═════════════════╝
It looks like most of the subjects also have blood info, and I wonder if that could be useful. Instead of summarizing again, I'm going to browse through the row-level data for subjects with kidney, blood, and vcf data. Since kidney and blood are both anatomic sites, I'll have to do two searches and take the intersection:
justkidney = get_subject_data(match_all=['format = vcf', 'anatomic_site = kidney'])
justblood = get_subject_data(match_all=['format = vcf', 'anatomic_site = blood'])
intersect_subject_results(justkidney, justblood)
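To convince myself I understand what the intersection is doing, here is a minimal plain-Python sketch of the idea. This is not how cdapython implements it; the subject IDs are made up, and I'm assuming each result set boils down to one record per subject with a `subject_id` key.

```python
# Hypothetical stand-ins for the two result sets; the IDs are made up.
just_kidney = [
    {"subject_id": "S1", "species": "human"},
    {"subject_id": "S2", "species": "human"},
    {"subject_id": "S3", "species": "human"},
]
just_blood = [
    {"subject_id": "S2", "species": "human"},
    {"subject_id": "S3", "species": "human"},
    {"subject_id": "S4", "species": "human"},
]


def intersect_by_subject(left, right, key="subject_id"):
    """Keep the records in `left` whose key also appears in `right`."""
    right_ids = {record[key] for record in right}
    return [record for record in left if record[key] in right_ids]


both = intersect_by_subject(just_kidney, just_blood)
print([record["subject_id"] for record in both])  # ['S2', 'S3']
```

Only subjects present in both searches survive, which is why the intersected table is smaller than either search on its own.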
Now I have just the 310 subjects with kidney, blood, and vcf data. But I wonder whether there are VCFs for both the blood and the kidney. I can change my most recent search to add that info too. Since I want to map each VCF to an anatomic site, I need to add all the file columns to my search and have them collated. Then I expand the intersected results:
justkidney = get_subject_data(match_all=['format = vcf', 'anatomic_site = kidney'], add_columns='file.*', collate_results=True)
justblood = get_subject_data(match_all=['format = vcf', 'anatomic_site = blood'], add_columns='file.*', collate_results=True)
expand_subject_results(intersect_subject_results(justkidney, justblood), 'file_data')
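Here is a rough plain-Python picture of what "collated, then expanded" means to me, as a sketch under assumptions rather than the library's actual code. I'm assuming a collated row holds one subject with its files nested in a `file_data` list; `file_id` and every value below are hypothetical.

```python
# Hypothetical collated result: one row per subject, with that subject's
# files nested in a "file_data" list. All IDs and values are made up.
collated = [
    {
        "subject_id": "S1",
        "file_data": [
            {"file_id": "f1", "format": "VCF", "anatomic_site": "kidney"},
            {"file_id": "f2", "format": "VCF", "anatomic_site": "blood"},
        ],
    },
    {
        "subject_id": "S2",
        "file_data": [
            {"file_id": "f3", "format": "VCF", "anatomic_site": "kidney"},
        ],
    },
]


def expand_rows(rows, nested_key):
    """Return one flat row per nested item, repeating the parent columns."""
    expanded = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != nested_key}
        for item in row[nested_key]:
            expanded.append({**parent, **item})
    return expanded


flat = expand_rows(collated, "file_data")
for row in flat:
    print(row["subject_id"], row["file_id"], row["anatomic_site"])
```

Collating first keeps one row per subject so the intersection works on subjects, and expanding afterwards gives one row per file.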
Now I have one row per file, and I can see exactly how all the variables relate to one another. For instance, subject CCDI.689.0 has four WGS files: three are kidney and one is blood. If all the subjects have both kidney and blood WGS, maybe I can look for organ-specific mutations.
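To check the "all the subjects have both" part rather than eyeballing it, I could group the per-file rows by subject and see which anatomic sites each one covers. The sketch below uses made-up rows and assumes each file row carries a single `anatomic_site` value and a `subject_id`; the real expanded table may be shaped differently.

```python
from collections import defaultdict

# Hypothetical expanded rows (one per file); IDs and values are made up.
file_rows = [
    {"subject_id": "S1", "file_id": "f1", "anatomic_site": "kidney"},
    {"subject_id": "S1", "file_id": "f2", "anatomic_site": "kidney"},
    {"subject_id": "S1", "file_id": "f3", "anatomic_site": "kidney"},
    {"subject_id": "S1", "file_id": "f4", "anatomic_site": "blood"},
    {"subject_id": "S2", "file_id": "f5", "anatomic_site": "kidney"},
]


def sites_per_subject(rows):
    """Map each subject to the set of anatomic sites seen in its files."""
    sites = defaultdict(set)
    for row in rows:
        sites[row["subject_id"]].add(row["anatomic_site"])
    return dict(sites)


required = {"kidney", "blood"}
sites = sites_per_subject(file_rows)

# Subjects missing at least one of the required sites.
missing = sorted(s for s, found in sites.items() if not required <= found)
print(missing)  # ['S2']
```

An empty `missing` list would mean every subject has files from both sites.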