Conceptual Overview
You can think of the CDA as an enormous spreadsheet full of data. To search it, you select the columns that hold the data you're interested in, then filter the rows down to the values you care about.
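To make the spreadsheet analogy concrete, here is a minimal sketch in plain Python. It does not use the CDA client at all: the rows, the column names, and the `search` helper are invented purely for illustration.

```python
# Toy "spreadsheet": each dict is a row, each key is a column.
# All column names and values are made up for this example.
rows = [
    {"subject_id": "S1", "species": "human", "sex": "female", "year_of_birth": 1960},
    {"subject_id": "S2", "species": "human", "sex": "male", "year_of_birth": 1975},
    {"subject_id": "S3", "species": "canine", "sex": "female", "year_of_birth": 2012},
]


def search(rows, columns, **filters):
    """Keep rows that match every filter, then keep only the chosen columns."""
    matching = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    return [{c: r[c] for c in columns} for r in matching]


# Select two columns, and filter rows to those where sex is "female".
result = search(rows, columns=["subject_id", "species"], sex="female")
print(result)
```

The two steps are independent: the filter decides which rows survive, and the column selection decides how much of each surviving row you see.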
CDA data comes from six sources:
- The Proteomic Data Commons (PDC)
- The Genomic Data Commons (GDC)
- The Imaging Data Commons (IDC)
- The General Commons (GC)
- The Integrated Canine Data Commons (ICDC)
- The ISB Cancer Gateway in the Cloud (ISB-CGC)
The data is organized around entities, including:
- subject: A patient entity that captures the study-independent metadata for research subjects. Human research subjects are usually not traceable to a particular person, to protect the subjects' privacy.
- file: A unit of data about subjects, researchsubjects, specimens, or their associated information.
If you want to build a cohort of distinct individuals who meet some criteria, search using get_subject_data. The result is a table of information with one row per subject. To add extra information to that table, use the add_columns feature inside of get_subject_data.
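The idea of "one row per subject, plus extra columns" can be sketched with plain Python. This is a toy stand-in, not the CDA client: `toy_subject_table`, its `match` parameter, the record fields, and the choice to collect extra values into lists are all assumptions made for illustration. Only the names get_subject_data and add_columns come from the text above, and the real function may behave differently.

```python
# Toy records: one subject can appear in many rows (here, one per file).
# Field names and values are invented; this is not real CDA data.
records = [
    {"subject_id": "S1", "sex": "female", "file_id": "F1", "file_format": "BAM"},
    {"subject_id": "S1", "sex": "female", "file_id": "F2", "file_format": "VCF"},
    {"subject_id": "S2", "sex": "male", "file_id": "F3", "file_format": "BAM"},
    {"subject_id": "S3", "sex": "female", "file_id": "F4", "file_format": "DICOM"},
]


def toy_subject_table(records, match, add_columns=()):
    """Hypothetical helper: one row per matching subject, with optional extras.

    Each extra column collects the distinct values seen for that subject.
    """
    table = {}
    for rec in records:
        # Skip records that fail any of the match criteria.
        if any(rec.get(key) != value for key, value in match.items()):
            continue
        subject_id = rec["subject_id"]
        if subject_id not in table:
            table[subject_id] = {"subject_id": subject_id}
            for col in add_columns:
                table[subject_id][col] = []
        for col in add_columns:
            if rec[col] not in table[subject_id][col]:
                table[subject_id][col].append(rec[col])
    return list(table.values())


cohort = toy_subject_table(records, match={"sex": "female"}, add_columns=["file_format"])
for row in cohort:
    print(row)
```

Subject S1 has two files but still appears as a single row, which is what makes a subject-level table a natural starting point for counting distinct individuals in a cohort.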