Our data sources

The Genomic Data Commons (GDC) is a cancer knowledge network that supports hosting, standardization, and analysis of genomic, clinical, and biospecimen data from cancer research programs. The GDC harmonizes raw sequencing data, identifies and applies state-of-the-art bioinformatics methods to generate mutation calls, structural variants, and other high-level data, and provides scalable downloads and web-based analysis tools. Because of the personal nature of genomic data, some GDC data are controlled access and require eRA Commons authentication and dbGaP authorization.
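The GDC also exposes a public REST API at `https://api.gdc.cancer.gov`. The sketch below shows how the open versus controlled distinction appears when querying that API. It builds, but does not send, a request URL for the `files` endpoint that asks only for open-access files in one project. The helper name `build_open_access_files_url` is hypothetical, and the field names (`cases.project.project_id`, `access`, `data_type`) reflect our understanding of the GDC data model. Check them against the GDC API documentation before relying on them.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_open_access_files_url(project_id: str, size: int = 10) -> str:
    """Hypothetical helper: build (without sending) a GDC API URL that
    lists open-access files belonging to a single project."""
    # GDC filters are nested JSON objects; "and" combines two "=" conditions.
    filters = {
        "op": "and",
        "content": [
            {"op": "=", "content": {"field": "cases.project.project_id", "value": project_id}},
            {"op": "=", "content": {"field": "access", "value": "open"}},
        ],
    }
    params = {
        "filters": json.dumps(filters),  # filters travel as a URL-encoded JSON string
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),  # maximum number of hits to return
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_open_access_files_url("TCGA-BRCA", size=5))
```

Passing the returned URL to `urllib.request.urlopen` would issue the request. Controlled-access files can be listed the same way, but downloading them additionally requires an authentication token that is only issued after dbGaP authorization.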

The Proteomic Data Commons (PDC) was developed to advance understanding of how proteins help shape the risk, diagnosis, development, progression, and treatment of cancer. In-depth analysis of proteomic data supports the study of both how and why cancer develops, and informs ways to tailor treatment to individual patients through precision medicine. All proteomic data in the PDC are open access and, with appropriate attribution, can be included in publications.

The NCI Imaging Data Commons (IDC) is a cloud-based repository of publicly available cancer imaging data, co-located with tools and resources for analysis and exploration. IDC is a node within the broader NCI Cancer Research Data Commons (CRDC) infrastructure, which provides secure access to a large, comprehensive, and expanding collection of cancer research data.

All data hosted by IDC is publicly available. IDC's current content comes from the radiology collections of The Cancer Imaging Archive (TCIA), as well as data collected by other major NCI initiatives such as TCGA, CPTAC, NLST, and HTAN. IDC does not de-identify images itself; it accepts only data de-identified by TCIA or by other Data Coordinating Centers approved by NCI Security.

The Cancer Data Service (CDS) provides data storage and sharing capabilities for NCI-funded studies that fall under the following categories:
  • Studies with data that do not match an existing CRDC data commons
  • Studies with data that do not fit current data type criteria and/or the minimum metadata standards for a CRDC data commons

The CDS currently hosts a variety of data types from NCI efforts such as the Human Tumor Atlas Network (HTAN), the Division of Cancer Control and Population Sciences (DCCPS), and the Childhood Cancer Data Initiative (CCDI), as well as data from independent research projects. The CDS hosts both open-access and controlled-access data.

The Integrated Canine Data Commons (ICDC) is a cloud-based repository of data on spontaneously arising canine cancers. ICDC was established to further research on human cancers by enabling comparative analysis with canine cancer. The data in the ICDC is sourced from multiple programs and projects, all focused on canine subjects. The data is harmonized into an integrated data model and then made available to the research community.

The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of three National Cancer Institute (NCI) Cloud Resources tasked with enabling researchers to combine cancer data and cloud computation. ISB-CGC hosts data from a variety of sources in Google BigQuery columnar tables, including HTAN as well as TCGA, CPTAC, and TARGET data from the GDC and PDC. These tables include file, case, clinical, and open-access derived data, which can be accessed both programmatically and through interactive web applications, eliminating the need to download and store large data sets.

  • Data Standards Services (DSS): The DSS provides us with harmonized values mapped to the data sources above. In our current release, DSS provides harmonized values for the following fields: ethnicity, file_format, morphology, primary_diagnosis, race, species, therapeutic_agent, source_material_type (cancer/normal), treatment_type, and vital_status.
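To illustrate what "harmonized values mapped to the data sources" means in practice, here is a minimal sketch that maps raw source values to a single harmonized vocabulary through a per-field lookup table. The mapping entries in `VITAL_STATUS_MAP` and the helper names `harmonize` and `harmonize_record` are hypothetical and for illustration only. They are not actual DSS content, and the real DSS mappings may be structured differently.

```python
from typing import Optional

# Hypothetical excerpt of a mapping table for the vital_status field.
# Keys are normalized (lowercased) source values; values are harmonized terms.
# Illustrative only: NOT taken from the actual DSS mappings.
VITAL_STATUS_MAP: dict[str, str] = {
    "alive": "Alive",
    "living": "Alive",
    "dead": "Dead",
    "deceased": "Dead",
}


def harmonize(value: Optional[str], mapping: dict[str, str]) -> Optional[str]:
    """Return the harmonized term for a raw source value, or None if unmapped."""
    if value is None:
        return None
    key = value.strip().lower()  # normalize case and surrounding whitespace
    return mapping.get(key)


def harmonize_record(
    record: dict[str, str], field_maps: dict[str, dict[str, str]]
) -> dict[str, Optional[str]]:
    """Harmonize each field of `record` that has a mapping table.
    Fields without a mapping table are left out of the result."""
    return {
        field: harmonize(record.get(field), table)
        for field, table in field_maps.items()
        if field in record
    }


if __name__ == "__main__":
    raw = {"vital_status": " Deceased ", "site_notes": "free text"}
    print(harmonize_record(raw, {"vital_status": VITAL_STATUS_MAP}))
```

Returning `None` for unmapped values, rather than passing the raw string through, keeps unharmonized terms from silently mixing with the controlled vocabulary.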