Public releases

Available December 20, 2024

  • GDC data release 41.0; API version 7.5.1 (extracted 2024-11-19)
    • GDC introduced breaking changes into its data model between our November extraction and our mid-December extraction attempt
    • 18 diagnosis fields were removed
    • release notes may not give full field list; complete changelog estimate currently under way
    • GDC data in CDA's December release is thus based on the extracted instance of v41.0 from November, while we adjust our ETL
  • PDC data release 4.4; API version 3.0.15 (extracted 2024-12-17)
  • IDC data release v20 (extracted 2024-12-17)
  • CDS data release 15.0 (from most recent dump file provided to us by CDS -- received 2024-12-18)
    • CDS removed tumor_tissue_type from image, resulting in some CDA data loss (tumor/normal annotations)
    • CDA changed our file.data_category field for CDS data to use CDS's genomic_info.library_strategy field instead of earlier file.experimental_strategy_and_data_subtypes
  • ICDC data release 2023-10-16; front-end version 4.1.0 (extracted 2024-12-17)

Available November 26, 2024

Data extraction and release information

  • GDC data release 41.0; API version 7.6.1 (extracted 2024-11-19)
  • PDC data release 4.4; API version 3.0.12 (extracted 2024-11-19)
  • IDC data release v19 (extracted 2024-11-19)
  • CDS data release 13.0; front-end version 4.3.0 (from most recent dump file provided to us by CDS -- received 2024-10-31)
  • ICDC data release 2023-10-16; front-end version 4.1.0 (extracted 2024-11-19)

Available October 29, 2024

Data extraction and release information

CDA data version 2024-10

  • GDC data release 41.0; API version 7.4.1 (extracted 2024-10-11)
  • PDC data release 4.4; API version 3.0.10 (extracted 2024-10-11)
  • IDC data release v19 (extracted 2024-10-11)
  • CDS data release 13.0; front-end version 4.3.0 (from most recent dump file provided to us by CDS -- received 2024-10-02)
  • ICDC data release 2023-10-16; front-end version 4.1.0 (extracted 2024-10-21)

Available August 27, 2024

Data extraction and release information

CDA data version 2024-08

  • GDC data release 40.0 extracted 2024-08-18
  • PDC data release 4.3 extracted 2024-08-18
  • IDC data release v18 extracted 2024-08-18
  • CDS data release 12.0 extracted 2024-08-14
  • ICDC data release 2023-10-16 extracted 2024-08-18

Available July 23, 2024

Data extraction and release information

CDA data version 2024-07, extracted July 18, 2024:

  • GDC data release 40.0; API version 4.0.0 tag 7.3
  • PDC data release 4.2; API version 3.0.4.1
  • IDC data release v18
  • CDS data release 11.0; front-end version 4.2.0.269
  • ICDC data release 2023-10-16; front-end version 4.0.0.181

Available June 26, 2024

Data extraction and release information

CDA data version 2024-06, extracted Friday, June 21, 2024:

  • CDS v10.0
  • GDC v40.0
  • ICDC v4.0.0
  • IDC v18
  • PDC v4.1

Available May 29, 2024

Data extraction and release information

  • GDC data version 40.0
  • PDC data version 4.1
  • IDC data version v18
  • CDS data version 9.0

Known Issues

  • DICOM was not included in the CRDC Data Element list for file_format, so no IDC files have file_format values

Available April 5, 2024

CDA is now harmonizing terms as they are incorporated into the CRDC Data Element list. In this release we have harmonized the values for:

  • ethnicity
  • file_format
  • morphology
  • primary_diagnosis
  • race
  • species
  • therapeutic_agent
  • source_material_type (cancer/normal)
  • treatment_type
  • vital_status

In future releases, we expect the harmonization to both broaden and improve. Additionally, in an upcoming release we will provide both the harmonized and original values to make finding the original data easier.

Data extraction and release information

  • GDC data version 39.0 (extraction date 3/27/2024)
  • PDC data version 3.8 (extraction date 3/27/2024)
  • IDC data version v17 (extraction date 3/27/2024)
  • CDS data version 8.0 (extraction date 3/27/2024)

Known Issues

  • DICOM was not included in the CRDC Data Element list for file_format, so no IDC files have file_format values
  • CDS data includes clashing integer IDs. We included that data with the following changes:
    • Ensured that any integer IDs are well-wrapped by project qualifiers to make them unique within CDS
    • In instances where the same ID was attached to multiple, conflicting sets of metadata, the resulting records are clobbered copies of one instance. A record of the affected data is available here
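
As an illustration only (this is not CDA's ETL code), the ID handling described above could be sketched in Python as follows. The record keys `project`, `id`, and `metadata`, the `.` separator, and the choice to keep the first instance seen are all assumptions made for the example.

```python
from collections import defaultdict


def qualify_ids(records):
    """Wrap each integer ID in its project qualifier and collapse clashes.

    `records` is a list of dicts with the hypothetical keys "project",
    "id", and "metadata". Returns (merged, conflicts), where `merged` maps
    each qualified ID to the one metadata instance that was kept and
    `conflicts` lists the metadata that was discarded for that ID.
    """
    merged = {}
    conflicts = defaultdict(list)
    for rec in records:
        # Assumed qualifier format: "<project>.<id>"
        qualified = f"{rec['project']}.{rec['id']}"
        if qualified not in merged:
            merged[qualified] = rec["metadata"]
        elif merged[qualified] != rec["metadata"]:
            # Same qualified ID, conflicting metadata: keep the first
            # instance (an assumption) and record what was dropped.
            conflicts[qualified].append(rec["metadata"])
    return merged, dict(conflicts)
```

Keeping a separate `conflicts` mapping mirrors the idea of publishing a record of the affected data alongside the merged result.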

Beta versions

Available September 12, 2023.

Data extraction and release information

  • GDC data version 38.0 (extraction date 9/1/2023)
  • PDC data version 3.4 (extraction date 8/24/2023)
  • IDC data version 15 (extraction date 7/19/2023)
  • CDS data version 3.0 (extraction date 8/31/2023)

Data from Cancer Data Services (CDS) is now available!

Available June 13, 2023.

Data extraction and release information

  • GDC data version 37.0 (extraction date 6/1/2023)
  • PDC data version 3.0 (extraction date 6/5/2023)
  • IDC data version 14 (extraction date 6/7/2023)

IDC data now contains ethnicity data

Available May 4, 2023.

Datasets & Fields

Data extraction and release information

The current version and extraction date for each database are:

  • GDC data version 37, extraction date - 4/5/2023
  • PDC data version 2.16, extraction date - 2/9/2023
  • IDC data version 13, extraction date - 4/4/2023

Available November 3, 2022.

Datasets & Fields

Data extraction and release information

The current version and extraction date for each database are:

  • GDC data version - v34.0, GDC extraction date - 09/29/2022
  • PDC data version - v2.10, PDC extraction date - 09/29/2022
  • IDC data version - v.10.0, IDC extraction date - 09/29/2022

Available September 2022.

Datasets & Fields

  • Versions:
    • GDC: v33.1, 06/23/2022
    • PDC: v2.7, 06/23/2022 [1]
    • IDC: v.9.0, 06/24/2022

Early alphas

Available as of 7/11/22.

The beta 3.0 release of CDA searches across data from the Genomics Data Commons (GDC), the Proteomics Data Commons (PDC), and the Imaging Data Commons (IDC) to aggregate and return data to users via a single application programming interface (API).

Datasets & Fields

  • All datasets updated as follows
    • GDC: v33.1, 06/23/2022
    • PDC: v2.7, 06/23/2022 [1]
    • IDC: v.9.0, 06/24/2022

Metadata Changes

  • Summary
    • Previous table format now called Subjects endpoint
      • Replaced all File entities with Files, a list of the file IDs associated with the entity that contains the list, e.g.:
        • File -> Files
        • ResearchSubject.File -> ResearchSubject.Files
        • ResearchSubject.Specimen.File -> ResearchSubject.Specimen.Files
    • Files endpoint added:
      • Endpoint oriented around File information
      • Includes all information regarding the file's associated entities (Subject, ResearchSubject, and Specimen)
    • Newly available fields:
      • vital_status
      • days_to_death
      • cause_of_death
      • ResearchSubject.Diagnosis.morphology
      • ResearchSubject.Diagnosis.method_of_diagnosis
      • File.data_modality
      • File.dbgap_accession_number
      • File.imaging_modality
      • File.imaging_series
      • ResearchSubject.Diagnosis.Treatment.therapeutic_agent
      • ResearchSubject.Diagnosis.Treatment.treatment_anatomic_site
      • ResearchSubject.Diagnosis.Treatment.treatment_effect
      • ResearchSubject.Diagnosis.Treatment.treatment_end_reason
      • ResearchSubject.Diagnosis.Treatment.number_of_cycles
    • Renamed fields (old -> new):
      • ResearchSubject.associated_project -> ResearchSubject.member_of_research_project
      • ResearchSubject.primary_disease_site -> ResearchSubject.primary_diagnosis_site
      • ResearchSubject.primary_disease_type -> ResearchSubject.primary_diagnosis_condition
      • ResearchSubject.Specimen.age_at_collection -> ResearchSubject.Specimen.days_to_collection
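
For scripts written against the earlier field names, a small lookup table is one way to translate them. The sketch below is illustrative and not part of cdapython: `RENAMED_FIELDS`, `migrate_field`, and `migrate_fields` are hypothetical names, and the table covers only a subset of the renames listed above.

```python
# Old field name -> new field name, taken from the rename list above.
RENAMED_FIELDS = {
    "ResearchSubject.associated_project":
        "ResearchSubject.member_of_research_project",
    "ResearchSubject.primary_disease_site":
        "ResearchSubject.primary_diagnosis_site",
    "ResearchSubject.Specimen.age_at_collection":
        "ResearchSubject.Specimen.days_to_collection",
}


def migrate_field(name: str) -> str:
    """Return the new name for a renamed field, or the name unchanged."""
    return RENAMED_FIELDS.get(name, name)


def migrate_fields(names):
    """Translate a list of field names, preserving order."""
    return [migrate_field(n) for n in names]
```

Fields that were not renamed pass through untouched, so the helper can be applied to a whole list of query fields without checking each one first.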

Known bugs and issues - these will be fixed in an upcoming release

  • Tumor stages are not harmonized, so redundant terms remain and complicate queries.
  • Searches on the subject endpoint incorrectly count files. Please use the file counts for the same query from the files endpoint.
  • Some PDC files are incorrectly labeled at the specimen level; e.g., a file may be inappropriately labeled as both cancer and normal.

2.X

Version 3.0 is a full rewrite of our code, and older versions of cdapython are no longer maintained or supported. If you'd like to see how the project has evolved, you can still access their documentation here:


  1. Information pulled from the PDC API may contain embargoed data.