Find all the CPTAC subjects

I'm a researcher, and I want to reuse data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), but the data is stored across multiple data centers. I just want an easy way to track it all down.

First, decide which column to search. I'm looking for columns whose names contain "project":

In [3]:

columns(column=["*project*"])

Out[3]:

table	column	data_type	nullable	description

member_of_research_project has the definition I'm looking for, so I'm going to search that column for cptac. I want both subject and researchsubject info, so I'm requesting the rows that match cptac from those two tables, joined:
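To make the `*cptac*` filter concrete, here is a minimal sketch of how a wildcard pattern like that selects values. It assumes shell-style wildcards and case-insensitive matching, which this page does not confirm. The helper `matches` and the sample project labels are made up for illustration and are not part of the library.

```python
from fnmatch import fnmatchcase


def matches(value: str, pattern: str) -> bool:
    """Shell-style wildcard match, ignoring case (assumed semantics)."""
    # Lower-case both sides so "CPTAC" and "cptac" compare equal.
    return fnmatchcase(value.lower(), pattern.lower())


# Made-up project labels, for illustration only.
sample_projects = ["CPTAC-3", "cptac2", "TCGA-BRCA", "Other"]

# Keep only the labels that contain "cptac" anywhere in the string.
hits = [p for p in sample_projects if matches(p, "*cptac*")]
print(hits)
```

Under those assumptions, the leading and trailing `*` let the filter pick up any project label that contains "cptac", whatever comes before or after it.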

In [4]:

fetch_rows(table="subject", match_all="member_of_research_project = *cptac*", link_to_table='researchsubject')

Out[4]:

	subject_id	cause_of_death	days_to_birth	days_to_death	ethnicity	race	sex	species	vital_status	researchsubject_id	member_of_research_project	primary_diagnosis_condition	primary_diagnosis_site

This looks like what I want, so I'll re-run the query but save it to a file this time:

In [5]:

fetch_rows(table="subject", match_all="member_of_research_project = *cptac*", link_to_table='researchsubject', return_data_as='tsv', output_file='my_file.tsv')
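
Once the file is saved, a quick sanity check is to count how many distinct subjects each project contributes. The sketch below uses only the Python standard library. It assumes the TSV has a header row with the `subject_id` and `member_of_research_project` columns shown in the output above. The helper name `subjects_per_project` is hypothetical, not part of the library.

```python
import csv
from collections import defaultdict


def subjects_per_project(tsv_path):
    """Count distinct subject_id values per member_of_research_project."""
    subjects = defaultdict(set)
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # Assumes a tab-delimited file whose first row holds the column names.
        for row in csv.DictReader(handle, delimiter="\t"):
            subjects[row["member_of_research_project"]].add(row["subject_id"])
    # A subject linked to several researchsubject rows is counted once per project.
    return {project: len(ids) for project, ids in subjects.items()}


# Example usage, once my_file.tsv exists:
# print(subjects_per_project("my_file.tsv"))
```

Using sets matters here because the joined result can repeat a subject across several researchsubject rows.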