VCF and image files available for the same patients
I'm a cancer researcher, and I have a hypothesis that I can correlate a specific mutation with a specific thyroid cancer morphology. I'm looking for subjects that have thyroid cancer with both CT and sequencing data that I might be able to incorporate into my research.
First, I need to decide which column to search. I'm looking for columns in the file table that have to do with file type:
columns(table="file")
table | column | data_type | nullable | description |
---|---|---|---|---|
column_values("data_type")
data_type | count |
---|---|
The CT Image Storage and Annotated Somatic Mutation file types should give me the data I want. Now to find a column to separate out the thyroid cancer subjects:
columns(column = "*diagnosis*")
table | column | data_type | nullable | description |
---|---|---|---|---|
column_values("primary_diagnosis_site", filters= "*thyroid*" )
primary_diagnosis_site | count |
---|---|
It looks like there are lots of thyroid patients. However, thyroid has been specified a few different ways, so I'll have to search with a wildcard: thyroid* will match anything that starts with thyroid, no matter what words or letters come after it. I want subject data, so I'll search the subject table, but I also want to add the file data so I know where to get the files. That makes my final search look like this:
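To make the wildcard behaviour concrete, here is a minimal sketch using Python's standard `fnmatch` module. It only illustrates glob-style matching in general; it is not how the search itself is implemented. The site labels are invented for the example, and ignoring case is an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical site labels, invented for illustration (not actual query results).
sites = ["Thyroid", "Thyroid gland", "thyroid", "Parathyroid gland", "Head-Face-Or-Neck"]


def glob_match(values, pattern):
    """Return the values that match a glob pattern, ignoring case (an assumption)."""
    return [v for v in values if fnmatchcase(v.lower(), pattern.lower())]


# "thyroid*" only matches values that START with thyroid.
starts_with = glob_match(sites, "thyroid*")

# "*thyroid*" matches values that contain thyroid anywhere.
contains = glob_match(sites, "*thyroid*")

print(starts_with)
print(contains)
```

In this sketch the leading `*` also lets a label such as "Parathyroid gland" through, which is why the pattern without it is the narrower choice.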
fetch_rows(table="subject", match_all=['primary_diagnosis_site = thyroid*', 'data_type = CT Image Storage', 'data_type = Annotated Somatic Mutation'], link_to_table="file")
subject_id | cause_of_death | days_to_birth | days_to_death | ethnicity | race | sex | species | vital_status | file_id | byte_size | checksum | data_category | data_modality | data_type | dbgap_accession_number | drs_uri | file_format | imaging_modality | imaging_series | label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
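Since my hypothesis needs subjects that have both file types, it may be worth double-checking the result before downloading anything. The sketch below shows one way to do that in plain Python, assuming the joined result has one row per subject and file. The subject IDs and rows are hypothetical placeholders, not real data.

```python
from collections import defaultdict

# Hypothetical (subject_id, data_type) pairs standing in for rows of the joined result.
file_rows = [
    ("subj-1", "CT Image Storage"),
    ("subj-1", "Annotated Somatic Mutation"),
    ("subj-2", "CT Image Storage"),
    ("subj-3", "Annotated Somatic Mutation"),
    ("subj-3", "CT Image Storage"),
    ("subj-3", "Aligned Reads"),
]

# The two file types a subject must have to be useful for this research question.
required = {"CT Image Storage", "Annotated Somatic Mutation"}

# Collect the set of file types seen for each subject.
types_by_subject = defaultdict(set)
for subject_id, data_type in file_rows:
    types_by_subject[subject_id].add(data_type)

# Keep only subjects whose file types include every required type.
subjects_with_both = sorted(
    subject for subject, types in types_by_subject.items() if required <= types
)

print(subjects_with_both)
```

Here subj-2 is dropped because it has imaging but no mutation file, while subj-3 is kept even though it also has an unrelated file type.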