Cancer diagnoses by age
I'm a cancer researcher, and I'm interested in profiling adenocarcinoma occurrence as a function of age.
First, decide what column to search. I'm looking for columns that have to do with age:
columns(description="age")
age_at_observation is exactly what I need, and the description tells me that age is in years. Just to see what the data looks like, I'm going to ask for subjects who have adenocarcinoma and any observation age. The asterisks on either side of *adenocarcinoma* mean that anything can come before or after adenocarcinoma in the diagnosis, so the results will include diagnoses that specify a subtype. The exclamation point in front of the equals sign (!=) means NOT, so "age_at_observation != NULL" gives back only not-null values:
summarize_subjects(match_all=["diagnosis = *adenocarcinoma*", "age_at_observation != NULL"])
╔═══════════════════════════════╗
║ number_of_matching_subjects   ║
╠═══════════════════════════════╣
║ 1164                          ║
╚═══════════════════════════════╝
╔════════════════════════════════════════════════╗
║ number_of_files_related_to_matching_subjects   ║
╠════════════════════════════════════════════════╣
║ 131467                                         ║
╚════════════════════════════════════════════════╝
╔════════════╦══════════════════════╗
║ subjects   ║ data_source          ║
╠════════════╬══════════════════════╣
║ 425        ║ PDC + GDC + IDC      ║
║ 358        ║ PDC + GC + GDC + IDC ║
║ 170        ║ PDC + GDC            ║
║ 119        ║ GDC only             ║
║ 82         ║ IDC only             ║
║ 6          ║ GDC + IDC            ║
║ 2          ║ PDC + GC + IDC       ║
║ 1          ║ PDC + GC             ║
║ 1          ║ PDC + GC + GDC       ║
╚════════════╩══════════════════════╝
╔════════════════╦══════════════════════════╗
║ count_result   ║ cause_of_death           ║
╠════════════════╬══════════════════════════╣
║ 1026           ║ <NA>                     ║
║ 106            ║ Cancer-Related Death     ║
║ 18             ║ Non-Cancer Related Death ║
║ 6              ║ Infection                ║
║ 5              ║ Cardiovascular Disorder  ║
║ 3              ║ Surgical Complication    ║
╚════════════════╩══════════════════════════╝
╔════════════════╦══════════════════════════════════╗
║ count_result   ║ race                             ║
╠════════════════╬══════════════════════════════════╣
║ 802            ║ White                            ║
║ 199            ║ <NA>                             ║
║ 140            ║ Asian                            ║
║ 22             ║ Black or African American        ║
║ 1              ║ American Indian or Alaska Native ║
╚════════════════╩══════════════════════════════════╝
╔════════════════╦═══════════╗
║ count_result   ║ species   ║
╠════════════════╬═══════════╣
║ 1082           ║ human     ║
║ 82             ║ mouse     ║
╚════════════════╩═══════════╝
╔════════════════╦════════════════════╗
║ count_result   ║ ethnicity          ║
╠════════════════╬════════════════════╣
║ 794            ║ <NA>               ║
║ 339            ║ Non-Hispanic       ║
║ 31             ║ Hispanic or Latino ║
╚════════════════╩════════════════════╝
╔════════════════╦════════════════════════════════════════════╗
║ count_result   ║ diagnosis                                  ║
╠════════════════╬════════════════════════════════════════════╣
║ 600            ║ Adenocarcinoma                             ║
║ 458            ║ Neoplasm, malignant                        ║
║ 245            ║ Endometrioid adenocarcinoma                ║
║ 241            ║ Clear cell adenocarcinoma                  ║
║ 222            ║ Renal cell carcinoma                       ║
║ 85             ║ Adenocarcinoma, intestinal type            ║
║ 33             ║ Adenoma                                    ║
║ 23             ║ Epithelial tumor, benign                   ║
║ 21             ║ Papillary adenocarcinoma                   ║
║ 14             ║ Adenocarcinoma with mixed subtypes         ║
║ 14             ║ Mucinous adenocarcinoma                    ║
║ 11             ║ Adenocarcinoma, metastatic                 ║
║ 3              ║ Cystic, mucinous and serous neoplasms      ║
║ 3              ║ Oxyphilic adenoma                          ║
║ 1              ║ Adenosquamous carcinoma                    ║
║ 1              ║ Angioimmunoblastic T-cell lymphoma         ║
║ 1              ║ Complex epithelial neoplasms               ║
║ 1              ║ Hepatoid adenocarcinoma                    ║
║ 1              ║ Malignant melanoma                         ║
║ 1              ║ Neoplasm, metastatic                       ║
║ 1              ║ Neuroendocrine carcinoma                   ║
║ 1              ║ Renal cell carcinoma, chromophobe type     ║
║ 1              ║ Squamous cell carcinoma                    ║
║ 1              ║ T-cell large granular lymphocytic leukemia ║
║ 1              ║ Tubular adenocarcinoma                     ║
╚════════════════╩════════════════════════════════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_birth   ║
╠════════════════╬═════════════════╣
║ mean           ║ 1957            ║
║ min            ║ 1914            ║
║ lower quartile ║ 1945            ║
║ median         ║ 1953            ║
║ upper quartile ║ 1962            ║
║ max            ║ 2021            ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_death   ║
╠════════════════╬═════════════════╣
║ mean           ║ 2016            ║
║ min            ║ 2002            ║
║ lower quartile ║ 2016            ║
║ median         ║ 2018            ║
║ upper quartile ║ 2020            ║
║ max            ║ 2023            ║
╚════════════════╩═════════════════╝
╔════════════════╦══════════════════════╗
║                ║ age_at_observation   ║
╠════════════════╬══════════════════════╣
║ mean           ║ 61                   ║
║ min            ║ 0                    ║
║ lower quartile ║ 54                   ║
║ median         ║ 63                   ║
║ upper quartile ║ 71                   ║
║ max            ║ 90                   ║
╚════════════════╩══════════════════════╝
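As an aside on the filter syntax in that query, here is a small stand-alone sketch of how the wildcard and NULL tests behave. It is not how cdapython evaluates queries: it uses the standard library's fnmatch, and it assumes that matching ignores case, which the capitalized diagnoses in the output suggest. The helper name matches and the short diagnosis list are made up for illustration.

```python
from fnmatch import fnmatch


def matches(value, pattern):
    """Case-insensitive wildcard match; a missing (NULL) value never matches."""
    if value is None:
        return False
    # Lowercase both sides so the comparison ignores case on every platform.
    return fnmatch(value.lower(), pattern.lower())


# Hand-picked sample values, with None standing in for NULL.
diagnoses = [
    "Adenocarcinoma",
    "Endometrioid adenocarcinoma",
    "Adenocarcinoma, intestinal type",
    "Renal cell carcinoma",
    None,
]

# "diagnosis = *adenocarcinoma*": anything may come before or after the word.
matched = [d for d in diagnoses if matches(d, "*adenocarcinoma*")]

# "!= NULL": keep only the values that are present.
not_null = [d for d in diagnoses if d is not None]

print(matched)
print(not_null)
```

In this sketch "Renal cell carcinoma" is kept by the not-null test but rejected by the wildcard, because the pattern needs the whole word adenocarcinoma somewhere in the value.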
There are 1,164 subjects that fit those criteria, but 82 of them are mice. I don't want mice, so I'm going to add a species filter:
summarize_subjects(match_all=["diagnosis = *adenocarcinoma*", "age_at_observation != NULL", "species = human"])
╔═══════════════════════════════╗
║ number_of_matching_subjects   ║
╠═══════════════════════════════╣
║ 1082                          ║
╚═══════════════════════════════╝
╔════════════════════════════════════════════════╗
║ number_of_files_related_to_matching_subjects   ║
╠════════════════════════════════════════════════╣
║ 130749                                         ║
╚════════════════════════════════════════════════╝
╔════════════╦══════════════════════╗
║ subjects   ║ data_source          ║
╠════════════╬══════════════════════╣
║ 425        ║ IDC + PDC + GDC      ║
║ 358        ║ IDC + GC + PDC + GDC ║
║ 170        ║ PDC + GDC            ║
║ 119        ║ GDC only             ║
║ 6          ║ IDC + GDC            ║
║ 2          ║ IDC + GC + PDC       ║
║ 1          ║ GC + PDC             ║
║ 1          ║ GC + PDC + GDC       ║
╚════════════╩══════════════════════╝
╔════════════════╦══════════════════════════╗
║ count_result   ║ cause_of_death           ║
╠════════════════╬══════════════════════════╣
║ 944            ║ <NA>                     ║
║ 106            ║ Cancer-Related Death     ║
║ 18             ║ Non-Cancer Related Death ║
║ 6              ║ Infection                ║
║ 5              ║ Cardiovascular Disorder  ║
║ 3              ║ Surgical Complication    ║
╚════════════════╩══════════════════════════╝
╔════════════════╦════════════════════╗
║ count_result   ║ ethnicity          ║
╠════════════════╬════════════════════╣
║ 712            ║ <NA>               ║
║ 339            ║ Non-Hispanic       ║
║ 31             ║ Hispanic or Latino ║
╚════════════════╩════════════════════╝
╔════════════════╦═══════════╗
║ count_result   ║ species   ║
╠════════════════╬═══════════╣
║ 1082           ║ human     ║
╚════════════════╩═══════════╝
╔════════════════╦══════════════════════════════════╗
║ count_result   ║ race                             ║
╠════════════════╬══════════════════════════════════╣
║ 802            ║ White                            ║
║ 140            ║ Asian                            ║
║ 117            ║ <NA>                             ║
║ 22             ║ Black or African American        ║
║ 1              ║ American Indian or Alaska Native ║
╚════════════════╩══════════════════════════════════╝
╔════════════════╦════════════════════════════════════════════╗
║ count_result   ║ diagnosis                                  ║
╠════════════════╬════════════════════════════════════════════╣
║ 542            ║ Adenocarcinoma                             ║
║ 458            ║ Neoplasm, malignant                        ║
║ 245            ║ Endometrioid adenocarcinoma                ║
║ 241            ║ Clear cell adenocarcinoma                  ║
║ 222            ║ Renal cell carcinoma                       ║
║ 61             ║ Adenocarcinoma, intestinal type            ║
║ 33             ║ Adenoma                                    ║
║ 23             ║ Epithelial tumor, benign                   ║
║ 21             ║ Papillary adenocarcinoma                   ║
║ 14             ║ Adenocarcinoma with mixed subtypes         ║
║ 14             ║ Mucinous adenocarcinoma                    ║
║ 11             ║ Adenocarcinoma, metastatic                 ║
║ 3              ║ Cystic, mucinous and serous neoplasms      ║
║ 3              ║ Oxyphilic adenoma                          ║
║ 1              ║ Adenosquamous carcinoma                    ║
║ 1              ║ Angioimmunoblastic T-cell lymphoma         ║
║ 1              ║ Complex epithelial neoplasms               ║
║ 1              ║ Hepatoid adenocarcinoma                    ║
║ 1              ║ Malignant melanoma                         ║
║ 1              ║ Neoplasm, metastatic                       ║
║ 1              ║ Neuroendocrine carcinoma                   ║
║ 1              ║ Renal cell carcinoma, chromophobe type     ║
║ 1              ║ Squamous cell carcinoma                    ║
║ 1              ║ T-cell large granular lymphocytic leukemia ║
║ 1              ║ Tubular adenocarcinoma                     ║
╚════════════════╩════════════════════════════════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_death   ║
╠════════════════╬═════════════════╣
║ mean           ║ 2016            ║
║ min            ║ 2002            ║
║ lower quartile ║ 2016            ║
║ median         ║ 2018            ║
║ upper quartile ║ 2020            ║
║ max            ║ 2023            ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_birth   ║
╠════════════════╬═════════════════╣
║ mean           ║ 1952            ║
║ min            ║ 1914            ║
║ lower quartile ║ 1945            ║
║ median         ║ 1952            ║
║ upper quartile ║ 1959            ║
║ max            ║ 1992            ║
╚════════════════╩═════════════════╝
╔════════════════╦══════════════════════╗
║                ║ age_at_observation   ║
╠════════════════╬══════════════════════╣
║ mean           ║ 63                   ║
║ min            ║ 15                   ║
║ lower quartile ║ 55                   ║
║ median         ║ 64                   ║
║ upper quartile ║ 71                   ║
║ max            ║ 90                   ║
╚════════════════╩══════════════════════╝
That leaves 1,082 human subjects who meet those criteria, which seems promising. I may also want to look at some age ranges, as summaries of the subject data:
summarize_subjects(match_all=["age_at_observation > 80", "diagnosis = *adenocarcinoma*", "species = human"])
╔═══════════════════════════════╗
║ number_of_matching_subjects   ║
╠═══════════════════════════════╣
║ 99                            ║
╚═══════════════════════════════╝
╔════════════════════════════════════════════════╗
║ number_of_files_related_to_matching_subjects   ║
╠════════════════════════════════════════════════╣
║ 18037                                          ║
╚════════════════════════════════════════════════╝
╔════════════╦══════════════════════╗
║ subjects   ║ data_source          ║
╠════════════╬══════════════════════╣
║ 51         ║ GDC + PDC + IDC      ║
║ 33         ║ GDC + PDC + GC + IDC ║
║ 15         ║ GDC + PDC            ║
╚════════════╩══════════════════════╝
╔════════════════╦═══════════════════════════╗
║ count_result   ║ race                      ║
╠════════════════╬═══════════════════════════╣
║ 56             ║ <NA>                      ║
║ 42             ║ White                     ║
║ 1              ║ Black or African American ║
╚════════════════╩═══════════════════════════╝
╔════════════════╦══════════════════════════╗
║ count_result   ║ cause_of_death           ║
╠════════════════╬══════════════════════════╣
║ 85             ║ <NA>                     ║
║ 5              ║ Non-Cancer Related Death ║
║ 4              ║ Cancer-Related Death     ║
║ 2              ║ Cardiovascular Disorder  ║
║ 2              ║ Surgical Complication    ║
║ 1              ║ Infection                ║
╚════════════════╩══════════════════════════╝
╔════════════════╦═══════════╗
║ count_result   ║ species   ║
╠════════════════╬═══════════╣
║ 99             ║ human     ║
╚════════════════╩═══════════╝
╔════════════════╦══════════════╗
║ count_result   ║ ethnicity    ║
╠════════════════╬══════════════╣
║ 87             ║ <NA>         ║
║ 12             ║ Non-Hispanic ║
╚════════════════╩══════════════╝
╔════════════════╦═════════════════════════════════╗
║ count_result   ║ diagnosis                       ║
╠════════════════╬═════════════════════════════════╣
║ 68             ║ Adenocarcinoma                  ║
║ 40             ║ Adenocarcinoma, intestinal type ║
║ 25             ║ Neoplasm, malignant             ║
║ 18             ║ Adenoma                         ║
║ 16             ║ Endometrioid adenocarcinoma     ║
║ 11             ║ Clear cell adenocarcinoma       ║
║ 10             ║ Renal cell carcinoma            ║
║ 5              ║ Mucinous adenocarcinoma         ║
║ 1              ║ Adenosquamous carcinoma         ║
║ 1              ║ Complex epithelial neoplasms    ║
║ 1              ║ Papillary adenocarcinoma        ║
╚════════════════╩═════════════════════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_death   ║
╠════════════════╬═════════════════╣
║ mean           ║ 2018            ║
║ min            ║ 2007            ║
║ lower quartile ║ 2017            ║
║ median         ║ 2020            ║
║ upper quartile ║ 2020            ║
║ max            ║ 2022            ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_birth   ║
╠════════════════╬═════════════════╣
║ mean           ║ 1932            ║
║ min            ║ 1914            ║
║ lower quartile ║ 1929            ║
║ median         ║ 1933            ║
║ upper quartile ║ 1937            ║
║ max            ║ 1942            ║
╚════════════════╩═════════════════╝
╔════════════════╦══════════════════════╗
║                ║ age_at_observation   ║
╠════════════════╬══════════════════════╣
║ mean           ║ 80                   ║
║ min            ║ 63                   ║
║ lower quartile ║ 75                   ║
║ median         ║ 81                   ║
║ upper quartile ║ 86                   ║
║ max            ║ 90                   ║
╚════════════════╩══════════════════════╝
summarize_subjects(match_all=["70 < age_at_observation <= 80", "diagnosis = *adenocarcinoma*", "species = human"])
╔═══════════════════════════════╗
║ number_of_matching_subjects   ║
╠═══════════════════════════════╣
║ 266                           ║
╚═══════════════════════════════╝
╔════════════════════════════════════════════════╗
║ number_of_files_related_to_matching_subjects   ║
╠════════════════════════════════════════════════╣
║ 44687                                          ║
╚════════════════════════════════════════════════╝
╔════════════╦══════════════════════╗
║ subjects   ║ data_source          ║
╠════════════╬══════════════════════╣
║ 103        ║ IDC + PDC + GDC      ║
║ 83         ║ IDC + GC + PDC + GDC ║
║ 53         ║ PDC + GDC            ║
║ 24         ║ GDC only             ║
║ 1          ║ GC + PDC             ║
║ 1          ║ IDC + GC + PDC       ║
║ 1          ║ GC + PDC + GDC       ║
╚════════════╩══════════════════════╝
╔════════════════╦════════════════════╗
║ count_result   ║ ethnicity          ║
╠════════════════╬════════════════════╣
║ 176            ║ <NA>               ║
║ 88             ║ Non-Hispanic       ║
║ 2              ║ Hispanic or Latino ║
╚════════════════╩════════════════════╝
╔════════════════╦══════════════════════════╗
║ count_result   ║ cause_of_death           ║
╠════════════════╬══════════════════════════╣
║ 226            ║ <NA>                     ║
║ 26             ║ Cancer-Related Death     ║
║ 7              ║ Non-Cancer Related Death ║
║ 3              ║ Infection                ║
║ 2              ║ Cardiovascular Disorder  ║
║ 2              ║ Surgical Complication    ║
╚════════════════╩══════════════════════════╝
╔════════════════╦═══════════╗
║ count_result   ║ species   ║
╠════════════════╬═══════════╣
║ 266            ║ human     ║
╚════════════════╩═══════════╝
╔════════════════╦═══════════════════════════╗
║ count_result   ║ race                      ║
╠════════════════╬═══════════════════════════╣
║ 193            ║ White                     ║
║ 51             ║ <NA>                      ║
║ 15             ║ Asian                     ║
║ 7              ║ Black or African American ║
╚════════════════╩═══════════════════════════╝
╔════════════════╦═══════════════════════════════════════╗
║ count_result   ║ diagnosis                             ║
╠════════════════╬═══════════════════════════════════════╣
║ 168            ║ Adenocarcinoma                        ║
║ 96             ║ Neoplasm, malignant                   ║
║ 56             ║ Endometrioid adenocarcinoma           ║
║ 35             ║ Adenocarcinoma, intestinal type       ║
║ 34             ║ Clear cell adenocarcinoma             ║
║ 30             ║ Renal cell carcinoma                  ║
║ 15             ║ Adenoma                               ║
║ 5              ║ Mucinous adenocarcinoma               ║
║ 4              ║ Epithelial tumor, benign              ║
║ 2              ║ Adenocarcinoma, metastatic            ║
║ 2              ║ Papillary adenocarcinoma              ║
║ 1              ║ Adenosquamous carcinoma               ║
║ 1              ║ Complex epithelial neoplasms          ║
║ 1              ║ Cystic, mucinous and serous neoplasms ║
║ 1              ║ Malignant melanoma                    ║
║ 1              ║ Neoplasm, metastatic                  ║
╚════════════════╩═══════════════════════════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_birth   ║
╠════════════════╬═════════════════╣
║ mean           ║ 1941            ║
║ min            ║ 1923            ║
║ lower quartile ║ 1938            ║
║ median         ║ 1943            ║
║ upper quartile ║ 1945            ║
║ max            ║ 1952            ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_death   ║
╠════════════════╬═════════════════╣
║ mean           ║ 2017            ║
║ min            ║ 2004            ║
║ lower quartile ║ 2017            ║
║ median         ║ 2018            ║
║ upper quartile ║ 2020            ║
║ max            ║ 2023            ║
╚════════════════╩═════════════════╝
╔════════════════╦══════════════════════╗
║                ║ age_at_observation   ║
╠════════════════╬══════════════════════╣
║ mean           ║ 73                   ║
║ min            ║ 53                   ║
║ lower quartile ║ 71                   ║
║ median         ║ 73                   ║
║ upper quartile ║ 77                   ║
║ max            ║ 90                   ║
╚════════════════╩══════════════════════╝
summarize_subjects(match_all=["60 < age_at_observation <= 70", "diagnosis = *adenocarcinoma*", "species = human"])
╔═══════════════════════════════╗
║ number_of_matching_subjects   ║
╠═══════════════════════════════╣
║ 440                           ║
╚═══════════════════════════════╝
╔════════════════════════════════════════════════╗
║ number_of_files_related_to_matching_subjects   ║
╠════════════════════════════════════════════════╣
║ 65413                                          ║
╚════════════════════════════════════════════════╝
╔════════════╦══════════════════════╗
║ subjects   ║ data_source          ║
╠════════════╬══════════════════════╣
║ 191        ║ PDC + GDC + IDC      ║
║ 142        ║ PDC + GDC + IDC + GC ║
║ 63         ║ PDC + GDC            ║
║ 40         ║ GDC only             ║
║ 4          ║ GDC + IDC            ║
╚════════════╩══════════════════════╝
╔════════════════╦══════════════════════════╗
║ count_result   ║ cause_of_death           ║
╠════════════════╬══════════════════════════╣
║ 380            ║ <NA>                     ║
║ 46             ║ Cancer-Related Death     ║
║ 10             ║ Non-Cancer Related Death ║
║ 2              ║ Cardiovascular Disorder  ║
║ 2              ║ Infection                ║
╚════════════════╩══════════════════════════╝
╔════════════════╦════════════════════╗
║ count_result   ║ ethnicity          ║
╠════════════════╬════════════════════╣
║ 300            ║ <NA>               ║
║ 131            ║ Non-Hispanic       ║
║ 9              ║ Hispanic or Latino ║
╚════════════════╩════════════════════╝
╔════════════════╦══════════════════════════════════╗
║ count_result   ║ race                             ║
╠════════════════╬══════════════════════════════════╣
║ 341            ║ White                            ║
║ 49             ║ Asian                            ║
║ 43             ║ <NA>                             ║
║ 6              ║ Black or African American        ║
║ 1              ║ American Indian or Alaska Native ║
╚════════════════╩══════════════════════════════════╝
╔════════════════╦═══════════╗
║ count_result   ║ species   ║
╠════════════════╬═══════════╣
║ 440            ║ human     ║
╚════════════════╩═══════════╝
╔════════════════╦════════════════════════════════════════════╗
║ count_result   ║ diagnosis                                  ║
╠════════════════╬════════════════════════════════════════════╣
║ 209            ║ Adenocarcinoma                             ║
║ 196            ║ Neoplasm, malignant                        ║
║ 114            ║ Endometrioid adenocarcinoma                ║
║ 95             ║ Clear cell adenocarcinoma                  ║
║ 85             ║ Renal cell carcinoma                       ║
║ 21             ║ Adenocarcinoma, intestinal type            ║
║ 11             ║ Adenoma                                    ║
║ 10             ║ Epithelial tumor, benign                   ║
║ 9              ║ Adenocarcinoma with mixed subtypes         ║
║ 6              ║ Mucinous adenocarcinoma                    ║
║ 5              ║ Adenocarcinoma, metastatic                 ║
║ 5              ║ Papillary adenocarcinoma                   ║
║ 3              ║ Cystic, mucinous and serous neoplasms      ║
║ 2              ║ Oxyphilic adenoma                          ║
║ 1              ║ Angioimmunoblastic T-cell lymphoma         ║
║ 1              ║ Malignant melanoma                         ║
║ 1              ║ Neoplasm, metastatic                       ║
║ 1              ║ Neuroendocrine carcinoma                   ║
║ 1              ║ Squamous cell carcinoma                    ║
║ 1              ║ T-cell large granular lymphocytic leukemia ║
║ 1              ║ Tubular adenocarcinoma                     ║
╚════════════════╩════════════════════════════════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_death   ║
╠════════════════╬═════════════════╣
║ mean           ║ 2017            ║
║ min            ║ 2003            ║
║ lower quartile ║ 2017            ║
║ median         ║ 2019            ║
║ upper quartile ║ 2020            ║
║ max            ║ 2023            ║
╚════════════════╩═════════════════╝
╔════════════════╦═════════════════╗
║                ║ year_of_birth   ║
╠════════════════╬═════════════════╣
║ mean           ║ 1950            ║
║ min            ║ 1931            ║
║ lower quartile ║ 1948            ║
║ median         ║ 1951            ║
║ upper quartile ║ 1954            ║
║ max            ║ 1962            ║
╚════════════════╩═════════════════╝
╔════════════════╦══════════════════════╗
║                ║ age_at_observation   ║
╠════════════════╬══════════════════════╣
║ mean           ║ 65                   ║
║ min            ║ 43                   ║
║ lower quartile ║ 62                   ║
║ median         ║ 65                   ║
║ upper quartile ║ 68                   ║
║ max            ║ 88                   ║
╚════════════════╩══════════════════════╝
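Typing each age range by hand invites typos, so the range filters could be generated instead. This is only a sketch: age_range_filters is a helper name I made up, not part of cdapython, and the summarize_subjects call is left commented out because it needs a session where cdapython is already imported.

```python
def age_range_filters(edges, column="age_at_observation"):
    """Build 'low < column <= high' filter strings for consecutive bin edges."""
    return [
        f"{low} < {column} <= {high}"
        for low, high in zip(edges, edges[1:])
    ]


# Two ranges that share an edge: (60, 70] and (70, 80].
filters = age_range_filters([60, 70, 80])

for age_filter in filters:
    query = [age_filter, "diagnosis = *adenocarcinoma*", "species = human"]
    print(query)
    # summarize_subjects(match_all=query)  # run this where cdapython is imported
```

Because every range is open on the left and closed on the right, an age of exactly 70 satisfies only the first filter string, so adjacent ranges never test the same age twice.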
I'm going to look more closely at one of my age ranges by running get_subject_data instead of summarize_subjects:
get_subject_data(match_all=["60 < age_at_observation <= 70", "diagnosis = *adenocarcinoma*", "species = human"])
Several of the subjects here have multiple diagnoses and data from multiple sources, but not multiple age_at_observation values. I wonder why that is, so I'm going to have cdapython collate the results by setting collate_results=True. That is, I'm going to have it match up all these observation data points to one another so I can see what the original data looked like:
sixty2seventy_collated = get_subject_data(match_all=["60 < age_at_observation <= 70", "diagnosis = *adenocarcinoma*", "species = human"], collate_results=True)
sixty2seventy_collated
Now each row has an embedded dataframe of all the relevant observation info. Let's look at the first row. Here I'm asking for the 'observation_data' column and the first (zeroth) row:
sixty2seventy_collated['observation_data'][0]
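Conceptually, collating groups the flat per-record rows back under their subject. The sketch below mimics that idea with plain Python dictionaries. The rows, the subject IDs, the source field, and the collate helper are all invented for illustration; none of this is cdapython's data or its implementation.

```python
from collections import defaultdict

# Made-up flat rows: one row per source record for a subject.
rows = [
    {"subject_id": "S1", "diagnosis": "Adenocarcinoma",
     "age_at_observation": 65, "source": "GDC"},
    {"subject_id": "S1", "diagnosis": "Neoplasm, malignant",
     "age_at_observation": None, "source": "PDC"},
    {"subject_id": "S2", "diagnosis": "Clear cell adenocarcinoma",
     "age_at_observation": 62, "source": "GDC"},
]


def collate(rows):
    """Group records by subject, keeping each subject's records together."""
    grouped = defaultdict(list)
    for row in rows:
        record = {key: value for key, value in row.items() if key != "subject_id"}
        grouped[row["subject_id"]].append(record)
    return [
        {"subject_id": subject_id, "observation_data": records}
        for subject_id, records in grouped.items()
    ]


collated = collate(rows)

# Roughly analogous to asking for the observation data of the first row.
first = collated[0]["observation_data"]
ages = [record["age_at_observation"] for record in first]
print(ages)
```

In this toy data the first subject has two records but only one of them carries an age, which is the kind of pattern that collated output makes visible.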
It looks like in the original source data, only some of the records included age information. I can look at even more detailed information if I add more columns to my search results table. Here, I'm adding the entire observation table by using an asterisk again: observation.* means 'anything that is inside the database's observation table':
all_obs_sixty2seventy_collated = get_subject_data(match_all=["60 < age_at_observation <= 70", "diagnosis = *adenocarcinoma*", "species = human"], add_columns='observation.*', collate_results=True)
all_obs_sixty2seventy_collated['observation_data'][0]
This output is very helpful for assessing whether individual subjects that came back in my query have the types of data I need, and it shows me where each bit of aggregated data came from, but it wouldn't be very efficient to look at each of these subjects one by one. If I want to look at this more detailed information for all the subjects, I can run another function on my results that expands