Command Quick Reference¶
Loading cdapython package¶
from cdapython import (
Q, columns, unique_terms, set_default_project_dataset, set_host_url)
import numpy as np
import pandas as pd
from itables import init_notebook_mode, show
init_notebook_mode(all_interactive=True)
import itables.options as opt
opt.maxBytes=0
opt.scrollX="200px"
opt.scrollCollapse=True
opt.paging=True
opt.maxColumns=0
print(Q.get_version())
2023.9.12
cdapython comes with a few basic packages that are useful for exploring data:
- pandas
- itables
- numpy
cdapython version¶
import cdapython
print(Q.get_version())
2023.9.12
columns()¶
Show all searchable columns in the CDA
Usage:
columns(<optional args>)
Options:
limit = <number>
: Changes the number of columns returned. Default = 100description = <True/False>
: Adds a description field for each column name. Default = True
Attributes:
.to_dataframe()
: Returns results as a dataframe.to_list()
: Returns results as a list
Returns: object containing column names
unique_terms()¶
For a given column show all unique terms.
Usage: unique_terms(<column_name>, <optional args>)
Options:
page_size = <number>
: Changes the number of rows returned per page. Default = 100limit = <number>
: Changes the number of rows returned in total. Default is null, which returns the full resultsystem= <data source>
: Returns only terms that belong to the specified datasource. CurrentlyGDC
,IDC
, orPDC
show_counts = <True/False>
: Returns terms as a dataframe that counts how often each term appears. Default = False
Attributes:
.to_dataframe()
: Returns results as a dataframe..to_list()
: Returns results as a list.
Returns: searchable terms for a given column
to_list()¶
Usage: columns().to_list(<optional args>)
Options:
filters = <string>
: Only returns data that contain<string>
exact = <True/False>
: Alters the behavior offilters
to only return data that are an exact match to<string>
Returns: results as a list
to_dataframe()¶
Usage: columns().to_dataframe(<optional args>)
Options:
search_fields = <array>
: Which column(s) should be searched for a given value, e.g. `["description", fieldName"]search_value = <string>
: The value to search for, e.g.search_value = "tumor"
Returns: results as a dataframe()
Q()¶
Executes this query on the public CDA server.
Usage: <result> = Q(<column> <operator> <term>)
Where:
<result>
is any user created variable<column>
is a value fromcolumns()
<operator>
is a valid comparison operator<term>
is a numeric/boolean/unique value fromunique_terms()
Returns: cda-python Q data type
run()¶
Retrieves results of a Q()
query.
Usage: <result>.run(<optional args>)
Where:
<result>
is a user created variable containing an executedQ()
query<result>
is any user created variable<column>
is a value fromcolumns()
<operator>
is a valid comparison operator<term>
is a numeric/boolean/unique value fromunique_terms()
Options:
include = '<column>, <column>'
: Only returns columns that contain<column>
(s). Can be used to return fewer columns from an endpoint, or to add columns that would not normally be returned to an endpoint result. In either case all desired columns must be listed.[1] Please note that returning many extra columns may result in improper joins.include = '<column>:<newname1>, <column>:<newname2>'
: Only returns columns that contain<column>
(s) and renames column(s) to<newname>
in outputpage_size = <number>
: Changes the number of rows returned per page. Default = 100limit = <number>
: Changes the number of rows returned in total. Default is null, which returns the full result
Returns: cda-python Q data type
[1] To add large numbers of columns to a result, it may be easier to get data from two endpoints and merge them, see an example in the Cohort Building workflow
join_as_str()¶
Returns a single column of data as a string
Usage: mycolumn = <result>.subject.run(filter='<column name>').join_as_str(key="<column name>", delimiter=",")
Where:
<result>
is a user created variable containing an executedQ()
query<column name>
is the column of desired information
The resulting string variable can be directly used as input to a new Q()
query. This example would retrieve all the files for the subjects that met the conditions of the query above:
Q(f'<column name> IN ({mycolumn})').file.run()
fromfile()¶
Takes a txt, csv, or tsv as input and returns a results object containing all matching data for the specified column of values
Usage: Q.from_file(field_to_search = "<column name>", file_to_search = "<file>", key = "<file column>")
Where:
<column name>
is the CDA column of desired information<file>
is the name of a file (including the path if necessary) that will be the input for search<file column>
is the header of the column in the users file that should be the specific input for search
For tsv and csv all variables are required. For txt files where there is only a single column of data, with one value per row, and no header, key
is optional.
count()¶
Retrieves summary counts of a Q()
query.
Usage: <result>.count()
Where:
<result>
is a user created variable containing an executedQ()
query
Valid Q()
comparison operators¶
Q
uses the following operators:,
=
: Equals!=
: Not EqualOR
AND
FROM
IN
andNOT IN
\%
: pattern matching a wildcardIS
andIS NOT
>
,>
,>=
,<=
: Greater and Less thanORDER_BY(<column>:<1 or -1>
: sort by a columnLIMIT
andOFFSET
Check out the Operators guide for more info
Print¶
Displays a pretty print version of any results variable.
Usage: print(<result>)
Where:
<result>
is a user created variable containing an executedQ()
query
Retreiving Data¶
By default, Q
only returns the first 100 rows of data for each search. This keeps search fast while still allowing you to preview the data. Once you have a query that returns the data you want, you'll use the get_all
function to return all the data as a dataframe or list.
Retreive to a dataframe¶
Usage:
To write to a dataframe
<result>.get_all().to_dataframe()
Where:
<result>
is a user created variable containing an executedQ()
query
Retreive to a list¶
Usage:
To write to a list
<result>.get_all().to_list()
Where:
<result>
is a user created variable containing an executedQ()
query
Manual retreival¶
To have more control over which pages are retreived, use the paginator()
function
Usage:
To write to a dataframe
import pandas as pd
myquery = Q(<your search parameters>)
mydataframe = pd.DataFrame() #create an emptydataframe
for i in myquery.subject.run().paginator(to_df=True): #loops through entire result, appending each page to your dataframe
mydataframe = pd.concat([mydataframe, i])
To write to a list
import pandas as pd
myquery = Q(<your search parameters>)
mylist = []
for i in myquery.subject.run().paginator(to_list=True):
mylist.extend(i)