
@david-andrew
Created January 16, 2025 15:23
Simple test of graph RAG over cBioPortal
$ python -m graphrag.graphrag
Successfully loaded graph @ /home/david/dev/adhoc-api/graphrag/cbioportal from disk.
>>> can you run code to get gene panel data?
thought: I need more information about the specific gene panel data the user is interested in
to proceed effectively.
tool: ask_user
tool_input: {'query': 'Could you please provide more details about the gene panel data you are
interested in? For example, are you looking for specific genes, a particular type of analysis,
or data from a certain database or source?'}
Could you please provide more details about the gene panel data you are interested in? For example, are you looking for specific genes, a particular type of analysis, or data from a certain database or source? $ I'm just doing early exploration, I don't know what I want
observation: I'm just doing early exploration, I don't know what I want
thought: I need to understand what gene panel data is and how it can be accessed
programmatically to assist the user in obtaining it.
tool: ask_api
tool_input: {'query': 'What is gene panel data and how can it be accessed programmatically?'}
observation: Gene panel data refers to structured information about collections of genes within
specific panels, often used in molecular profiling contexts. This data can include identifiers
for molecular profiles, patients, samples, and studies, as well as flags indicating whether the
gene panel has been profiled.
To access gene panel data programmatically, the API provides several endpoints and methods:
1. **Retrieve Specific Gene Panel Information:**
- **Endpoint:** `/api/gene-panels/{genePanelId}`
- **Method:** GET
- **Description:** This endpoint retrieves information about a specific gene panel
identified by `genePanelId`. The response includes details structured according to the
`GenePanel` schema.
2. **Fetch Multiple Gene Panels:**
- **Endpoint:** `/api/gene-panels/fetch`
- **Method:** POST
- **Description:** This endpoint allows retrieval of information about multiple gene panels
by providing an array of gene panel IDs in the request body. The response returns a JSON array
of gene panels.
3. **Retrieve Gene Panel Data for Specific Molecular Profiles:**
- **Endpoint:** `/api/molecular-profiles/{molecularProfileId}/gene-panel-data/fetch`
- **Method:** POST
- **Description:** This endpoint fetches gene panel data associated with a specific
molecular profile. The request requires a `molecularProfileId` and a JSON payload conforming to
the `GenePanelDataFilter` schema. The response includes data structured according to the
`GenePanelData` schema.
4. **Fetch Gene Panel Data Across Multiple Molecular Profiles:**
- **Endpoint:** `/api/gene-panel-data/fetch`
- **Method:** POST
- **Description:** This endpoint is used to retrieve gene panel data across multiple
molecular profiles. The request body must adhere to the `GenePanelDataMultipleStudyFilter`
schema, specifying criteria for selecting the data. The response is an array of objects
conforming to the `GenePanelData` schema.
These endpoints provide flexible options for accessing and analyzing gene panel data,
supporting both specific and broad queries across multiple studies and profiles.
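Endpoint 3 is the one that returns per-sample gene panel data, and the rest of the session never calls it. Below is a minimal sketch of how that request could be built with the Python standard library. It assumes the `GenePanelDataFilter` body takes either a `sampleListId` string or a `sampleIds` array. Check that field naming against the cBioPortal API reference before relying on it, along with the profile and sample-list IDs in the usage note, which are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Public instance used throughout this session.
BASE_URL = "https://www.cbioportal.org"


def build_gene_panel_data_request(molecular_profile_id, sample_list_id=None, sample_ids=None):
    """Build, without sending, a POST request for
    /api/molecular-profiles/{molecularProfileId}/gene-panel-data/fetch.

    Assumption: the GenePanelDataFilter body takes either "sampleListId"
    or "sampleIds"; exactly one of them is set here.
    """
    if (sample_list_id is None) == (sample_ids is None):
        raise ValueError("pass exactly one of sample_list_id or sample_ids")
    if sample_list_id is not None:
        body = {"sampleListId": sample_list_id}
    else:
        body = {"sampleIds": list(sample_ids)}
    # Escape the path segment in case the profile ID contains reserved characters.
    profile = urllib.parse.quote(molecular_profile_id, safe="")
    url = f"{BASE_URL}/api/molecular-profiles/{profile}/gene-panel-data/fetch"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send_json_request(request, timeout=60):
    """Send a prepared request and decode its JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

If those assumptions hold, `send_json_request(build_gene_panel_data_request("msk_impact_2017_mutations", sample_list_id="msk_impact_2017_all"))` would return the per-sample records described above, carrying the profile, patient, sample, and study identifiers plus the profiling flag. Building the request separately from sending it makes the request's shape easy to inspect before any network call.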
thought: Calling tool 'use_api'
tool: use_api
tool_input: {'goal': 'Retrieve information about available gene panels using the API.'}
observation:
import requests

# Base URL for the API
base_url = "https://www.cbioportal.org"

# Function to retrieve a list of gene panels
def get_all_gene_panels(projection="SUMMARY", page_size=10000000, page_number=0,
                        sort_by="genePanelId", direction="ASC"):
    endpoint = f"{base_url}/api/gene-panels"
    params = {
        "projection": projection,
        "pageSize": page_size,
        "pageNumber": page_number,
        "sortBy": sort_by,
        "direction": direction
    }
    response = requests.get(endpoint, params=params)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Function to retrieve information about a specific gene panel
def get_gene_panel(gene_panel_id):
    endpoint = f"{base_url}/api/gene-panels/{gene_panel_id}"
    response = requests.get(endpoint)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Function to fetch gene panels using POST request
def fetch_gene_panels(gene_panel_ids, projection="SUMMARY"):
    endpoint = f"{base_url}/api/gene-panels/fetch"
    headers = {"Content-Type": "application/json"}
    data = {
        "projection": projection,
        "genePanelIds": gene_panel_ids
    }
    response = requests.post(endpoint, json=data, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Example usage
try:
    # Retrieve all gene panels
    all_gene_panels = get_all_gene_panels()
    print("All Gene Panels:", all_gene_panels)

    # Retrieve a specific gene panel
    specific_gene_panel = get_gene_panel("NSCLC_UNITO_2016_PANEL")
    print("Specific Gene Panel:", specific_gene_panel)

    # Fetch gene panels using POST
    fetched_gene_panels = fetch_gene_panels(["NSCLC_UNITO_2016_PANEL"])
    print("Fetched Gene Panels:", fetched_gene_panels)

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
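The generated `get_all_gene_panels` requests the entire listing as a single page of 10,000,000 records. An alternative is to page through `/api/gene-panels` with the same `pageSize`, `pageNumber`, and `sortBy` query parameters and stop at the first short page. This sketch assumes each page comes back as a plain JSON array, as in the output further down, and that a page shorter than `pageSize` marks the end. The names `iter_all_gene_panels`, `fetch_gene_panel_page`, and the injectable `fetch_page` argument are illustrative and not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.cbioportal.org"


def fetch_gene_panel_page(page_number, page_size, projection="SUMMARY"):
    """GET one page of /api/gene-panels using the query parameters from the session's code."""
    query = urllib.parse.urlencode({
        "projection": projection,
        "pageSize": page_size,
        "pageNumber": page_number,
        # A fixed sort order keeps page boundaries stable between requests.
        "sortBy": "genePanelId",
        "direction": "ASC",
    })
    with urllib.request.urlopen(f"{BASE_URL}/api/gene-panels?{query}", timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_all_gene_panels(fetch_page=fetch_gene_panel_page, page_size=100):
    """Yield panels page by page; assumes a page shorter than page_size is the last one."""
    page_number = 0
    while True:
        page = fetch_page(page_number, page_size)
        yield from page
        if len(page) < page_size:
            return
        page_number += 1
```

For example, `panels = list(iter_all_gene_panels(page_size=50))` should collect the same listing in a few small requests. Injecting `fetch_page` lets the paging loop be exercised without network access.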
thought: Calling tool 'run'
tool: run
tool_input: {'code': <the script from the preceding observation, passed verbatim as a single string>}
observation: All Gene Panels: [{'description': 'Targeted (27 cancer genes) sequencing of
adenoid cystic carcinomas on Illumina HiSeq sequencers.', 'genePanelId': 'ACYC_FMI_27'},
{'description': 'Targeted panel of 232 genes.', 'genePanelId': 'Agilent'}, {'description':
'Targeted panel of 8 genes.', 'genePanelId': 'AmpliSeq'}, {'description': 'ARCHER-HEME gene
panel (199 genes)', 'genePanelId': 'ARCHER-HEME-CV1'}, {'description': 'ARCHER-SOLID Gene Panel
(62 genes)', 'genePanelId': 'ARCHER-SOLID-CV1'}, {'description': 'Targeted sequencing of
various tumor types via bait v3.', 'genePanelId': 'bait_v3'}, {'description': 'Targeted
sequencing of various tumor types via bait v4.', 'genePanelId': 'bait_v4'}, {'description':
'Targeted sequencing of various tumor types via bait v5.', 'genePanelId': 'bait_v5'},
{'description': 'Targeted panel of 387 cancer-related genes.', 'genePanelId':
'bcc_unige_2016_cancer_panel'}, {'description': 'Research (CMO) IMPACT-Heme gene panel version
3.', 'genePanelId': 'HemePACT_v3'}, {'description': 'Targeted sequencing of 18 genes via
Resolution ctDx Lung assay.', 'genePanelId': 'ctDx_lung_panel'}, {'description': 'Targeted
sequencing of 503 cancer-associated genes on Illumina HiSeq sequencers.', 'genePanelId':
'DFCI_504'}, {'description': 'FoundationOne Heme gene panel targets the entire coding sequence
of 406 genes, selected introns of 31 genes involved in rearrangements and utilizes RNA
sequencing to interrogate 265 genes', 'genePanelId': 'FMI-D2'}, {'description': 'FoundationOne
Heme gene panel targets the entire coding sequence of 406 genes, selected introns of 31 genes
involved in rearrangements and utilizes RNA sequencing to interrogate 265 genes',
'genePanelId': 'FMI-R2'}, {'description': 'Foundation Medicine T4b gene panel (214 genes)',
'genePanelId': 'FMI-T4b'}, {'description': 'Foundation Medicine T5a gene panel (323 genes)',
'genePanelId': 'FMI-T5a'}, {'description': 'Foundation Medicine T7 gene panel (429 genes)',
'genePanelId': 'FMI-T7'}, {'description': 'Targeted panel of 600 genes.', 'genePanelId':
'gbm_tcga_pub_cancer_panel'}, {'description': 'DFCI-ONCOPANEL-1 (304 genes)', 'genePanelId':
'glioma_dfci_2020_DFCI-ONCOPANEL-1'}, {'description': 'DFCI-ONCOPANEL-2 (326 genes)',
'genePanelId': 'glioma_dfci_2020_DFCI-ONCOPANEL-2'}, {'description': 'DFCI-ONCOPANEL-3 (447
genes)', 'genePanelId': 'glioma_dfci_2020_DFCI-ONCOPANEL-3'}, {'description': 'Foundation
Medicine T5 gene panel (326 genes)', 'genePanelId': 'glioma_mskcc_2019_fmi_t5'},
{'description': 'Foundation Medicine T7 gene panel (434 genes)', 'genePanelId':
'glioma_mskcc_2019_fmi_t7'}, {'description': 'High-intensity targeted sequencing assay by GRAIL
covering 508 genes.', 'genePanelId': 'grail_cfdna_508'}, {'description': 'Targeted variant
analysis from 73 cancer genes by Next-Generation (NGS)/Massively parallel sequencing (MPS).',
'genePanelId': 'GUARDANT_73'}, {'description': 'Targeted (400 cancer genes) sequencing of
various tumor types via MSK-IMPACT Heme on Illumina HiSeq sequencers.', 'genePanelId':
'IMPACT-HEME-400'}, {'description': 'Targeted (467 cancer genes) sequencing of various tumor
types via MSK-IMPACT Heme on Illumina HiSeq sequencers. Originally 468 genes with CDKN2Ap14ARF
and CDKN2Ap16INK4A, merged into CDKN2A.', 'genePanelId': 'IMPACT-HEME-468'}, {'description':
'Hotspot sequencing panel for MSK-IMPACT Heme on Illumina HiSeq sequencers (204 genes).',
'genePanelId': 'IMPACT-HEME-HS'}, {'description': 'Targeted sequencing of urcc tumor via
MSK-IMPACT.', 'genePanelId': 'IMPACT230'}, {'description': 'Targeted (279 cancer genes)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'IMPACT279'}, {'description': 'Targeted (300 cancer genes) sequencing of bladder urothelial
carcinomas via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT300'},
{'description': 'Targeted (341 cancer genes) sequencing of various tumor types via MSK-IMPACT
on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT341'}, {'description': 'Targeted (410
cancer genes) sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.',
'genePanelId': 'IMPACT410'}, {'description': 'Targeted (468 cancer genes) sequencing of various
tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT468'},
{'description': 'Targeted (505 cancer genes) sequencing of various tumor types via MSK-IMPACT
on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT505'}, {'description': 'Targeted (152
genes) sequencing of MDS samples from the International Working Group (IWG) for the prognosis
of MDS', 'genePanelId': 'MDSIWG152'}, {'description': 'Targeted NGS of AML/MDs Samples.',
'genePanelId': 'Papaemmanuil_NEJM_2016_MDS_2013_panel'}, {'description': 'Targeted (173 cancer
genes) sequencing of breast cancers on Illumina HiSeq sequencers.', 'genePanelId':
'METABRIC_173'}, {'description': 'Targeted (129 genes) sequencing of cfDNA via MSK-ACCESS on
Illumina HiSeq sequencers.', 'genePanelId': 'ACCESS129'}, {'description': 'Targeted sequencing
of 15 oncogenes and tumor suppressor genes, selected for their potential as predictive
biomarkers of response to targeted agents by Sanger sequencing.', 'genePanelId':
'MSKCC_BLCA_15'}, {'description': 'Targeted (343 cancer genes, IMPACT341 plus MYB and NFIB)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'MYB-IMPACT341'}, {'description': 'Targeted (412 cancer genes, IMPACT410 plus MYB and NFIB)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'MYB-IMPACT410'}, {'description': 'Targeted (470 cancer genes, IMPACT468 plus MYB and NFIB)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'MYB-IMPACT468'}, {'description': 'Targeted NGS of NSCLC Samples.', 'genePanelId':
'NSCLC_UNITO_2016_PANEL'}, {'description': 'Targeted panel of 137 genes.', 'genePanelId':
'prad_mskcc_sanger'}, {'description': 'Targeted panel of 145 genes. (merge of prad_mskcc_sanger
and prad_mskcc_sequenom)', 'genePanelId': 'prad_mskcc_sequenom_sanger'}, {'description':
'Targeted panel of 21 genes.', 'genePanelId': 'prad_mskcc_sequenom'}, {'description': 'Targeted
panel for cfDNA sequencing (2 genes)', 'genePanelId': 'rbl_cfdna_msk_2020_panel'},
{'description': 'Targeted sequencing of clinical cases via Thunderbolts NGS Target Enrichment
System, RainDance Technologies.', 'genePanelId': 'RDTB49'}, {'description': 'Targeted
sequencing of clinical cases via ThunderStorm NGS Target Enrichment System, RainDance
Technologies.', 'genePanelId': 'RDTS30'}, {'description': "A panel that's designed to target
all exons of 262 genes that are recurrently mutated in de novo MDS/AML (‘RMG
reagent’).", 'genePanelId': 'RMG'}, {'description': 'Targeted panel of 226
protein-coding genes.', 'genePanelId': 'sarc_mskcc_panel'}, {'description': 'Targeted deep
sequencing (TDS) of 1202 genes on Illumina HiSeq2000 sequencers.', 'genePanelId': 'UCLA_1202'},
{'description': 'DNA sequencing of 622 genes with known or potential relationships to cancer.',
'genePanelId': 'WUSTL-DFCI_623'}, {'description': 'Targeted massively parallel sequencing of
4047 cancer genes.', 'genePanelId': 'NOVARTIS_BROAD_1651'}, {'description': 'Foundation
Medicine T7 gene panel (430 cancer genes, FMI-T7 plus TERT)', 'genePanelId': 'TERT-FMI-T7'},
{'description': 'FoundationOne gene panel (317 genes)', 'genePanelId': 'FoundationOne'},
{'description': 'DFCI-ONCOPANEL-3.1, Number of Genes - 447', 'genePanelId':
'DFCI-ONCOPANEL-3.1'}, {'description': 'DFCI-ONCOPANEL-3, Number of Genes - 447',
'genePanelId': 'DFCI-ONCOPANEL-3'}, {'description': 'DFCI-ONCOPANEL-1, Number of Genes - 304',
'genePanelId': 'DFCI-ONCOPANEL-1'}, {'description': 'DFCI-ONCOPANEL-2, Number of Genes - 326',
'genePanelId': 'DFCI-ONCOPANEL-2'}, {'description': 'ARCHER-SOLID-CV4 Gene Panel (123 genes)',
'genePanelId': 'ARCHER-SOLID-CV4'}, {'description': 'Targeted panel of 262 genes.',
'genePanelId': 'MDAnderson_T200'}, {'description': 'Targeted panel of 476 cancer-related
genes.', 'genePanelId': 'sarcoma_msk_2022_panel'}]
Specific Gene Panel: {'description': 'Targeted NGS of NSCLC Samples.', 'genes':
[{'entrezGeneId': 25, 'hugoGeneSymbol': 'ABL1'}, {'entrezGeneId': 207, 'hugoGeneSymbol':
'AKT1'}, {'entrezGeneId': 324, 'hugoGeneSymbol': 'APC'}, {'entrezGeneId': 472,
'hugoGeneSymbol': 'ATM'}, {'entrezGeneId': 673, 'hugoGeneSymbol': 'BRAF'}, {'entrezGeneId':
999, 'hugoGeneSymbol': 'CDH1'}, {'entrezGeneId': 1029, 'hugoGeneSymbol': 'CDKN2A'},
{'entrezGeneId': 1956, 'hugoGeneSymbol': 'EGFR'}, {'entrezGeneId': 2064, 'hugoGeneSymbol':
'ERBB2'}, {'entrezGeneId': 2066, 'hugoGeneSymbol': 'ERBB4'}, {'entrezGeneId': 2263,
'hugoGeneSymbol': 'FGFR2'}, {'entrezGeneId': 2261, 'hugoGeneSymbol': 'FGFR3'}, {'entrezGeneId':
2322, 'hugoGeneSymbol': 'FLT3'}, {'entrezGeneId': 6927, 'hugoGeneSymbol': 'HNF1A'},
{'entrezGeneId': 3417, 'hugoGeneSymbol': 'IDH1'}, {'entrezGeneId': 3791, 'hugoGeneSymbol':
'KDR'}, {'entrezGeneId': 3815, 'hugoGeneSymbol': 'KIT'}, {'entrezGeneId': 3845,
'hugoGeneSymbol': 'KRAS'}, {'entrezGeneId': 4233, 'hugoGeneSymbol': 'MET'}, {'entrezGeneId':
4851, 'hugoGeneSymbol': 'NOTCH1'}, {'entrezGeneId': 5290, 'hugoGeneSymbol': 'PIK3CA'},
{'entrezGeneId': 5781, 'hugoGeneSymbol': 'PTPN11'}, {'entrezGeneId': 5925, 'hugoGeneSymbol':
'RB1'}, {'entrezGeneId': 5979, 'hugoGeneSymbol': 'RET'}, {'entrezGeneId': 4089,
'hugoGeneSymbol': 'SMAD4'}, {'entrezGeneId': 6608, 'hugoGeneSymbol': 'SMO'}, {'entrezGeneId':
6794, 'hugoGeneSymbol': 'STK11'}, {'entrezGeneId': 7157, 'hugoGeneSymbol': 'TP53'},
{'entrezGeneId': 7428, 'hugoGeneSymbol': 'VHL'}], 'genePanelId': 'NSCLC_UNITO_2016_PANEL'}
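The single-panel response nests its genes in a `genes` list of `{entrezGeneId, hugoGeneSymbol}` objects. The small helper sketched here flattens that list into a symbol-to-Entrez mapping. `panel_gene_map` is an illustrative name, and the sample dict is an excerpt of the NSCLC_UNITO_2016_PANEL response above, cut down to its first three genes.

```python
def panel_gene_map(panel):
    """Map HUGO symbol -> Entrez gene ID for one panel returned by /api/gene-panels/{id}."""
    # The SUMMARY listing above has no "genes" key, so default to an empty list.
    return {gene["hugoGeneSymbol"]: gene["entrezGeneId"] for gene in panel.get("genes", [])}


# Excerpt of the response shown above (first three of its genes).
nsclc_excerpt = {
    "genePanelId": "NSCLC_UNITO_2016_PANEL",
    "description": "Targeted NGS of NSCLC Samples.",
    "genes": [
        {"entrezGeneId": 25, "hugoGeneSymbol": "ABL1"},
        {"entrezGeneId": 207, "hugoGeneSymbol": "AKT1"},
        {"entrezGeneId": 324, "hugoGeneSymbol": "APC"},
    ],
}

genes = panel_gene_map(nsclc_excerpt)
print(sorted(genes))  # ['ABL1', 'AKT1', 'APC']
```

The entries in the full listing above carry only `description` and `genePanelId`, so the helper returns an empty mapping for them. Getting the gene list requires the per-panel GET.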
An error occurred: 400 Client Error: for url: https://www.cbioportal.org/api/gene-panels/fetch
I successfully retrieved information about available gene panels using the API. Here are some highlights:
1. **All Gene Panels:**
- There are numerous gene panels available, each with a unique ID and description. For example, the "ACYC_FMI_27" panel targets 27 cancer genes, and the "IMPACT505" panel targets 505 cancer genes.
2. **Specific Gene Panel:**
- I retrieved detailed information about the "NSCLC_UNITO_2016_PANEL" gene panel, which is used for targeted NGS of NSCLC samples. This panel includes genes such as ABL1, AKT1, APC, ATM, BRAF, and many others.
Unfortunately, there was an error when attempting to fetch gene panels using a POST request, which resulted in a 400 Client Error. This might be due to incorrect parameters or an issue with the API endpoint.
If you have any specific gene panels or further exploration in mind, feel free to let me know!
>>>
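The 400 response from `/api/gene-panels/fetch` is consistent with the body the generated `fetch_gene_panels` sent: a JSON object with `projection` and `genePanelIds` keys. The `ask_api` answer above, by contrast, describes the body as an array of gene panel IDs. The sketch below sends a bare JSON array instead. It also moves `projection` to the query string, the way the GET endpoint takes it, which is an assumption not verified in this session. `build_fetch_gene_panels_request` is an illustrative name.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.cbioportal.org"


def build_fetch_gene_panels_request(gene_panel_ids, projection="SUMMARY"):
    """Build a POST to /api/gene-panels/fetch whose body is a bare JSON array of IDs.

    Assumption: projection is passed as a query parameter rather than in the body.
    """
    query = urllib.parse.urlencode({"projection": projection})
    return urllib.request.Request(
        f"{BASE_URL}/api/gene-panels/fetch?{query}",
        data=json.dumps(list(gene_panel_ids)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def fetch_gene_panels(gene_panel_ids, projection="SUMMARY", timeout=60):
    """Send the request and return the decoded JSON array of gene panels."""
    request = build_fetch_gene_panels_request(gene_panel_ids, projection)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

If the assumption holds, `fetch_gene_panels(["NSCLC_UNITO_2016_PANEL"])` should return a one-element array describing that panel. With `requests`, as in the session's code, the equivalent change is `requests.post(endpoint, params={"projection": projection}, json=gene_panel_ids)`.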