Created January 16, 2025 15:23 by david-andrew (gist e24dcd431bfd637bfd166e06f8772c22)
Simple test of graph RAG over cBioPortal
$ python -m graphrag.graphrag
Successfully loaded graph @ /home/david/dev/adhoc-api/graphrag/cbioportal from disk.
>>> can you run code to get gene panel data?
thought: I need more information about the specific gene panel data the user is interested in
to proceed effectively.
tool: ask_user
tool_input: {'query': 'Could you please provide more details about the gene panel data you are
interested in? For example, are you looking for specific genes, a particular type of analysis,
or data from a certain database or source?'}
Could you please provide more details about the gene panel data you are interested in? For example, are you looking for specific genes, a particular type of analysis, or data from a certain database or source? $ I'm just doing early exploration, I don't know what I want
observation: I'm just doing early exploration, I don't know what I want
thought: I need to understand what gene panel data is and how it can be accessed
programmatically to assist the user in obtaining it.
tool: ask_api
tool_input: {'query': 'What is gene panel data and how can it be accessed programmatically?'}
observation: Gene panel data refers to structured information about collections of genes within
specific panels, often used in molecular profiling contexts. This data can include identifiers
for molecular profiles, patients, samples, and studies, as well as flags indicating whether the
gene panel has been profiled.
To access gene panel data programmatically, the API provides several endpoints and methods:
1. **Retrieve Specific Gene Panel Information:**
- **Endpoint:** `/api/gene-panels/{genePanelId}`
- **Method:** GET
- **Description:** This endpoint retrieves information about a specific gene panel
identified by `genePanelId`. The response includes details structured according to the
`GenePanel` schema.
2. **Fetch Multiple Gene Panels:**
- **Endpoint:** `/api/gene-panels/fetch`
- **Method:** POST
- **Description:** This endpoint allows retrieval of information about multiple gene panels
by providing an array of gene panel IDs in the request body. The response returns a JSON array
of gene panels.
3. **Retrieve Gene Panel Data for Specific Molecular Profiles:**
- **Endpoint:** `/api/molecular-profiles/{molecularProfileId}/gene-panel-data/fetch`
- **Method:** POST
- **Description:** This endpoint fetches gene panel data associated with a specific
molecular profile. The request requires a `molecularProfileId` and a JSON payload conforming to
the `GenePanelDataFilter` schema. The response includes data structured according to the
`GenePanelData` schema.
4. **Fetch Gene Panel Data Across Multiple Molecular Profiles:**
- **Endpoint:** `/api/gene-panel-data/fetch`
- **Method:** POST
- **Description:** This endpoint is used to retrieve gene panel data across multiple
molecular profiles. The request body must adhere to the `GenePanelDataMultipleStudyFilter`
schema, specifying criteria for selecting the data. The response is an array of objects
conforming to the `GenePanelData` schema.
These endpoints provide flexible options for accessing and analyzing gene panel data,
supporting both specific and broad queries across multiple studies and profiles.
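The session never exercises endpoints 3 and 4 from the list above. A minimal sketch of building those two requests with only the Python standard library (`urllib.request`, so it runs without `requests`) follows. The body field names `sampleListId`, `sampleIds`, and `sampleMolecularIdentifiers` are assumptions about the `GenePanelDataFilter` and `GenePanelDataMultipleStudyFilter` schemas, which the observation names but does not spell out; check them against the cBioPortal API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://www.cbioportal.org"  # same base URL the session uses


def _post_json(url, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def gene_panel_data_request(molecular_profile_id, sample_list_id=None, sample_ids=None):
    """Endpoint 3: POST /api/molecular-profiles/{id}/gene-panel-data/fetch.

    Assumption: GenePanelDataFilter takes exactly one of `sampleListId` or `sampleIds`.
    """
    if (sample_list_id is None) == (sample_ids is None):
        raise ValueError("pass exactly one of sample_list_id or sample_ids")
    if sample_list_id is not None:
        body = {"sampleListId": sample_list_id}
    else:
        body = {"sampleIds": list(sample_ids)}
    url = f"{BASE_URL}/api/molecular-profiles/{molecular_profile_id}/gene-panel-data/fetch"
    return _post_json(url, body)


def multi_profile_gene_panel_data_request(profile_sample_pairs):
    """Endpoint 4: POST /api/gene-panel-data/fetch.

    Assumption: GenePanelDataMultipleStudyFilter accepts `sampleMolecularIdentifiers`,
    a list of {"molecularProfileId", "sampleId"} objects.
    """
    body = {
        "sampleMolecularIdentifiers": [
            {"molecularProfileId": profile_id, "sampleId": sample_id}
            for profile_id, sample_id in profile_sample_pairs
        ]
    }
    return _post_json(f"{BASE_URL}/api/gene-panel-data/fetch", body)


def send_json(req):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send_json(gene_panel_data_request("acc_tcga_mutations", sample_list_id="acc_tcga_all"))` would return a list of `GenePanelData` rows; both IDs there are hypothetical placeholders to be replaced with a real molecular profile and sample list. Keeping request construction separate from sending makes it easy to inspect a payload before hitting the live server.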
thought: Calling tool 'use_api'
tool: use_api
tool_input: {'goal': 'Retrieve information about available gene panels using the API.'}
observation: import requests

# Base URL for the API
base_url = "https://www.cbioportal.org"

# Function to retrieve a list of gene panels
def get_all_gene_panels(projection="SUMMARY", page_size=10000000, page_number=0,
                        sort_by="genePanelId", direction="ASC"):
    endpoint = f"{base_url}/api/gene-panels"
    params = {
        "projection": projection,
        "pageSize": page_size,
        "pageNumber": page_number,
        "sortBy": sort_by,
        "direction": direction
    }
    response = requests.get(endpoint, params=params)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Function to retrieve information about a specific gene panel
def get_gene_panel(gene_panel_id):
    endpoint = f"{base_url}/api/gene-panels/{gene_panel_id}"
    response = requests.get(endpoint)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Function to fetch gene panels using POST request
def fetch_gene_panels(gene_panel_ids, projection="SUMMARY"):
    endpoint = f"{base_url}/api/gene-panels/fetch"
    headers = {"Content-Type": "application/json"}
    data = {
        "projection": projection,
        "genePanelIds": gene_panel_ids
    }
    response = requests.post(endpoint, json=data, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Example usage
try:
    # Retrieve all gene panels
    all_gene_panels = get_all_gene_panels()
    print("All Gene Panels:", all_gene_panels)

    # Retrieve a specific gene panel
    specific_gene_panel = get_gene_panel("NSCLC_UNITO_2016_PANEL")
    print("Specific Gene Panel:", specific_gene_panel)

    # Fetch gene panels using POST
    fetched_gene_panels = fetch_gene_panels(["NSCLC_UNITO_2016_PANEL"])
    print("Fetched Gene Panels:", fetched_gene_panels)

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
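`get_all_gene_panels` in the generated code asks for every panel in a single page by passing `page_size=10000000`. If the server ever enforces a smaller maximum, walking the pages is safer. The following is a minimal sketch, assuming `pageNumber` is 0-based (as the `page_number=0` default suggests) and that a page shorter than `pageSize` is the last one. `fetch_page` and `iter_all` are hypothetical helper names, and the fetch function is passed in as an argument so the loop can be exercised offline.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.cbioportal.org"


def fetch_page(page_number, page_size):
    """GET one page of /api/gene-panels with the same parameters as the session."""
    query = urllib.parse.urlencode({
        "projection": "SUMMARY",
        "pageSize": page_size,
        "pageNumber": page_number,
        "sortBy": "genePanelId",
        "direction": "ASC",
    })
    with urllib.request.urlopen(f"{BASE_URL}/api/gene-panels?{query}") as resp:
        return json.load(resp)


def iter_all(fetch, page_size=100):
    """Yield every item, one page at a time.

    Stops at the first page shorter than page_size; if the total is an exact
    multiple of page_size, this costs one extra request that returns [].
    """
    page_number = 0  # assumed 0-based
    while True:
        page = fetch(page_number, page_size)
        yield from page
        if len(page) < page_size:
            return
        page_number += 1
```

Against the live API, `[p["genePanelId"] for p in iter_all(fetch_page)]` would collect the same IDs as the single large request, at the cost of a few more round trips.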
thought: Calling tool 'run'
tool: run
tool_input: {'code': 'import requests\n\n# Base URL for the API\nbase_url =
"https://www.cbioportal.org"\n\n# Function to retrieve a list of gene panels\ndef
get_all_gene_panels(projection="SUMMARY", page_size=10000000, page_number=0,
sort_by="genePanelId", direction="ASC"):\n    endpoint = f"{base_url}/api/gene-panels"\n
    params = {\n        "projection": projection,\n        "pageSize": page_size,\n
        "pageNumber": page_number,\n        "sortBy": sort_by,\n        "direction": direction\n    }\n
    response = requests.get(endpoint, params=params)\n    if response.status_code == 200:\n
        return response.json()\n    else:\n        response.raise_for_status()\n\n# Function to
retrieve information about a specific gene panel\ndef get_gene_panel(gene_panel_id):\n
    endpoint = f"{base_url}/api/gene-panels/{gene_panel_id}"\n    response =
requests.get(endpoint)\n    if response.status_code == 200:\n        return response.json()\n
    else:\n        response.raise_for_status()\n\n# Function to fetch gene panels using POST
request\ndef fetch_gene_panels(gene_panel_ids, projection="SUMMARY"):\n    endpoint =
f"{base_url}/api/gene-panels/fetch"\n    headers = {"Content-Type": "application/json"}\n
    data = {\n        "projection": projection,\n        "genePanelIds": gene_panel_ids\n    }\n
    response = requests.post(endpoint, json=data, headers=headers)\n    if response.status_code ==
200:\n        return response.json()\n    else:\n        response.raise_for_status()\n\n#
Example usage\ntry:\n    # Retrieve all gene panels\n    all_gene_panels =
get_all_gene_panels()\n    print("All Gene Panels:", all_gene_panels)\n\n    # Retrieve a
specific gene panel\n    specific_gene_panel = get_gene_panel("NSCLC_UNITO_2016_PANEL")\n
    print("Specific Gene Panel:", specific_gene_panel)\n\n    # Fetch gene panels using POST\n
    fetched_gene_panels = fetch_gene_panels(["NSCLC_UNITO_2016_PANEL"])\n    print("Fetched Gene
Panels:", fetched_gene_panels)\n\nexcept requests.exceptions.RequestException as e:\n
    print(f"An error occurred: {e}")'}
observation: All Gene Panels: [{'description': 'Targeted (27 cancer genes) sequencing of
adenoid cystic carcinomas on Illumina HiSeq sequencers.', 'genePanelId': 'ACYC_FMI_27'},
{'description': 'Targeted panel of 232 genes.', 'genePanelId': 'Agilent'}, {'description':
'Targeted panel of 8 genes.', 'genePanelId': 'AmpliSeq'}, {'description': 'ARCHER-HEME gene
panel (199 genes)', 'genePanelId': 'ARCHER-HEME-CV1'}, {'description': 'ARCHER-SOLID Gene Panel
(62 genes)', 'genePanelId': 'ARCHER-SOLID-CV1'}, {'description': 'Targeted sequencing of
various tumor types via bait v3.', 'genePanelId': 'bait_v3'}, {'description': 'Targeted
sequencing of various tumor types via bait v4.', 'genePanelId': 'bait_v4'}, {'description':
'Targeted sequencing of various tumor types via bait v5.', 'genePanelId': 'bait_v5'},
{'description': 'Targeted panel of 387 cancer-related genes.', 'genePanelId':
'bcc_unige_2016_cancer_panel'}, {'description': 'Research (CMO) IMPACT-Heme gene panel version
3.', 'genePanelId': 'HemePACT_v3'}, {'description': 'Targeted sequencing of 18 genes via
Resolution ctDx Lung assay.', 'genePanelId': 'ctDx_lung_panel'}, {'description': 'Targeted
sequencing of 503 cancer-associated genes on Illumina HiSeq sequencers.', 'genePanelId':
'DFCI_504'}, {'description': 'FoundationOne Heme gene panel targets the entire coding sequence
of 406 genes, selected introns of 31 genes involved in rearrangements and utilizes RNA
sequencing to interrogate 265 genes', 'genePanelId': 'FMI-D2'}, {'description': 'FoundationOne
Heme gene panel targets the entire coding sequence of 406 genes, selected introns of 31 genes
involved in rearrangements and utilizes RNA sequencing to interrogate 265 genes',
'genePanelId': 'FMI-R2'}, {'description': 'Foundation Medicine T4b gene panel (214 genes)',
'genePanelId': 'FMI-T4b'}, {'description': 'Foundation Medicine T5a gene panel (323 genes)',
'genePanelId': 'FMI-T5a'}, {'description': 'Foundation Medicine T7 gene panel (429 genes)',
'genePanelId': 'FMI-T7'}, {'description': 'Targeted panel of 600 genes.', 'genePanelId':
'gbm_tcga_pub_cancer_panel'}, {'description': 'DFCI-ONCOPANEL-1 (304 genes)', 'genePanelId':
'glioma_dfci_2020_DFCI-ONCOPANEL-1'}, {'description': 'DFCI-ONCOPANEL-2 (326 genes)',
'genePanelId': 'glioma_dfci_2020_DFCI-ONCOPANEL-2'}, {'description': 'DFCI-ONCOPANEL-3 (447
genes)', 'genePanelId': 'glioma_dfci_2020_DFCI-ONCOPANEL-3'}, {'description': 'Foundation
Medicine T5 gene panel (326 genes)', 'genePanelId': 'glioma_mskcc_2019_fmi_t5'},
{'description': 'Foundation Medicine T7 gene panel (434 genes)', 'genePanelId':
'glioma_mskcc_2019_fmi_t7'}, {'description': 'High-intensity targeted sequencing assay by GRAIL
covering 508 genes.', 'genePanelId': 'grail_cfdna_508'}, {'description': 'Targeted variant
analysis from 73 cancer genes by Next-Generation (NGS)/Massively parallel sequencing (MPS).',
'genePanelId': 'GUARDANT_73'}, {'description': 'Targeted (400 cancer genes) sequencing of
various tumor types via MSK-IMPACT Heme on Illumina HiSeq sequencers.', 'genePanelId':
'IMPACT-HEME-400'}, {'description': 'Targeted (467 cancer genes) sequencing of various tumor
types via MSK-IMPACT Heme on Illumina HiSeq sequencers. Originally 468 genes with CDKN2Ap14ARF
and CDKN2Ap16INK4A, merged into CDKN2A.', 'genePanelId': 'IMPACT-HEME-468'}, {'description':
'Hotspot sequencing panel for MSK-IMPACT Heme on Illumina HiSeq sequencers (204 genes).',
'genePanelId': 'IMPACT-HEME-HS'}, {'description': 'Targeted sequencing of urcc tumor via
MSK-IMPACT.', 'genePanelId': 'IMPACT230'}, {'description': 'Targeted (279 cancer genes)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'IMPACT279'}, {'description': 'Targeted (300 cancer genes) sequencing of bladder urothelial
carcinomas via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT300'},
{'description': 'Targeted (341 cancer genes) sequencing of various tumor types via MSK-IMPACT
on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT341'}, {'description': 'Targeted (410
cancer genes) sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.',
'genePanelId': 'IMPACT410'}, {'description': 'Targeted (468 cancer genes) sequencing of various
tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT468'},
{'description': 'Targeted (505 cancer genes) sequencing of various tumor types via MSK-IMPACT
on Illumina HiSeq sequencers.', 'genePanelId': 'IMPACT505'}, {'description': 'Targeted (152
genes) sequencing of MDS samples from the International Working Group (IWG) for the prognosis
of MDS', 'genePanelId': 'MDSIWG152'}, {'description': 'Targeted NGS of AML/MDs Samples.',
'genePanelId': 'Papaemmanuil_NEJM_2016_MDS_2013_panel'}, {'description': 'Targeted (173 cancer
genes) sequencing of breast cancers on Illumina HiSeq sequencers.', 'genePanelId':
'METABRIC_173'}, {'description': 'Targeted (129 genes) sequencing of cfDNA via MSK-ACCESS on
Illumina HiSeq sequencers.', 'genePanelId': 'ACCESS129'}, {'description': 'Targeted sequencing
of 15 oncogenes and tumor suppressor genes, selected for their potential as predictive
biomarkers of response to targeted agents by Sanger sequencing.', 'genePanelId':
'MSKCC_BLCA_15'}, {'description': 'Targeted (343 cancer genes, IMPACT341 plus MYB and NFIB)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'MYB-IMPACT341'}, {'description': 'Targeted (412 cancer genes, IMPACT410 plus MYB and NFIB)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'MYB-IMPACT410'}, {'description': 'Targeted (470 cancer genes, IMPACT468 plus MYB and NFIB)
sequencing of various tumor types via MSK-IMPACT on Illumina HiSeq sequencers.', 'genePanelId':
'MYB-IMPACT468'}, {'description': 'Targeted NGS of NSCLC Samples.', 'genePanelId':
'NSCLC_UNITO_2016_PANEL'}, {'description': 'Targeted panel of 137 genes.', 'genePanelId':
'prad_mskcc_sanger'}, {'description': 'Targeted panel of 145 genes. (merge of prad_mskcc_sanger
and prad_mskcc_sequenom)', 'genePanelId': 'prad_mskcc_sequenom_sanger'}, {'description':
'Targeted panel of 21 genes.', 'genePanelId': 'prad_mskcc_sequenom'}, {'description': 'Targeted
panel for cfDNA sequencing (2 genes)', 'genePanelId': 'rbl_cfdna_msk_2020_panel'},
{'description': 'Targeted sequencing of clinical cases via Thunderbolts NGS Target Enrichment
System, RainDance Technologies.', 'genePanelId': 'RDTB49'}, {'description': 'Targeted
sequencing of clinical cases via ThunderStorm NGS Target Enrichment System, RainDance
Technologies.', 'genePanelId': 'RDTS30'}, {'description': "A panel that's designed to target
all exons of 262 genes that are recurrently mutated in de novo MDS/AML (â\x80\x98RMG
reagentâ\x80\x99).", 'genePanelId': 'RMG'}, {'description': 'Targeted panel of 226
protein-coding genes.', 'genePanelId': 'sarc_mskcc_panel'}, {'description': 'Targeted deep
sequencing (TDS) of 1202 genes on Illumina HiSeq2000 sequencers.', 'genePanelId': 'UCLA_1202'},
{'description': 'DNA sequencing of 622 genes with known or potential relationships to cancer.',
'genePanelId': 'WUSTL-DFCI_623'}, {'description': 'Targeted massively parallel sequencing of
4047 cancer genes.', 'genePanelId': 'NOVARTIS_BROAD_1651'}, {'description': 'Foundation
Medicine T7 gene panel (430 cancer genes, FMI-T7 plus TERT)', 'genePanelId': 'TERT-FMI-T7'},
{'description': 'FoundationOne gene panel (317 genes)', 'genePanelId': 'FoundationOne'},
{'description': 'DFCI-ONCOPANEL-3.1, Number of Genes - 447', 'genePanelId':
'DFCI-ONCOPANEL-3.1'}, {'description': 'DFCI-ONCOPANEL-3, Number of Genes - 447',
'genePanelId': 'DFCI-ONCOPANEL-3'}, {'description': 'DFCI-ONCOPANEL-1, Number of Genes - 304',
'genePanelId': 'DFCI-ONCOPANEL-1'}, {'description': 'DFCI-ONCOPANEL-2, Number of Genes - 326',
'genePanelId': 'DFCI-ONCOPANEL-2'}, {'description': 'ARCHER-SOLID-CV4 Gene Panel (123 genes)',
'genePanelId': 'ARCHER-SOLID-CV4'}, {'description': 'Targeted panel of 262 genes.',
'genePanelId': 'MDAnderson_T200'}, {'description': 'Targeted panel of 476 cancer-related
genes.', 'genePanelId': 'sarcoma_msk_2022_panel'}]
Specific Gene Panel: {'description': 'Targeted NGS of NSCLC Samples.', 'genes':
[{'entrezGeneId': 25, 'hugoGeneSymbol': 'ABL1'}, {'entrezGeneId': 207, 'hugoGeneSymbol':
'AKT1'}, {'entrezGeneId': 324, 'hugoGeneSymbol': 'APC'}, {'entrezGeneId': 472,
'hugoGeneSymbol': 'ATM'}, {'entrezGeneId': 673, 'hugoGeneSymbol': 'BRAF'}, {'entrezGeneId':
999, 'hugoGeneSymbol': 'CDH1'}, {'entrezGeneId': 1029, 'hugoGeneSymbol': 'CDKN2A'},
{'entrezGeneId': 1956, 'hugoGeneSymbol': 'EGFR'}, {'entrezGeneId': 2064, 'hugoGeneSymbol':
'ERBB2'}, {'entrezGeneId': 2066, 'hugoGeneSymbol': 'ERBB4'}, {'entrezGeneId': 2263,
'hugoGeneSymbol': 'FGFR2'}, {'entrezGeneId': 2261, 'hugoGeneSymbol': 'FGFR3'}, {'entrezGeneId':
2322, 'hugoGeneSymbol': 'FLT3'}, {'entrezGeneId': 6927, 'hugoGeneSymbol': 'HNF1A'},
{'entrezGeneId': 3417, 'hugoGeneSymbol': 'IDH1'}, {'entrezGeneId': 3791, 'hugoGeneSymbol':
'KDR'}, {'entrezGeneId': 3815, 'hugoGeneSymbol': 'KIT'}, {'entrezGeneId': 3845,
'hugoGeneSymbol': 'KRAS'}, {'entrezGeneId': 4233, 'hugoGeneSymbol': 'MET'}, {'entrezGeneId':
4851, 'hugoGeneSymbol': 'NOTCH1'}, {'entrezGeneId': 5290, 'hugoGeneSymbol': 'PIK3CA'},
{'entrezGeneId': 5781, 'hugoGeneSymbol': 'PTPN11'}, {'entrezGeneId': 5925, 'hugoGeneSymbol':
'RB1'}, {'entrezGeneId': 5979, 'hugoGeneSymbol': 'RET'}, {'entrezGeneId': 4089,
'hugoGeneSymbol': 'SMAD4'}, {'entrezGeneId': 6608, 'hugoGeneSymbol': 'SMO'}, {'entrezGeneId':
6794, 'hugoGeneSymbol': 'STK11'}, {'entrezGeneId': 7157, 'hugoGeneSymbol': 'TP53'},
{'entrezGeneId': 7428, 'hugoGeneSymbol': 'VHL'}], 'genePanelId': 'NSCLC_UNITO_2016_PANEL'}
An error occurred: 400 Client Error: for url: https://www.cbioportal.org/api/gene-panels/fetch
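The `NSCLC_UNITO_2016_PANEL` response printed above is a dict whose `genes` list pairs each `hugoGeneSymbol` with its `entrezGeneId`. For early exploration, a small sketch that turns that list into a lookup and checks which genes of interest a panel covers; `panel_gene_map` and `genes_covered` are hypothetical helper names, and the excerpt holds three genes copied from the printed output rather than the full panel.

```python
def panel_gene_map(gene_panel):
    """Map HUGO symbol -> Entrez ID from a GenePanel response's `genes` list."""
    return {g["hugoGeneSymbol"]: g["entrezGeneId"] for g in gene_panel.get("genes", [])}


def genes_covered(gene_panel, symbols):
    """Split the query symbols into (covered, missing) for one panel, keeping input order."""
    panel = panel_gene_map(gene_panel)
    covered = [s for s in symbols if s in panel]
    missing = [s for s in symbols if s not in panel]
    return covered, missing


# Excerpt of the NSCLC_UNITO_2016_PANEL response shown in the session output.
nsclc_excerpt = {
    "genePanelId": "NSCLC_UNITO_2016_PANEL",
    "genes": [
        {"entrezGeneId": 3845, "hugoGeneSymbol": "KRAS"},
        {"entrezGeneId": 1956, "hugoGeneSymbol": "EGFR"},
        {"entrezGeneId": 7157, "hugoGeneSymbol": "TP53"},
    ],
}

print(genes_covered(nsclc_excerpt, ["EGFR", "ALK", "TP53"]))
# → (['EGFR', 'TP53'], ['ALK'])
```

Passing the full dict returned by `get_gene_panel` instead of the excerpt works the same way, since only the `genes` key is read.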
I successfully retrieved information about available gene panels using the API. Here are some highlights:
1. **All Gene Panels:**
- There are numerous gene panels available, each with a unique ID and description. For example, the "ACYC_FMI_27" panel targets 27 cancer genes, and the "IMPACT505" panel targets 505 cancer genes.
2. **Specific Gene Panel:**
- I retrieved detailed information about the "NSCLC_UNITO_2016_PANEL" gene panel, which is used for targeted NGS of NSCLC samples. This panel includes genes such as ABL1, AKT1, APC, ATM, BRAF, and many others.
Unfortunately, there was an error when attempting to fetch gene panels using a POST request, which resulted in a 400 Client Error. This might be due to incorrect parameters or an issue with the API endpoint.
If you have any specific gene panels or further exploration in mind, feel free to let me know!
>>>
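The 400 from `/api/gene-panels/fetch` is consistent with the request shape: the generated `fetch_gene_panels` posts an object (`{"projection": ..., "genePanelIds": [...]}`), while the `ask_api` observation describes the body as an array of gene panel IDs. A minimal sketch of the likely fix, assuming `projection` belongs in the query string and the body is a bare JSON array; this has not been verified against the live server, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.cbioportal.org"


def fetch_gene_panels_request(gene_panel_ids, projection="SUMMARY"):
    """Build POST /api/gene-panels/fetch with the IDs as a bare JSON array.

    Assumption: `projection` is a query parameter and the body is a JSON
    array of gene panel IDs, not an object wrapping them.
    """
    query = urllib.parse.urlencode({"projection": projection})
    return urllib.request.Request(
        f"{BASE_URL}/api/gene-panels/fetch?{query}",
        data=json.dumps(list(gene_panel_ids)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send_json(req):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: `send_json(fetch_gene_panels_request(["NSCLC_UNITO_2016_PANEL"]))`. With `requests`, as in the session, the equivalent change would be `requests.post(endpoint, params={"projection": projection}, json=gene_panel_ids)`.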