@inodb
Last active July 18, 2019 00:46
"""
Check whether the studies on datahub have been imported into the live public portal.

Currently this compares the study folder names on datahub with the study IDs
returned by the live public portal's studies endpoint. It should probably use
the actual `cancer_study_identifier: study_name` line in each study's
meta_study.txt file instead, but comparing folder names seems to work.
"""
import requests

# List the entries of the public/ folder in the datahub repo. The GitHub
# contents API returns files as well as directories, so keep only directories.
public_study_folders = requests.get("https://api.github.com/repos/cBioPortal/datahub/contents/public")
public_study_folders.raise_for_status()
datahub_study_names = {entry["name"] for entry in public_study_folders.json() if entry["type"] == "dir"}

# Fetch all studies from the live public portal's REST API.
live_studies = requests.get("https://www.cbioportal.org/api/studies")
live_studies.raise_for_status()
live_study_names = {study["studyId"] for study in live_studies.json()}

print("Number of live studies: {}".format(len(live_study_names)))
print("Number of studies on datahub: {}".format(len(datahub_study_names)))
print("Studies on datahub, but not in live portal: {}".format(sorted(datahub_study_names - live_study_names)))
print("Studies on live portal, but not in datahub: {}".format(sorted(live_study_names - datahub_study_names)))
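The docstring notes that matching on the `cancer_study_identifier` line in meta_study.txt would be more robust than matching on folder names. Below is a minimal sketch of that parsing step, assuming each meta file line is a `key: value` pair split on the first colon. The helpers `parse_meta_file`, `study_id_from_meta`, and `compare_study_ids` are hypothetical names for illustration, not part of cBioPortal or datahub. Fetching each study's meta_study.txt from GitHub is left out, so the sketch runs offline.

```python
from typing import Dict, Iterable, List, Optional


def parse_meta_file(text: str) -> Dict[str, str]:
    """Parse `key: value` lines of a meta file into a dict (assumed format).

    Splits on the first colon only, so values such as URLs keep their colons.
    Blank lines and lines without a colon are skipped.
    """
    meta = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    return meta


def study_id_from_meta(text: str) -> Optional[str]:
    """Return the cancer_study_identifier value, or None if it is missing."""
    return parse_meta_file(text).get("cancer_study_identifier")


def compare_study_ids(datahub_ids: Iterable[str], live_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Same set differences as the script above, returned as sorted lists."""
    datahub, live = set(datahub_ids), set(live_ids)
    return {
        "datahub_only": sorted(datahub - live),
        "live_only": sorted(live - datahub),
    }


if __name__ == "__main__":
    # Hypothetical meta_study.txt contents, for illustration only.
    example = "type_of_cancer: mixed\ncancer_study_identifier: example_study\nname: Example study\n"
    print(study_id_from_meta(example))  # prints "example_study"
```

In the real check, the identifiers collected from every datahub study's meta_study.txt would replace `datahub_study_names` before computing the set differences, so a folder whose name differs from its study ID would no longer be reported as missing.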