@nocollier
Created January 20, 2024 02:11
intake-esgf globus integration prototype
import globus_sdk as sdk
# I created a client_uuid for intake-esgf
CLIENT_ID = "81a13009-8326-456e-a487-2d1557d8eb11"
client = sdk.NativeAppAuthClient(CLIENT_ID)
# Authorize our interactions with Globus.
client.oauth2_start_flow()
authorize_url = client.oauth2_get_authorize_url()
print(
    f"""
All interactions with Globus must be authorized. To ensure that we have permission to
facilitate your transfer, please open the following link in your browser.
{authorize_url}
You will have to log in (or already be logged in) to your Globus account. Globus will
also request that you give a label for this authorization. You may pick anything of your
choosing. After following the instructions in your browser, Globus will generate a code
which you must copy and paste here and then hit <enter>.\n"""
)
auth_code = input("> ").strip()
# Using this code, we get tokens that authorize a transfer (and other requests).
token_response = client.oauth2_exchange_code_for_tokens(auth_code)
globus_transfer_data = token_response.by_resource_server["transfer.api.globus.org"]
transfer_client = sdk.TransferClient(
    authorizer=sdk.AccessTokenAuthorizer(globus_transfer_data["access_token"])
)
# These are the links we are getting out of the index, but as far as I understand, they
# are not really needed. They take the form 'globus://{COLLECTION_UUID}{PATH}'. We could
# simply map data_node to a collection uuid and then build the path from other data in
# the file response.
source_uuid = "8896f38e-68d1-4708-bce4-b1b3a3405809"
index_links = [
"globus://8896f38e-68d1-4708-bce4-b1b3a3405809/css03_data/CMIP6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Amon/pr/gn/v20190429/pr_Amon_CanESM5_historical_r1i1p1f1_gn_185001-201412.nc",
"globus://8896f38e-68d1-4708-bce4-b1b3a3405809/css03_data/CMIP6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Amon/tas/gn/v20190429/tas_Amon_CanESM5_historical_r1i1p1f1_gn_185001-201412.nc",
"globus://8896f38e-68d1-4708-bce4-b1b3a3405809/css03_data/CMIP6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Lmon/nbp/gn/v20190429/nbp_Lmon_CanESM5_historical_r1i1p1f1_gn_185001-201412.nc",
]
# This is my Globus Connect Personal collection. I found this UUID by browsing my
# collections and opening the properties.
destination_uuid = "285fafe4-ae63-11ee-b085-4bb870e392e2"
# We build up some task data by giving both endpoints.
task_data = sdk.TransferData(
    source_endpoint=source_uuid, destination_endpoint=destination_uuid
)
# Then loop over the links, strip the scheme and collection UUID, and keep the path.
for link in index_links:
    link = link.replace("globus://", "")
    path = link[link.index("/") :]
    # The same path is reused under ~/tmp on the destination collection.
    task_data.add_item(path, f"~/tmp{path}")
# We now submit the task_data to the transfer_client
task_doc = transfer_client.submit_transfer(task_data)
# Single quotes inside the f-string keep this valid on Python versions before 3.12.
print(f"Transfer submitted, task_id = {task_doc['task_id']}")
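The comment above notes that the index links take the form 'globus://{COLLECTION_UUID}{PATH}'. A minimal sketch of splitting such links with the standard library, and of grouping paths by collection so each group could feed its own TransferData, might look like the following. The helper names `split_globus_link` and `group_paths_by_collection` are hypothetical and not part of intake-esgf or globus_sdk.

```python
from urllib.parse import urlparse


def split_globus_link(link: str) -> tuple[str, str]:
    """Split 'globus://{COLLECTION_UUID}{PATH}' into (collection_uuid, path)."""
    parsed = urlparse(link)
    # The collection UUID lands in the network-location slot of the URL.
    if parsed.scheme != "globus" or not parsed.netloc or not parsed.path:
        raise ValueError(f"Not a globus:// link: {link!r}")
    return parsed.netloc, parsed.path


def group_paths_by_collection(links: list[str]) -> dict[str, list[str]]:
    """Map each source collection UUID to the list of paths found on it."""
    grouped: dict[str, list[str]] = {}
    for link in links:
        collection_uuid, path = split_globus_link(link)
        grouped.setdefault(collection_uuid, []).append(path)
    return grouped
```

Calling `group_paths_by_collection(index_links)` on the links above would give a single key, the source UUID, with three paths. Grouping matters once an index query returns links from more than one collection, since the script builds each transfer from one source endpoint.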
"""Questions:
1. I see that I can request 'refresh tokens'. Could these last longer than 10 minutes?
2. Would I then write out the refresh token to, say, the local cache and then try to use
it by default? Then if it fails I just have them re-authenticate?
3. How will the users discover possible destination endpoints? Do they just have to use
the browser to find one and then get the uuid and enter it in the terminal?
4. Is there a way I can query if a task_id is finished? I could have my code block until
the transfer is complete and then load the paths into the xarray containers.
"""
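On question 4, `globus_sdk.TransferClient` provides `task_wait`, which polls a task until it finishes or a timeout elapses, and `get_task`, which returns the task document including its `status`. A minimal sketch of blocking on the submitted task, under the assumption that raising on timeout or failure is the desired behavior, could be the following. The wrapper name `wait_for_transfer` and its default timings are hypothetical.

```python
def wait_for_transfer(transfer_client, task_id, timeout=3600, polling_interval=15):
    """Block until the task finishes; return its final status or raise.

    `transfer_client` is expected to behave like globus_sdk.TransferClient.
    """
    # task_wait returns True once the task has finished, False on timeout.
    finished = transfer_client.task_wait(
        task_id, timeout=timeout, polling_interval=polling_interval
    )
    status = transfer_client.get_task(task_id)["status"]
    if not finished:
        raise TimeoutError(f"Task {task_id} is still {status} after {timeout} s")
    if status != "SUCCEEDED":
        raise RuntimeError(f"Task {task_id} ended with status {status}")
    return status
```

With the script above this would be called as `wait_for_transfer(transfer_client, task_doc["task_id"])`, after which the files under `~/tmp` on the destination collection could be opened.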