Last active
November 5, 2024 17:13
-
-
Save ianmcook/1c4623c995202f98b289395cf63159f2 to your computer and use it in GitHub Desktop.
Use the Snowflake SQL REST API from a shell script with curl and jq to execute a query and download the result partitions in Arrow format
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"statement": "SELECT * FROM MYTABLE", | |
"resultSetMetaData": { | |
"format": "arrowv1" | |
}, | |
"timeout": 60, | |
"database": "MYDATABASE", | |
"schema": "MYSCHEMA", | |
"warehouse": "MYWAREHOUSE", | |
"role": "MYROLE" | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/bin/sh | |
account_ident="ACCOUNT-IDENT" | |
user="[email protected]" | |
private_key_path="/path/to/rsa_key.p8" | |
bearer_token=$(echo "" | snowsql --private-key-path $private_key_path --generate-jwt -a "$account_ident" -u "$user") | |
subdomain=$(echo "$account_ident" | tr '[:upper:]' '[:lower:]') | |
curl -o result-metadata.json -X POST \ | |
-H "Content-Type: application/json" \ | |
-H "Accept: application/json" \ | |
-H "Authorization: Bearer $bearer_token" \ | |
-H "User-Agent: customApplication/1.0" \ | |
-H "X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT" \ | |
-d "@request-body.json" \ | |
"https://$subdomain.snowflakecomputing.com/api/v2/statements" | |
statement_handle=$(jq -r '.statementHandle' result-metadata.json) | |
partitions=$(jq '.resultSetMetaData.partitionInfo | length - 1' result-metadata.json) | |
curl --parallel --compressed $(seq -f "-o partition-%g.arrows" 1 $partitions) \ | |
-H "Accept: application/vnd.apache.arrow.stream" \ | |
-H "Authorization: Bearer $bearer_token" \ | |
-H "User-Agent: customApplication/1.0" \ | |
-H "X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT" \ | |
$(seq -f "https://$subdomain.snowflakecomputing.com/api/v2/statements/$statement_handle?partition=%g" 1 $partitions) |
Alternatively: If you plan to persist the downloaded data files and you want to minimize the associated storage space or cost, do not decompress the files on download. To do this:
- Remove the
--compressed
flag from the curl command. - Change the curl output flag to
-o partition-%g.arrows.gz
.
Then you can decompress the files when you read them into Python:
import pyarrow as pa
import glob
import gzip
def yield_batches(paths):
for path in paths:
try:
with gzip.open(path) as file:
reader = pa.ipc.open_stream(file)
for batch in reader:
yield batch
except (gzip.BadGzipFile):
with open(path, "rb") as file:
reader = pa.ipc.open_stream(file)
for batch in reader:
yield batch
paths = glob.glob("*.arrows.gz")
table = pa.Table.from_batches(yield_batches(paths))
Also see the example code at apache/arrow#44561 (comment) which shows how to create a record batch reader (which enables lazy/deferred reading) instead of eagerly reading the files into a table.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Python code to read the resulting files into a PyArrow Table: