@cbare
Created June 3, 2016 21:49
An example of accessing Synapse from AWS Lambda
"""
=======================================================
How to access Synapse from Amazon Lambda with Python
=======================================================
Here we show an example of adding a row to a Synapse table
through an AWS Lambda script.
Caveat: Any operation that requires chunked file upload
fails on AWS Lambda. The execution environment for
Lambda scripts seems not to allow access to the OS
resources required by multiprocessing.dummy, which
the Synapse Python client uses to parallelize
chunked upload.
see: https://forums.aws.amazon.com/thread.jspa?threadID=232868
-------
Notes
-------
* In the AWS console, create a Lambda function whose "Handler"
  has the form spam_synapse.lambda_handler, where spam_synapse.py
  is the filename and lambda_handler(event, context) is a
  function in that file.
* Create a Scheduled Event to trigger the function on a timer
* Create a key using AWS KMS, in the same region as the
  Lambda function
* Use that key to encrypt a Synapse API key
* Set the Synapse cache to live inside the /tmp dir
* Package the script with its dependencies and upload
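
The two KMS steps above can also be done from Python instead
of the AWS CLI. The following is a minimal sketch, not part of
the original script. The helper names encrypt_secret and
decrypt_secret and the key alias in the usage comment are
hypothetical, and a boto3 KMS client is assumed to be passed in.

```python
import base64


def encrypt_secret(kms_client, key_id, secret):
    # KMS returns the ciphertext as raw bytes; base64-encode it so
    # that it can be pasted into the script as a string constant.
    response = kms_client.encrypt(KeyId=key_id, Plaintext=secret)
    return base64.b64encode(response['CiphertextBlob'])


def decrypt_secret(kms_client, encoded_ciphertext):
    # Undo the base64 step, then ask KMS to decrypt. For a symmetric
    # key, KMS identifies the key from the ciphertext itself.
    blob = base64.b64decode(encoded_ciphertext)
    return kms_client.decrypt(CiphertextBlob=blob)['Plaintext']


# Hypothetical usage; needs AWS credentials with access to the key:
#   import boto3
#   kms = boto3.client('kms')
#   token = encrypt_secret(kms, 'alias/my-lambda-key', b'my API key')
#   assert decrypt_secret(kms, token) == b'my API key'
```

Passing the client in as an argument keeps the helpers easy to
exercise with a stand-in object when no AWS account is at hand.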
---------------------------------------
Packaging the script and dependencies
---------------------------------------
Note that we have to package dependencies (except for
boto3) along with our app. For complete instructions,
see the AWS Lambda docs for the topic "Creating a
Deployment Package (Python)", here:
http://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
I created an empty directory called spam_synapse and
installed my script and its dependencies there. Lambda
wants a zip file with the contents of the directory,
not the directory itself:
    $ pip2 install synapseclient -t /path/to/spam_synapse
    $ pip2 install setuptools -t /path/to/spam_synapse
    $ pip2 install backports.csv -t /path/to/spam_synapse
    $ pip2 install future -t /path/to/spam_synapse
    $ cd /path/to/spam_synapse
    $ zip -r ../spam_synapse.zip *
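
The same packaging rule, contents at the root of the archive
rather than inside a top-level directory, can be sketched in
Python with the standard zipfile module. This is only an
illustration, not part of the original workflow, and the
function name zip_directory_contents is hypothetical.

```python
import os
import zipfile


def zip_directory_contents(src_dir, zip_path):
    # Store each file under its path relative to src_dir, so that
    # spam_synapse.py lands at the root of the archive, where
    # Lambda looks for the handler module.
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                archive.write(full_path, os.path.relpath(full_path, src_dir))


# Hypothetical usage; keep the zip file outside the source directory:
#   zip_directory_contents('/path/to/spam_synapse',
#                          '/path/to/spam_synapse.zip')
```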
"""
import base64
import boto3
from datetime import datetime, timedelta
import synapseclient
from synapseclient import Table, RowSet, Row
import synapseclient.utils as utils
## The file system your Lambda script has access to is
## read-only except for the /tmp directory, so we'll
## have to put the Synapse cache there.
synapseclient.cache.CACHE_ROOT_DIR = "/tmp/synapseCache"
## Synapse ID for the table we'll be writing to
TABLEID = "syn6120245"
## The recommended way to include a secret in a Lambda script
## is to encrypt that secret with the AWS CLI, like so:
##   aws kms encrypt --key-id some_key_id \
##     --plaintext "This is the secret you want to encrypt" \
##     --query CiphertextBlob --output text
## So, we've included a Synapse API key encrypted with an AWS KMS key:
ENCRYPTED_APIKEY = "CiABEQY/uH5qFAKESECRETYACANTBETOOPARANOIDov/KH+emMCM3"
print('Loading spam synapse function')


def format_datetime(dt):
    """
    Format dates the way that Synapse tables prefer
    """
    fmt = "{time.year:04}-{time.month:02}-{time.day:02} {time.hour:02}:{time.minute:02}:{time.second:02}.{millisecond:03}"
    ## Microseconds at or above 999500 would round to 1000 ms,
    ## so carry them over into the next whole second instead.
    if dt.microsecond >= 999500:
        dt -= timedelta(microseconds=dt.microsecond)
        dt += timedelta(seconds=1)
    return fmt.format(time=dt, millisecond=int(round(dt.microsecond/1000.0)))


def lambda_handler(event, context):
    """
    This is the function that Lambda calls.
    """
    print(event)

    ## decrypt Synapse API key via AWS Key Management Service
    ## (b64decode replaces the deprecated base64.decodestring)
    kms = boto3.client('kms')
    decryption = kms.decrypt(CiphertextBlob=base64.b64decode(ENCRYPTED_APIKEY))
    apikey = decryption['Plaintext']

    ## log in with the decrypted API key
    syn = synapseclient.Synapse()
    syn.login('your_synapse_user_name_here', apiKey=apikey)

    ## Add a row to a table using the RowSet method. Using RowSet
    ## is important because it sends the row data to Synapse
    ## encoded as JSON via the REST API and doesn't use chunked
    ## upload, which doesn't work on Lambda.
    schema = syn.get(TABLEID)
    cols = syn.getColumns(schema)
    new_rows = RowSet(columns=cols, schema=schema,
                      rows=[Row([datetime.now().strftime("%Y-%m-%d"),
                                 format_datetime(datetime.now()),
                                 "AWS lambda",
                                 "fubar"])])
    return syn.store(new_rows)