@arianagiorgi
Last active June 27, 2017 19:43
AWS Lambda Automation

Automation with AWS Lambda

Set up python-lambda

We'll be using python-lambda, which provides the lambda command-line tool used below.

  1. Create a new directory and a virtualenv. If you're attaching this scraper to an interactive, you can just add a new folder to the interactive's main directory.

  2. (venv) $ pip install python-lambda

  3. (venv) $ lambda init

  4. In config.yaml, update function_name and description. Do not add AWS credentials here, because this file will be pushed to GitHub.

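  5. Instead, add your credentials to a .env file (and keep that file out of version control), like:

export AWS_ACCESS_KEY_ID=''
export AWS_SECRET_ACCESS_KEY=''

and run (venv) $ source .env to load them. The export keyword makes the variables visible to commands started from that shell, such as lambda deploy.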
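Before deploying, it can help to confirm that the credentials are actually visible to Python. The following is a minimal sketch, assuming the two variable names above; `missing_aws_credentials` is a hypothetical helper for illustration, not part of python-lambda or boto3.

```python
import os

# The two variable names used in the .env file above
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Missing in environment: " + ", ".join(missing))
    else:
        print("AWS credentials found in environment")
```

If this reports the variables as missing right after you run source .env, they were probably assigned without export, so they exist in the shell but are not passed on to Python.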

service.py

The handler function is invoked in response to an event, so this is the function you'll populate.

The generated example shows that you can read variables from the event.json file if you choose. I didn't use that in my code; event.json is only for local testing.
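If you do want to use event values, a handler might read them roughly like this. This is only a sketch: the `url` and `year` keys and their defaults are hypothetical, and the same keys would need to be present in your local event.json for them to be passed in during local testing.

```python
def handler(event, context):
    """Read optional settings from the event, falling back to defaults."""
    # Treat a missing event as an empty one so .get() is always safe
    event = event or {}

    # 'url' and 'year' are hypothetical keys, used here for illustration only
    url = event.get("url", "https://example.com/page-to-scrape")
    year = event.get("year", "2017")

    return {"url": url, "year": year}
```

Using .get() with defaults means the same handler still runs when the scheduled trigger sends an event that contains none of these keys.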

Example service.py setup:

import os
import json

import boto3

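def handler(event=None, context=None):
  # Lambda calls this with (event, context); the defaults
  # let us also call handler() directly when testing locally

  # your code here: build the data you want to publish
  data = {}  # placeholder, replace with your scraped data

  # upload some data
  upload_data_s3(data)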

def upload_data_s3(data):
  s3 = boto3.resource('s3')
  bucket = s3.Bucket('interactives.dallasnews.com')
  bucket_data_path = '2017/some-path'

  bucket.put_object(
    Key=os.path.join(
      bucket_data_path,
      'your-file-name.json'
    ),
    Body=json.dumps(data),
    ACL='public-read',
    ContentType='application/json'
  )

# this block is only for local testing:
# it calls the main handler function directly
if __name__ == '__main__':
  handler()

You'll be able to load the uploaded file in a JS script with something like:

d3.json("https://interactives.dallasnews.com/2017/some-path/your-file-name.json", function(error, data){
  // your code here
})

Testing and deploying

Test

If you are using a populated event.json file, you can call:

(venv) $ lambda invoke -v

and it will run the handler(event, context) function.

If you aren't using event.json (as in the example above), simply call:

(venv) $ python service.py
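Note that running service.py as written performs a real upload. If you'd rather exercise the upload logic without touching S3, one option is to pass the bucket in as an argument and substitute a fake. This is a sketch, not part of the original setup: `FakeBucket` and the extra parameters on `upload_data_s3` are assumptions made for illustration.

```python
import json
import posixpath


def upload_data_s3(data, bucket, path="2017/some-path", filename="your-file-name.json"):
    """Same upload as in service.py, but the bucket is passed in by the caller."""
    return bucket.put_object(
        # posixpath always joins with '/', which is what S3 keys use
        Key=posixpath.join(path, filename),
        Body=json.dumps(data),
        ACL="public-read",
        ContentType="application/json",
    )


class FakeBucket:
    """Stand-in for a boto3 Bucket that records calls instead of uploading."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


if __name__ == "__main__":
    fake = FakeBucket()
    upload_data_s3({"updated": True}, fake)
    print(fake.calls[0]["Key"])  # prints 2017/some-path/your-file-name.json
```

In the deployed function you would pass the real bucket object (for example the result of boto3.resource('s3').Bucket(...)) instead of the fake.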

Deploy

When you're ready to deploy, run:

(venv) $ lambda deploy

Setting up events on AWS Lambda

In the AWS console, navigate to Lambda, open your function, and then:

  1. Configure any necessary environment variables (API keys, etc.).

  2. Go to Triggers > Add trigger. I've selected CloudWatch Events because the function itself will ping a page and check for updates.

  3. Create a new rule. If you want it to fire at a certain time or at certain intervals, select "Schedule expression".

  4. Enter a cron or rate expression and submit your trigger.

  5. Under the Configuration tab, in Advanced Settings, you can set a timeout if you choose.

  6. Hitting the "Test" button will run the scraper and populate the S3 bucket.
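For the schedule expression in step 4, CloudWatch accepts two forms: rate(value unit) and a six-field cron(minutes hours day-of-month month day-of-week year), evaluated in UTC. The sketch below is a rough shape check you could run locally before pasting an expression into the console. `looks_like_schedule_expression` is a hypothetical helper, and it only approximates the rules; AWS performs the real validation.

```python
import re

RATE_RE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
CRON_RE = re.compile(r"^cron\((.+)\)$")


def looks_like_schedule_expression(expr):
    """Rough shape check only; it does not validate individual field values."""
    rate = RATE_RE.match(expr)
    if rate:
        value, unit = int(rate.group(1)), rate.group(2)
        # Singular unit for a value of 1, plural unit otherwise
        return value > 0 and (value == 1) == (not unit.endswith("s"))

    cron = CRON_RE.match(expr)
    if cron:
        fields = cron.group(1).split()
        # Six fields: minutes, hours, day-of-month, month, day-of-week, year
        if len(fields) != 6:
            return False
        # At least one of day-of-month and day-of-week must be '?'
        return fields[2] == "?" or fields[4] == "?"

    return False


if __name__ == "__main__":
    for expr in ("rate(5 minutes)", "cron(0 12 * * ? *)", "*/5 * * * *"):
        print(expr, looks_like_schedule_expression(expr))
```

Here rate(5 minutes) fires every five minutes and cron(0 12 * * ? *) fires daily at 12:00 UTC, while a standard five-field crontab line such as */5 * * * * is rejected because it lacks the cron(...) wrapper and the year field.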
