We'll be using python-lambda.
- Create a new directory and call it whatever you want.
- Enter the new directory and run `virtualenv venv` from Terminal. If you don't have virtualenv, you can install it with `pip install virtualenv`.
- Activate the virtualenv with `source venv/bin/activate`.
- Run `(venv) $ pip install python-lambda`.
- Run `lambda init`.
- In `config.yaml`, update `function_name` and `description`. Do not add your AWS credentials here because this file will get pushed up to GitHub. (See the sketch after this list.)
- Create a `.gitignore` file if you don't have one and add `.env` to it.
- Add your credentials to an `.env` file like:

  ```
  export AWS_ACCESS_KEY_ID=''
  export AWS_SECRET_ACCESS_KEY=''
  ```

- Run `(venv) $ source .env` to activate them.
- Create/note which AWS bucket you want to save into. Enter it into `bucket_data_path` in `service.py`.
- In `service.py`, replace `'your-file-name.json'` with the name of your file.
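For reference, the `config.yaml` that `lambda init` generates looks roughly like the sketch below; the exact fields can vary by python-lambda version, and `my_scraper` and the description are placeholders. The point of the `.env` step above is that the credential fields here stay empty:

```yaml
region: us-east-1

function_name: my_scraper                         # update this
handler: service.handler
description: Scrapes a page and saves JSON to S3  # and this

# Leave these blank. Credentials come from the .env file instead,
# so they never get committed to GitHub.
aws_access_key_id: ''
aws_secret_access_key: ''
```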
The handler function is invoked in response to an event, so that's the function you'll populate.
A populated event.json will show you that you can grab variables from it if you so choose. I found I didn't use that in my code; event.json is just for local testing.
Example `service.py` setup:
```python
import os
import json

import boto3

# Since scrapers can run long, save them as modules and import them
from exampleFile1 import exampleFunction1


def handler(*args):
    # your code here
    # trash this example
    data = exampleFunction1(1)
    # Should print '2'
    print(data)
    # upload some data
    upload_data_s3(data)


def upload_data_s3(data):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('interactives.dallasnews.com')
    bucket_data_path = '2017/some-path'
    bucket.put_object(
        Key=os.path.join(
            bucket_data_path,
            'your-file-name.json'
        ),
        Body=json.dumps(data),
        ACL='public-read',
        ContentType='application/json'
    )


# this is just for our testing purposes,
# calling the main handler function directly
if __name__ == '__main__':
    handler()
```

You'll be able to access this file in a JS script with something like:
```js
d3.json("https://interactives.dallasnews.com/2017/some-path/your-file-name.json", function(error, data){
    // your code here
})
```

If you are using a populated event.json file, you can call:
```
(venv) $ lambda invoke -v
```

and it will run the `handler(event, context)` function.
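If you do want to read values out of event.json, a minimal sketch looks like this; the `url` key is just a made-up example, not something python-lambda requires:

```python
def handler(event, context):
    # `lambda invoke` parses event.json and passes it in as `event`
    url = event.get('url')
    print(url)


if __name__ == '__main__':
    # mimic the invoke call for quick local testing
    handler({'url': 'https://example.com/feed'}, None)
```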
If you aren't using the event.json (like I'm not in the example above), simply call:

```
(venv) $ python service.py
```
When you're ready to deploy, run:

```
(venv) $ lambda deploy
```
Navigate to Lambda in the AWS console and then:
- Configure any necessary environment variables (API keys, etc.).
- Triggers > Add trigger. I've selected CloudWatch Event because I'm going to ping a page myself and check for updates.
- Create a new rule. If you want it to fire at a certain time or at certain intervals, select "Schedule expression".
- Use a fancy cron expression and submit your trigger. (See the examples after this list.)
- Under the Configuration tab in Advanced Settings, you can set a timeout if you so choose.
- Hitting the "Test" button will run the scraper and populate the S3 bucket.
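For reference, CloudWatch schedule expressions come in two flavors, rate expressions and cron expressions. The schedules below are just placeholders:

```
rate(15 minutes)
cron(0 12 * * ? *)
```

The first fires every 15 minutes; the second fires once a day at 12:00 UTC. AWS cron expressions have six fields and always run in UTC.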